arche.readers.items

Module Contents

arche.readers.items.RawItems
class arche.readers.items.Items(raw: RawItems, df: pd.DataFrame)
__len__(self)
static process_df(df: pd.DataFrame)
static categorize(df: pd.DataFrame)

Cast columns with repeating values to the pandas category dtype to save memory.
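As an illustration of the idea behind categorize, here is a minimal sketch, not the arche implementation. The function name categorize_sketch and the max_unique_ratio threshold are assumptions made for this example. The rule used to decide which columns count as "repeating" is also an assumption.

```python
import pandas as pd


def categorize_sketch(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df where low-cardinality columns use the category dtype.

    Hypothetical helper: the threshold and the selection rule are assumptions.
    """
    out = df.copy()
    for column in out.columns:
        non_null = out[column].count()
        if non_null == 0:
            # Nothing to measure in an all-null column.
            continue
        try:
            unique = out[column].nunique()
        except TypeError:
            # Unhashable values (lists, dicts) cannot be counted; leave the column as is.
            continue
        if unique / non_null <= max_unique_ratio:
            out[column] = out[column].astype("category")
    return out


df = pd.DataFrame(
    {
        "brand": ["acme", "acme", "zeta", "acme"],  # 2 unique values out of 4
        "sku": ["a1", "b2", "c3", "d4"],  # all values unique
    }
)
result = categorize_sketch(df)
```

In this sketch only brand is converted. sku keeps its original dtype because every value is distinct, so a category encoding would save nothing.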

classmethod from_df(cls, df: pd.DataFrame)
classmethod from_array(cls, iterable: RawItems)
class arche.readers.items.CloudItems(key: str, count: Optional[int] = None, filters: Optional[api.Filters] = None)

Bases: arche.readers.items.Items

property limit(self)

The maximum number of items available in the source.

property count(self)

The number of items the user wants to retrieve.

abstract fetch_data(self)
abstract format_keys(self, keys: pd.Series)
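The relationship between limit, count and the two abstract methods can be sketched with a small stand-in class. All names here (SourceSketch, InMemorySource) are hypothetical. Two behaviours are assumptions rather than documented arche behaviour: count falls back to limit when it is not given, and an explicit count is capped at limit. Plain lists are used in place of pd.Series to keep the sketch dependency-free.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class SourceSketch(ABC):
    """Hypothetical stand-in for a CloudItems-like base class."""

    def __init__(self, key: str, count: Optional[int] = None):
        self.key = key
        self._count = count

    @property
    @abstractmethod
    def limit(self) -> int:
        """The maximum number of items available in the source."""

    @property
    def count(self) -> int:
        # Assumption: without an explicit count, request everything available;
        # an explicit count is capped at the limit.
        if self._count is None:
            return self.limit
        return min(self._count, self.limit)

    @abstractmethod
    def fetch_data(self) -> List[dict]:
        """Load the requested items from the source."""

    @abstractmethod
    def format_keys(self, keys: List[str]) -> List[str]:
        """Turn raw item keys into full references."""


class InMemorySource(SourceSketch):
    """Concrete subclass backed by a list, for illustration only."""

    def __init__(self, key: str, rows: List[dict], count: Optional[int] = None):
        super().__init__(key, count)
        self._rows = rows

    @property
    def limit(self) -> int:
        return len(self._rows)

    def fetch_data(self) -> List[dict]:
        return self._rows[: self.count]

    def format_keys(self, keys: List[str]) -> List[str]:
        return [f"{self.key}/{k}" for k in keys]


rows = [{"_key": "0"}, {"_key": "1"}, {"_key": "2"}]
source = InMemorySource("demo", rows, count=2)
fetched = source.fetch_data()
```

The base class cannot be instantiated directly. Each concrete source supplies its own limit, fetch_data and format_keys, which mirrors how JobItems and CollectionItems override those members.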
class arche.readers.items.JobItems(key: str, count: Optional[int] = None, start_index: int = 0, filters: Optional[api.Filters] = None)

Bases: arche.readers.items.CloudItems

property limit(self)
property count(self)
property job(self)
fetch_data(self)
format_keys(self, keys: pd.Series)

Get the Scrapy Cloud URL of an item, e.g. 112358/13/21/0 becomes https://app.scrapinghub.com/p/112358/13/21/item/0
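The key-to-URL mapping in this example can be reproduced with a few lines of string handling. This is a sketch under assumptions: item_key_to_url is a hypothetical helper that works on a single string, whereas format_keys receives a pd.Series, and the base URL is taken from the example rather than from arche's configuration.

```python
def item_key_to_url(key: str, base_url: str = "https://app.scrapinghub.com/p") -> str:
    """Convert a 'project/spider/job/item' key into an item URL.

    Hypothetical helper; raises ValueError if the key does not have four parts.
    """
    project, spider, job, item = key.split("/")
    return f"{base_url}/{project}/{spider}/{job}/item/{item}"


url = item_key_to_url("112358/13/21/0")
print(url)  # https://app.scrapinghub.com/p/112358/13/21/item/0
```

For a whole column of keys, one option would be to apply such a function element-wise, for example with pandas Series.map.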

class arche.readers.items.CollectionItems(key: str, count: Optional[int] = None, start: Optional[str] = None, filters: Optional[api.Filters] = None)

Bases: arche.readers.items.CloudItems

property limit(self)
property count(self)
fetch_data(self)
format_keys(self, keys: pd.Series)

Get the full Scrapy Cloud URL from _key, e.g. be-006 becomes https://app.scrapinghub.com/p/collections/s/pages/be-006
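For collections, the example amounts to prefixing each _key with the collection location. A minimal sketch follows, assuming a hypothetical helper collection_key_to_url. The collection path collections/s/pages and the base URL are copied from the example; in practice they would presumably come from the key passed to CollectionItems.

```python
def collection_key_to_url(
    item_key: str,
    collection_path: str = "collections/s/pages",
    base_url: str = "https://app.scrapinghub.com/p",
) -> str:
    """Prefix a collection item's _key with the collection URL (hypothetical helper)."""
    return f"{base_url}/{collection_path}/{item_key}"


collection_url = collection_key_to_url("be-006")
print(collection_url)  # https://app.scrapinghub.com/p/collections/s/pages/be-006
```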