arche.readers.items¶
Module Contents¶
-
arche.readers.items.RawItems¶
-
class
arche.readers.items.Items(raw: RawItems, df: pd.DataFrame)¶
-
__len__(self)¶
-
static
process_df(df: pd.DataFrame)¶
-
static
categorize(df: pd.DataFrame)¶ Cast columns with repeating values to category type to save memory
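The idea behind `categorize` can be illustrated with a small standalone sketch. It is not the library's implementation: the function name `categorize_sketch` and the `max_unique_ratio` threshold are assumptions made for this example, and the rule arche actually uses to decide which columns have "repeating values" may differ.

```python
import pandas as pd


def categorize_sketch(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with low-cardinality columns cast to category.

    Hypothetical helper: the 0.5 threshold is an assumption, not arche's rule.
    """
    result = df.copy()
    for column in result.columns:
        series = result[column]
        if len(series) == 0:
            continue
        try:
            unique_ratio = series.nunique() / len(series)
        except TypeError:
            # Columns holding unhashable values (lists, dicts) cannot be categories.
            continue
        if unique_ratio < max_unique_ratio:
            result[column] = series.astype("category")
    return result


df = pd.DataFrame({"color": ["red", "blue"] * 50, "id": range(100)})
compact = categorize_sketch(df)
print(compact.dtypes)
```

Here `color` has only two distinct values across 100 rows, so it is stored as a category, while `id` is unique per row and keeps its original dtype. Comparing `df.memory_usage(deep=True)` with `compact.memory_usage(deep=True)` shows where the saving comes from.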
-
classmethod
from_df(cls, df: pd.DataFrame)¶
-
classmethod
from_array(cls, iterable: RawItems)¶
-
-
class
arche.readers.items.CloudItems(key: str, count: Optional[int] = None, filters: Optional[api.Filters] = None)¶ Bases:
arche.readers.items.Items
-
property
limit(self)¶ The maximum number of items in the source
-
property
count(self)¶ The number of items the user wants to retrieve
-
abstract
fetch_data(self)¶
-
abstract
format_keys(self, keys: pd.Series)¶
-
property
-
class
arche.readers.items.JobItems(key: str, count: Optional[int] = None, start_index: int = 0, filters: Optional[api.Filters] = None)¶ Bases:
arche.readers.items.CloudItems
-
property
limit(self)¶
-
property
count(self)¶
-
property
job(self)¶
-
fetch_data(self)¶
-
format_keys(self, keys: pd.Series)¶ Get the Scrapy Cloud URL of an item, e.g. 112358/13/21/0 becomes https://app.scrapinghub.com/p/112358/13/21/item/0
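The key-to-URL mapping described for job items can be sketched as a free function. This is an illustration under stated assumptions, not the method's actual body: `job_item_urls` and `BASE_URL` are names invented for the example, and it assumes every key has the form `project/spider/job/item_index`.

```python
import pandas as pd

# Assumed prefix, taken from the example URL in the docstring.
BASE_URL = "https://app.scrapinghub.com/p"


def job_item_url(key: str) -> str:
    """Turn 'project/spider/job/index' into the item URL (hypothetical helper)."""
    # Split once from the right so the item index is separated from the job key.
    job_key, item_index = key.rsplit("/", 1)
    return f"{BASE_URL}/{job_key}/item/{item_index}"


def job_item_urls(keys: pd.Series) -> pd.Series:
    """Apply the mapping to a whole Series of item keys."""
    return keys.map(job_item_url)


urls = job_item_urls(pd.Series(["112358/13/21/0", "112358/13/21/1"]))
print(urls.tolist())
```

The sketch keeps the per-key logic in a plain string function so it can be checked on a single key before being mapped over a Series.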
-
property
-
class
arche.readers.items.CollectionItems(key: str, count: Optional[int] = None, start: Optional[str] = None, filters: Optional[api.Filters] = None)¶ Bases:
arche.readers.items.CloudItems
-
property
limit(self)¶
-
property
count(self)¶
-
fetch_data(self)¶
-
format_keys(self, keys: pd.Series)¶ Get the full Scrapy Cloud URL from _key, e.g. be-006 becomes https://app.scrapinghub.com/p/collections/s/pages/be-006
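For collections, the docstring's example amounts to prefixing each `_key` with the collection's URL. A minimal sketch follows, assuming the prefix is exactly the one shown in the docstring. The function name `collection_item_urls` and the `collection_path` parameter are hypothetical, and the real method presumably derives the path from the `key` passed to `CollectionItems`.

```python
import pandas as pd


def collection_item_urls(
    keys: pd.Series, collection_path: str = "collections/s/pages"
) -> pd.Series:
    """Prefix each _key with the collection URL (hypothetical helper).

    The default collection_path mirrors the docstring example and is an
    assumption, not a value read from the library.
    """
    prefix = f"https://app.scrapinghub.com/p/{collection_path}/"
    # Adding a str to a Series of strings concatenates element-wise.
    return prefix + keys.astype(str)


urls = collection_item_urls(pd.Series(["be-006", "be-007"]))
print(urls.tolist())
```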
-
property