arche.readers.items
Module Contents
-
arche.readers.items.RawItems
-
class arche.readers.items.Items(raw: RawItems, df: pd.DataFrame)
-
__len__(self)
-
static process_df(df: pd.DataFrame)
-
static categorize(df: pd.DataFrame)
Cast columns with repeating values to the category type to save memory.
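The idea behind categorize can be sketched as follows. This is a minimal illustration, not the arche implementation: the function name `categorize_sketch` and the `max_unique` threshold are assumptions introduced here, and the real method may use a different rule to decide which columns count as "repeating".

```python
import pandas as pd


def categorize_sketch(df: pd.DataFrame, max_unique: int = 10) -> pd.DataFrame:
    """Return a copy of df with low-cardinality columns cast to category.

    Hypothetical helper: `max_unique` is an assumed threshold, not an
    arche parameter.
    """
    out = df.copy()
    for column in out.columns:
        try:
            n_unique = out[column].nunique(dropna=False)
        except TypeError:
            # Unhashable values (lists, dicts) cannot be counted; skip them.
            continue
        if n_unique <= max_unique:
            out[column] = out[column].astype("category")
    return out


# "color" repeats two values 500 times each; "id" is unique per row.
df = pd.DataFrame({"color": ["red", "blue"] * 500, "id": range(1000)})
compact = categorize_sketch(df)
```

Comparing `df.memory_usage(deep=True)` with `compact.memory_usage(deep=True)` shows where the saving comes from: a category column stores each distinct value once plus a small integer code per row.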
-
classmethod from_df(cls, df: pd.DataFrame)
-
classmethod from_array(cls, iterable: RawItems)
-
-
class arche.readers.items.CloudItems(key: str, count: Optional[int] = None, filters: Optional[api.Filters] = None)
Bases: arche.readers.items.Items
-
property limit(self)
The maximum number of items in the source.
-
property count(self)
The number of items the user wants to retrieve.
-
abstract fetch_data(self)
-
abstract format_keys(self, keys: pd.Series)
-
property
-
class arche.readers.items.JobItems(key: str, count: Optional[int] = None, start_index: int = 0, filters: Optional[api.Filters] = None)
Bases: arche.readers.items.CloudItems
-
property limit(self)
-
property count(self)
-
property job(self)
-
fetch_data(self)
-
format_keys(self, keys: pd.Series)
Get the Scrapy Cloud URL of an item, e.g. 112358/13/21/0 becomes https://app.scrapinghub.com/p/112358/13/21/item/0
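The key-to-URL mapping in the example above can be sketched for a single key. This is an illustration only: `job_item_url` is a hypothetical helper, not part of arche, and it assumes the item index is always the last slash-separated segment of the key.

```python
def job_item_url(key: str) -> str:
    """Turn an item key such as '112358/13/21/0' into its Scrapy Cloud URL.

    Hypothetical helper written to mirror the documented example.
    """
    # Split once from the right: the job key comes first, the item index last.
    job_key, item_index = key.rsplit("/", 1)
    return f"https://app.scrapinghub.com/p/{job_key}/item/{item_index}"


url = job_item_url("112358/13/21/0")
```

Since format_keys receives a pd.Series, a function like this could be applied to every key with `keys.map(job_item_url)`.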
-
property
-
class arche.readers.items.CollectionItems(key: str, count: Optional[int] = None, start: Optional[str] = None, filters: Optional[api.Filters] = None)
Bases: arche.readers.items.CloudItems
-
property limit(self)
-
property count(self)
-
fetch_data(self)
-
format_keys(self, keys: pd.Series)
Get the full Scrapy Cloud URL from _key, e.g. be-006 becomes https://app.scrapinghub.com/p/collections/s/pages/be-006
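A comparable sketch for collection keys is shown below. It assumes the URL is the collection key followed by the item's _key, which matches the documented example but is not confirmed by this page. Both `collection_item_url` and its `collection_key` parameter are hypothetical names.

```python
def collection_item_url(collection_key: str, item_key: str) -> str:
    """Build a Scrapy Cloud URL for one collection item.

    Hypothetical helper: assumes the URL is the collection key followed
    by the item's _key, as in the documented example.
    """
    return f"https://app.scrapinghub.com/p/{collection_key}/{item_key}"


url = collection_item_url("collections/s/pages", "be-006")
```

On a pd.Series of _key values the same formatting could be applied with `keys.map(lambda k: collection_item_url("collections/s/pages", k))`.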
-
property