:mod:`arche.readers.items`
==========================

.. py:module:: arche.readers.items


Module Contents
---------------

.. data:: RawItems


.. py:class:: Items(raw: RawItems, df: pd.DataFrame)

   .. method:: __len__(self)

   .. method:: process_df(df: pd.DataFrame)
      :staticmethod:

   .. method:: categorize(df: pd.DataFrame)
      :staticmethod:

      Cast columns with repeating values to `category` type to save memory.

   .. method:: from_df(cls, df: pd.DataFrame)
      :classmethod:

   .. method:: from_array(cls, iterable: RawItems)
      :classmethod:


.. py:class:: CloudItems(key: str, count: Optional[int] = None, filters: Optional[api.Filters] = None)

   Bases: :class:`arche.readers.items.Items`

   .. method:: limit(self)
      :property:

      The maximum number of items in the source.

   .. method:: count(self)
      :property:

      The number of items the user wants to retrieve.

   .. method:: fetch_data(self)
      :abstractmethod:

   .. method:: format_keys(self, keys: pd.Series)
      :abstractmethod:


.. py:class:: JobItems(key: str, count: Optional[int] = None, start_index: int = 0, filters: Optional[api.Filters] = None)

   Bases: :class:`arche.readers.items.CloudItems`

   .. method:: limit(self)
      :property:

   .. method:: count(self)
      :property:

   .. method:: job(self)
      :property:

   .. method:: fetch_data(self)

   .. method:: format_keys(self, keys: pd.Series)

      Get the Scrapy Cloud URL of an item.

      E.g. ``112358/13/21/0`` to
      https://app.scrapinghub.com/p/112358/13/21/item/0


.. py:class:: CollectionItems(key: str, count: Optional[int] = None, start: Optional[str] = None, filters: Optional[api.Filters] = None)

   Bases: :class:`arche.readers.items.CloudItems`

   .. method:: limit(self)
      :property:

   .. method:: count(self)
      :property:

   .. method:: fetch_data(self)

   .. method:: format_keys(self, keys: pd.Series)

      Get the full Scrapy Cloud URL from `_key`.

      E.g. ``be-006`` to
      https://app.scrapinghub.com/p/collections/s/pages/be-006
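
The key-to-URL mappings described for the two ``format_keys`` methods can be
illustrated with plain string handling. The following is only a sketch under
stated assumptions, not the library's implementation: the helper names
``job_item_url`` and ``collection_item_url`` are hypothetical, both helpers
work on a single string instead of a ``pd.Series``, and the base URL and the
``collections/s/pages`` path are taken from the docstring examples.

.. code-block:: python

   # Base URL as it appears in the docstring examples.
   SH_URL = "https://app.scrapinghub.com/p"


   def job_item_url(item_key: str) -> str:
       """Hypothetical helper: map 'project/spider/job/index' to an item URL."""
       # Split off the trailing item index; the remainder identifies the job.
       job_key, _, index = item_key.rpartition("/")
       return f"{SH_URL}/{job_key}/item/{index}"


   def collection_item_url(collection_path: str, item_key: str) -> str:
       """Hypothetical helper: append a collection item `_key` to its path."""
       # `collection_path` is an assumed argument, e.g. "collections/s/pages".
       return f"{SH_URL}/{collection_path}/{item_key}"


   print(job_item_url("112358/13/21/0"))
   print(collection_item_url("collections/s/pages", "be-006"))

In a ``pd.Series`` setting, the same functions could be applied element-wise,
for example with ``keys.apply(job_item_url)``.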