`arche.tools.api`

Module Contents

arche.tools.api.Filters

arche.tools.api.get_job(key: str) → Job

arche.tools.api.get_jobs(keys: List[str]) → List[Job]

arche.tools.api.get_collection(key)

arche.tools.api.get_errors_count(job)

arche.tools.api.get_job_state(job)

arche.tools.api.get_job_close_reason(job)

arche.tools.api.get_items_count(job)

arche.tools.api.get_counts(job: Job) → Optional[Dict[str, int]]

arche.tools.api.get_finish_time_difference_in_days(job1, job2)

arche.tools.api.get_runtime(job)

Returns the runtime in milliseconds, or None if the job is still running.
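Because `get_runtime` may return None, callers need to handle the still-running case before doing arithmetic on the result. The following is a minimal sketch of one way to do that; `format_runtime` is a hypothetical helper written for illustration and is not part of `arche.tools.api`.

```python
from datetime import timedelta
from typing import Optional


def format_runtime(runtime_ms: Optional[int]) -> str:
    """Render a millisecond runtime as H:MM:SS (hypothetical helper, not part of arche)."""
    # A missing runtime is treated as "job has not finished yet".
    if runtime_ms is None:
        return "still running"
    # timedelta handles the millisecond-to-clock conversion.
    return str(timedelta(milliseconds=runtime_ms))
```

The value passed in would typically be whatever `get_runtime(job)` returned, so the None case is covered in one place.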

arche.tools.api.get_runtime_s(job)

Returns the job runtime in seconds.

arche.tools.api.get_max_memusage(job)

arche.tools.api.get_response_status_count(job)

arche.tools.api.get_requests_count(job)

arche.tools.api.get_crawlera_user(job)

arche.tools.api.get_source(source_key)

arche.tools.api.get_items_with_pool(source_key: str, count: int, start_index: int, workers: int = 4) → np.ndarray

Concurrently reads items from the API using a Pool of workers.

Parameters

source_key – a job or collection key, e.g. '112358/13/21'
count – the number of items to retrieve
start_index – the index of the first item to read
workers – the number of worker processes used to fetch data in parallel

Returns

A numpy array of items
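To illustrate how `count`, `start_index` and `workers` might interact, the sketch below splits a requested range into per-worker chunks. It is an assumption about the general technique, not arche's actual implementation; `split_into_chunks` is a hypothetical name introduced here.

```python
from typing import List, Tuple


def split_into_chunks(
    start_index: int, count: int, workers: int = 4
) -> List[Tuple[int, int]]:
    """Split `count` items into at most `workers` (start, size) chunks.

    Hypothetical helper for illustration only; not part of arche.
    """
    if count <= 0 or workers <= 0:
        return []
    # Spread the remainder over the first `extra` chunks so sizes differ by at most 1.
    base, extra = divmod(count, workers)
    chunks = []
    offset = start_index
    for i in range(workers):
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer items than workers: stop once everything is assigned
        chunks.append((offset, size))
        offset += size
    return chunks
```

Each `(start, size)` pair could then be handed to one worker, with the per-chunk results concatenated into a single array in chunk order.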

arche.tools.api.get_items(key: str, count: int, start_index: int, start: Optional[str], filters: Optional[Filters] = None, p_bar: Union[tqdm, notebook.tqdm] = notebook.tqdm, desc: Optional[str] = None) → np.ndarray