arche — Arche 0.3.6 documentation

Package Contents

class arche.Arche(source: Union[str, pd.DataFrame, RawItems], schema: Optional[SchemaSource] = None, target: Optional[Union[str, pd.DataFrame]] = None, count: Optional[int] = None, start: Union[str, int] = None, filters: Optional[api.Filters] = None, expand: bool = None)

property source_items(self)

property target_items(self)

property schema(self)

static get_items(source: Union[str, pd.DataFrame, RawItems], count: Optional[int], start: Optional[str], filters: Optional[api.Filters])

save_result(self, rule_result)

report_all(self, short: bool = False, uniques: List[Union[str, List[str]]] = None)

Report on all included rules.

Parameters: uniques – see arche.rules.duplicates.find_by
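
As a rough illustration of what a `uniques` argument of this shape can express, the sketch below assumes that a plain string names a single field to check for duplicates, while a nested list names a combination of fields whose values must all match. The helper `find_duplicates` and the sample items are hypothetical and use only the standard library; the actual rule is the one in arche.rules.duplicates.find_by and may behave differently.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple, Union

Item = Dict[str, Any]


def find_duplicates(
    items: List[Item], uniques: List[Union[str, List[str]]]
) -> Dict[Tuple[str, ...], Dict[Tuple[Any, ...], List[int]]]:
    """Hypothetical helper: map each uniques entry to its duplicated values.

    ASSUMPTION: a string entry is a single field, a list entry is a
    combination of fields whose values must all be equal.
    """
    report = {}
    for entry in uniques:
        # Normalize both shapes to a tuple of field names.
        fields = (entry,) if isinstance(entry, str) else tuple(entry)
        seen = defaultdict(list)
        for index, item in enumerate(items):
            key = tuple(item.get(field) for field in fields)
            seen[key].append(index)
        # Keep only values shared by more than one item.
        report[fields] = {k: v for k, v in seen.items() if len(v) > 1}
    return report


# Made-up sample data, for illustration only.
items = [
    {"id": 1, "name": "bob", "city": "Oslo"},
    {"id": 2, "name": "bob", "city": "Rome"},
    {"id": 3, "name": "bob", "city": "Oslo"},
]
duplicates = find_duplicates(items, ["id", ["name", "city"]])
```

Here `id` has no duplicates, while the `name` and `city` combination is shared by the first and third items.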

run_all_rules(self)

data_quality_report(self, bucket: Optional[str] = None)

run_general_rules(self)

validate_with_json_schema(self)

Run JSON schema check and output results. It tries to find all errors, but there are no guarantees. Slower than glance().

glance(self)

Run JSON schema check and output results. In most cases it returns only the first error per item. Suitable for big jobs, as it is about 100x faster than validate_with_json_schema().
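
The difference between a complete check and a first-error check can be shown with a toy validator. This is a minimal sketch and not Arche's implementation: `EXPECTED_TYPES`, `all_errors`, and `first_error` are hypothetical names, and the "schema" is only a mapping from field name to Python type rather than a real JSON schema.

```python
from typing import Any, Dict, List, Optional

# Hypothetical, simplified stand-in for a schema: field name -> expected type.
EXPECTED_TYPES = {"name": str, "price": float, "url": str}


def all_errors(item: Dict[str, Any]) -> List[str]:
    """Check every field and collect every problem found."""
    errors = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in item:
            errors.append(f"{field} is missing")
        elif not isinstance(item[field], expected):
            errors.append(f"{field} is not {expected.__name__}")
    return errors


def first_error(item: Dict[str, Any]) -> Optional[str]:
    """Stop at the first problem: less work per item, less complete."""
    for field, expected in EXPECTED_TYPES.items():
        if field not in item:
            return f"{field} is missing"
        if not isinstance(item[field], expected):
            return f"{field} is not {expected.__name__}"
    return None


# Made-up item with three problems.
item = {"name": 42, "price": "9.99"}
full_report = all_errors(item)
quick_report = first_error(item)
```

The quick check reports only the `name` problem, while the full check also reports the wrong `price` type and the missing `url`.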

run_schema_rules(self)

run_customized_rules(self, items, tagged_fields)

check_metadata(self, job)

compare_metadata(self, source_job, target_job)

run_comparison_rules(self)

compare_with_customized_rules(self, source_items, target_items, tagged_fields)

arche.basic_json_schema(data_source: str, items_numbers: List[int] = None) → Schema

Return a JSON schema built from the provided data source and item numbers.

Parameters

data_source – a collection or job key
items_numbers – array of item numbers to create schema from
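
To give a feel for what building a basic schema from sample items involves, here is a minimal standard-library sketch. It is not how `basic_json_schema` works internally: `json_type`, `infer_basic_schema`, and the sample items are hypothetical, and the sketch assumes the items are already available as Python dicts instead of being fetched from a collection or job key.

```python
from typing import Any, Dict, List


def json_type(value: Any) -> str:
    """Map a Python value to a JSON Schema type name."""
    if value is None:
        return "null"
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    # Anything else is treated as an object in this sketch.
    return "object"


def infer_basic_schema(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: collect observed types per field."""
    observed: Dict[str, set] = {}
    for item in items:
        for field, value in item.items():
            observed.setdefault(field, set()).add(json_type(value))
    # A field is required only if every sampled item has it.
    required = sorted(f for f in observed if all(f in item for item in items))
    return {
        "type": "object",
        "properties": {f: {"type": sorted(t)} for f, t in sorted(observed.items())},
        "required": required,
    }


# Made-up sample items, for illustration only.
sample = [
    {"name": "chair", "price": 19.5, "in_stock": True},
    {"name": "table", "price": None},
]
schema = infer_basic_schema(sample)
```

A schema inferred this way reflects only the sampled items, so it is a starting point to review by hand.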
