arche

Package Contents

class arche.Arche(source: Union[str, pd.DataFrame, RawItems], schema: Optional[SchemaSource] = None, target: Optional[Union[str, pd.DataFrame]] = None, count: Optional[int] = None, start: Union[str, int] = None, filters: Optional[api.Filters] = None, expand: bool = None)
property source_items(self)
property target_items(self)
property schema(self)
static get_items(source: Union[str, pd.DataFrame, RawItems], count: Optional[int], start: Optional[str], filters: Optional[api.Filters])
save_result(self, rule_result)
report_all(self, short: bool = False, uniques: List[Union[str, List[str]]] = None)

Report on all included rules.

Parameters

uniques – see arche.rules.duplicates.find_by
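The `uniques` type, `List[Union[str, List[str]]]`, suggests that each entry is either a single field name or a list of field names acting as a composite key. The sketch below shows that idea in plain Python: it groups item indices by key value and reports the groups that contain more than one item. It is a hypothetical illustration of the concept, not the code of `arche.rules.duplicates.find_by`. The helper name `find_duplicates` and its return format are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List, Union

# Hypothetical: a unique key is one field name or a list of field names (composite key).
Unique = Union[str, List[str]]


def find_duplicates(
    items: List[Dict[str, Any]], uniques: List[Unique]
) -> Dict[str, List[List[int]]]:
    """Return, per unique key, the groups of item indices sharing the same value(s)."""
    report: Dict[str, List[List[int]]] = {}
    for unique in uniques:
        fields = [unique] if isinstance(unique, str) else list(unique)
        groups: Dict[tuple, List[int]] = defaultdict(list)
        for index, item in enumerate(items):
            # Items missing any field of the key cannot be compared, so skip them.
            if not all(field in item for field in fields):
                continue
            # repr() lets unhashable values such as lists serve as dictionary keys.
            key = tuple(repr(item[field]) for field in fields)
            groups[key].append(index)
        label = ", ".join(fields)
        report[label] = [indices for indices in groups.values() if len(indices) > 1]
    return report


items = [
    {"name": "a", "url": "x"},
    {"name": "a", "url": "y"},
    {"name": "b", "url": "x"},
    {"name": "a", "url": "x"},
]
print(find_duplicates(items, ["name", ["name", "url"]]))
```

Here `"name"` alone flags items 0, 1 and 3, while the composite key `["name", "url"]` flags only items 0 and 3.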

run_all_rules(self)
data_quality_report(self, bucket: Optional[str] = None)
run_general_rules(self)
validate_with_json_schema(self)

Run a JSON schema check and output the results. It tries to find all errors, but there is no guarantee that every error is reported. Slower than glance().

glance(self)

Run a JSON schema check and output the results. In most cases it returns only the first error per item. Suitable for large jobs, as it is about 100x faster than validate_with_json_schema().
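The difference between validate_with_json_schema() and glance() is collecting every error per item versus stopping at the first one. The toy checker below illustrates only that trade-off. It supports a made-up schema subset (`required` plus a Python type per property) rather than real JSON Schema. It is not arche's implementation and does not reproduce the quoted speed difference. All names are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical toy schema: required fields plus an expected Python type per field.
Schema = Dict[str, Any]


def check_item(item: Dict[str, Any], schema: Schema, first_only: bool) -> List[str]:
    """Collect schema errors for one item, optionally stopping at the first error."""
    errors: List[str] = []
    for field in schema.get("required", []):
        if field not in item:
            errors.append(f"'{field}' is a required property")
            if first_only:
                return errors
    for field, expected in schema.get("properties", {}).items():
        if field in item and not isinstance(item[field], expected):
            errors.append(f"'{field}' should be {expected.__name__}")
            if first_only:
                return errors
    return errors


def report(items: List[Dict[str, Any]], schema: Schema, first_only: bool) -> Dict[int, List[str]]:
    """Map item index to its errors, keeping only items that have errors."""
    result: Dict[int, List[str]] = {}
    for index, item in enumerate(items):
        errors = check_item(item, schema, first_only)
        if errors:
            result[index] = errors
    return result


schema = {"required": ["name", "price"], "properties": {"name": str, "price": float}}
items = [{"name": "a", "price": 1.0}, {"name": 3}, {"price": "cheap"}]

print(report(items, schema, first_only=False))  # every error found per item
print(report(items, schema, first_only=True))   # first error per item only
```

The early return is what makes the "first error only" mode cheaper per item. The cost is that the report can hide further problems in the same item.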

run_schema_rules(self)
run_customized_rules(self, items, tagged_fields)
check_metadata(self, job)
compare_metadata(self, source_job, target_job)
run_comparison_rules(self)
compare_with_customized_rules(self, source_items, target_items, tagged_fields)
arche.basic_json_schema(data_source: str, items_numbers: List[int] = None) → Schema

Print a JSON schema based on the provided data source (a collection or job key) and item numbers.

Parameters
  • data_source – a collection or job key

  • items_numbers – list of item numbers to create the schema from
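To show what inferring a basic schema from sample items can involve, here is a minimal stdlib sketch. It records the JSON types seen for each field and marks a field as required only if every sampled item contains it. This is an assumption about the general approach, not arche's algorithm. The sketch works on an in-memory list instead of fetching a collection or job key. The function `basic_schema` is hypothetical, and its `items_numbers` argument only mirrors the documented parameter by selecting items by index.

```python
import json
from typing import Any, Dict, List, Optional

# Exact Python type -> JSON Schema type name (type() lookup keeps bool apart from int).
JSON_TYPE_NAMES = {
    bool: "boolean",
    int: "integer",
    float: "number",
    str: "string",
    list: "array",
    dict: "object",
    type(None): "null",
}


def basic_schema(
    items: List[Dict[str, Any]], items_numbers: Optional[List[int]] = None
) -> Dict[str, Any]:
    """Infer a flat JSON schema from sample items (hypothetical sketch)."""
    sample = items if items_numbers is None else [items[i] for i in items_numbers]
    seen: Dict[str, set] = {}
    for item in sample:
        for field, value in item.items():
            seen.setdefault(field, set()).add(JSON_TYPE_NAMES[type(value)])
    properties = {}
    for field, types in seen.items():
        # One observed type gives a plain string; several give a sorted list.
        properties[field] = {"type": next(iter(types)) if len(types) == 1 else sorted(types)}
    # A field is required only if every sampled item contains it.
    required = sorted(f for f in seen if all(f in item for item in sample))
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
        "required": required,
    }


items = [{"name": "a", "price": 1.5}, {"name": "b", "price": None, "tags": ["x"]}]
print(json.dumps(basic_schema(items), indent=2))
```

Here `tags` appears in only one item, so it is typed as `"array"` but left out of `required`.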