arche
Subpackages
Submodules
Package Contents
-
class arche.Arche(source: Union[str, pd.DataFrame, RawItems], schema: Optional[SchemaSource] = None, target: Optional[Union[str, pd.DataFrame]] = None, count: Optional[int] = None, start: Union[str, int] = None, filters: Optional[api.Filters] = None, expand: bool = None)
-
property source_items(self)
-
property target_items(self)
-
property schema(self)
-
static get_items(source: Union[str, pd.DataFrame, RawItems], count: Optional[int], start: Optional[str], filters: Optional[api.Filters])
-
save_result(self, rule_result)
-
report_all(self, short: bool = False, uniques: List[Union[str, List[str]]] = None)
Report on all included rules.
- Parameters
uniques – see arche.rules.duplicates.find_by
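The uniques parameter accepts a list whose entries are either a single field name or a list of field names. A reasonable reading is that a list entry acts as a composite key. The sketch below illustrates that idea under this assumption only. find_duplicates is a hypothetical helper, not arche.rules.duplicates.find_by, and it assumes item values are hashable.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Tuple, Union

Item = Dict[str, Any]


def find_duplicates(
    items: List[Item], uniques: List[Union[str, List[str]]]
) -> Dict[Tuple[str, ...], List[List[int]]]:
    """Hypothetical helper: group indices of items sharing a unique key.

    Each entry in `uniques` is one field name or a list of field names
    treated together as a composite key (an assumption of this sketch).
    """
    result: Dict[Tuple[str, ...], List[List[int]]] = {}
    for key in uniques:
        columns = (key,) if isinstance(key, str) else tuple(key)
        groups: Dict[Tuple[Hashable, ...], List[int]] = defaultdict(list)
        for index, item in enumerate(items):
            values = tuple(item.get(col) for col in columns)
            # Items missing any part of the key are skipped, so they never collide.
            if any(v is None for v in values):
                continue
            groups[values].append(index)
        # Keep only keys shared by more than one item.
        result[columns] = [idx for idx in groups.values() if len(idx) > 1]
    return result


items = [
    {"url": "a", "name": "x", "size": "M"},
    {"url": "b", "name": "x", "size": "M"},
    {"url": "a", "name": "y", "size": "S"},
]
print(find_duplicates(items, ["url", ["name", "size"]]))
```

Here "url" is checked on its own, while ["name", "size"] flags only items that match on both fields at once.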
-
run_all_rules(self)
-
data_quality_report(self, bucket: Optional[str] = None)
-
run_general_rules(self)
-
validate_with_json_schema(self)
Run the JSON schema check and output the results. It tries to find all errors, but this is not guaranteed. Slower than glance().
-
glance(self)
Run the JSON schema check and output the results. In most cases it returns only the first error per item. Suited to big jobs, as it is about 100x faster than validate_with_json_schema().
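The trade-off between validate_with_json_schema() and glance() comes down to collecting every error per item versus stopping at the first one. The toy validator below illustrates only that difference. It supports just the "required" and "type" keywords on a flat object schema. validate_item and its first_only flag are hypothetical names, and this is not how arche implements either method.

```python
from typing import Any, Dict, List

# Hypothetical mini-validator: only "required" and "type" on a flat object schema.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
    "null": type(None),
}


def _matches(value: Any, json_type: str) -> bool:
    # bool is a subclass of int in Python, so exclude it from numeric types.
    if json_type in ("integer", "number") and isinstance(value, bool):
        return False
    return isinstance(value, JSON_TYPES[json_type])


def validate_item(
    item: Dict[str, Any], schema: Dict[str, Any], first_only: bool = False
) -> List[str]:
    """Return error messages for one item; stop early if first_only is set."""
    errors: List[str] = []
    for field in schema.get("required", []):
        if field not in item:
            errors.append(f"'{field}' is a required property")
            if first_only:
                return errors  # early exit: the cheap, glance-like mode
    for field, rules in schema.get("properties", {}).items():
        if field in item and "type" in rules and not _matches(item[field], rules["type"]):
            errors.append(f"'{field}' is not of type '{rules['type']}'")
            if first_only:
                return errors
    return errors


schema = {
    "required": ["name", "price"],
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
}
item = {"name": 42, "in_stock": "yes"}
print(validate_item(item, schema))                   # all errors
print(validate_item(item, schema, first_only=True))  # first error only
```

On large jobs, the early exit skips the remaining checks on every invalid item. The cost is that further problems in the same item stay hidden until the first one is fixed.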
-
run_schema_rules(self)
-
run_customized_rules(self, items, tagged_fields)
-
check_metadata(self, job)
-
compare_metadata(self, source_job, target_job)
-
run_comparison_rules(self)
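This listing does not say which checks run_comparison_rules performs between the source and target items. As a purely hypothetical illustration of one kind of source-versus-target comparison, the sketch below computes per-field coverage in each dataset and reports the fields whose coverage differs by at least a threshold. field_coverage, coverage_differences and the 0.1 default threshold are assumptions of this sketch, not arche APIs.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]


def field_coverage(items: List[Item]) -> Dict[str, float]:
    """Hypothetical: share of items (0..1) where each field is present and not None."""
    counts: Dict[str, int] = {}
    for item in items:
        for field, value in item.items():
            if value is not None:
                counts[field] = counts.get(field, 0) + 1
    total = len(items) or 1  # avoid division by zero on empty input
    return {field: n / total for field, n in counts.items()}


def coverage_differences(
    source: List[Item], target: List[Item], threshold: float = 0.1
) -> Dict[str, float]:
    """Fields whose coverage in source minus target is at least `threshold` in size."""
    src, tgt = field_coverage(source), field_coverage(target)
    diffs: Dict[str, float] = {}
    for field in sorted(set(src) | set(tgt)):
        delta = src.get(field, 0.0) - tgt.get(field, 0.0)
        if abs(delta) >= threshold:
            diffs[field] = round(delta, 3)
    return diffs


source = [{"name": "a", "price": 1}, {"name": "b", "price": None}]
target = [{"name": "a", "price": 1}, {"name": "b", "price": 2}]
print(coverage_differences(source, target))
```

A negative value means the field is filled less often in the source than in the target.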
-
compare_with_customized_rules(self, source_items, target_items, tagged_fields)
-
property
-
arche.basic_json_schema(data_source: str, items_numbers: List[int] = None) → Schema
Print a JSON schema based on the provided data source (a collection or job key) and item numbers.
- Parameters
data_source – a collection or job key
items_numbers – list of item numbers to build the schema from
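To show what a "basic" schema inferred from sample items can look like, here is a minimal sketch working on in-memory dicts instead of a collection or job key. infer_basic_schema and json_type are hypothetical names, not arche's implementation. The choices of marking a field as required only when every sample item has it, and listing several types when values disagree, are assumptions of the sketch.

```python
import json
from typing import Any, Dict, List, Set


def json_type(value: Any) -> str:
    """Map a Python value to a JSON Schema type name."""
    # Check bool before int: bool is a subclass of int in Python.
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def infer_basic_schema(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical: build a draft-07 object schema from sample items."""
    types: Dict[str, Set[str]] = {}
    for item in items:
        for field, value in item.items():
            types.setdefault(field, set()).add(json_type(value))
    properties = {}
    for field in sorted(types):
        found = sorted(types[field])
        # A single observed type becomes a string; several become a list.
        properties[field] = {"type": found[0] if len(found) == 1 else found}
    # Required = fields present in every sample item.
    required = sorted(f for f in types if all(f in item for item in items))
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
        "required": required,
    }


items = [{"name": "a", "price": 1.5}, {"name": "b", "price": 2, "tags": []}]
print(json.dumps(infer_basic_schema(items), indent=2))
```

A schema inferred this way reflects only the sampled items, so it is a starting point to edit by hand rather than a final specification.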