arche.data_quality_report
Module Contents
class arche.data_quality_report.DataQualityReport(items: JobItems, schema: Schema, report: Report, bucket: Optional[str] = None)
create_figures(self, items: JobItems)
plot_to_notebook(self)
plot_html_to_stream(self)
create_appendix(self, schema)
save_report_to_bucket(self, project_id, spider, bucket)
score_table(self, quality_estimation, field_accuracy)
job_summary_table(self, job)
rules_summary_table(self, df, no_of_validation_warnings, name_field, url_field, no_of_checked_duplicated_items, no_of_duplicated_items, price_field, price_was_field, no_of_checked_price_items, no_of_price_warns, **kwargs)
scraped_fields_coverage(self, df: pd.DataFrame)
coverage_by_categories(self, df, tags)
Make tables that show the number of items per category, for fields set up with a category tag.
Parameters
df – a dataframe of items
tags – a dict of tags
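
The counting that this description implies can be sketched roughly as follows. This is an illustration, not arche's implementation: the function name `count_items_per_category`, the `"category"` tag key, and the sample rows are all assumptions made for the example, and plain dicts stand in for the dataframe rows so the sketch has no dependencies.

```python
from collections import Counter
from typing import Dict, List


def count_items_per_category(
    rows: List[dict], tags: Dict[str, List[str]]
) -> Dict[str, Counter]:
    """Return one value -> count table per field tagged as a category."""
    tables = {}
    # Using "category" as the tag key is an assumption made for this sketch.
    for field in tags.get("category", []):
        # Skip rows where the field is missing or None.
        values = [row[field] for row in rows if row.get(field) is not None]
        tables[field] = Counter(values)
    return tables


# Hypothetical items and tags, invented for illustration.
rows = [
    {"name": "A", "brand": "acme", "color": "red"},
    {"name": "B", "brand": "acme", "color": "blue"},
    {"name": "C", "brand": "zeta", "color": "red"},
    {"name": "D", "brand": "acme"},
]
tags = {"category": ["brand", "color"], "name_field": ["name"]}

tables = count_items_per_category(rows, tags)
for field, counts in tables.items():
    # most_common() lists values from most to least frequent.
    print(field, counts.most_common())
```

With a pandas DataFrame, `df[field].value_counts()` gives the same per-value counts for a single column.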