arche.data_quality_report

Module Contents

class arche.data_quality_report.DataQualityReport(items: JobItems, schema: Schema, report: Report, bucket: Optional[str] = None)
create_figures(self, items: JobItems)
plot_to_notebook(self)
plot_html_to_stream(self)
create_appendix(self, schema)
save_report_to_bucket(self, project_id, spider, bucket)
score_table(self, quality_estimation, field_accuracy)
job_summary_table(self, job)
rules_summary_table(self, df, no_of_validation_warnings, name_field, url_field, no_of_checked_duplicated_items, no_of_duplicated_items, price_field, price_was_field, no_of_checked_price_items, no_of_price_warns, **kwargs)
scraped_fields_coverage(self, df: pd.DataFrame)
coverage_by_categories(self, df, tags)

Build tables showing the number of items per category. Categories are taken from the fields marked with a category tag.

Parameters
  • df – a dataframe of items
  • tags – a dict of tags
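
The counting step described above can be sketched in pandas. This is only an illustration, not the library's implementation: the helper name `category_counts`, the `"category"` key, and the assumption that `tags` maps a tag name to a list of field names are all hypothetical, since the page does not document the layout of `tags`.

```python
import pandas as pd


def category_counts(df: pd.DataFrame, tags: dict) -> dict:
    """Hypothetical helper: build one count table per category-tagged field.

    Assumes `tags` looks like {"category": ["brand", ...]}; the real
    structure expected by coverage_by_categories may differ.
    """
    fields = tags.get("category", [])
    if isinstance(fields, str):
        # Accept a single field name as well as a list of names.
        fields = [fields]

    tables = {}
    for field in fields:
        if field in df.columns:
            # Number of items per distinct value, missing values included.
            tables[field] = df[field].value_counts(dropna=False)
    return tables


# Made-up sample items, for illustration only.
items = pd.DataFrame(
    {"name": ["a", "b", "c", "d"], "brand": ["x", "y", "x", "x"]}
)
tables = category_counts(items, {"category": ["brand"]})
```

Each entry in `tables` is a pandas Series indexed by category value, which could then be rendered as a table in a report.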