arche.rules.duplicates

Module Contents

arche.rules.duplicates.find_by(df: pd.DataFrame, uniques: List[Union[str, List[str]]]) → Result

Find rows in df that are equal on the given uniques. If two items share the same value for any element of uniques, they are considered duplicates.

Parameters
  • uniques (List) – list containing columns and lists of columns that identify duplicates. A list of columns means that the values of all columns in that list must be equal.

Returns

A Result containing any duplicates found.
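
The matching rule described above can be illustrated with plain pandas. This is a minimal sketch, not arche's implementation: the helper name `duplicated_rows_by` and the sample data are hypothetical, and it returns a dict of DataFrames rather than an arche Result.

```python
from typing import Dict, List, Union

import pandas as pd


def duplicated_rows_by(
    df: pd.DataFrame, uniques: List[Union[str, List[str]]]
) -> Dict[str, pd.DataFrame]:
    """Hypothetical helper: collect rows that repeat on each entry of `uniques`."""
    found = {}
    for entry in uniques:
        # A bare string is one column; a list means all of its columns must match.
        columns = [entry] if isinstance(entry, str) else list(entry)
        # keep=False marks every member of a duplicate group, not only the repeats.
        mask = df.duplicated(subset=columns, keep=False)
        if mask.any():
            found[", ".join(columns)] = df[mask]
    return found


# Made-up sample items for illustration only.
items = pd.DataFrame(
    {
        "id": ["a1", "a2", "a1", "a3"],
        "name": ["Lamp", "Lamp", "Desk", "Lamp"],
        "url": ["/lamp", "/lamp", "/desk", "/lamp-2"],
    }
)

duplicates = duplicated_rows_by(items, ["id", ["name", "url"]])
```

Here rows 0 and 2 repeat on `id`, while rows 0 and 1 repeat on the `name` and `url` pair. Row 3 shares only `name` with them, so it is not reported for the pair.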

arche.rules.duplicates.find_by_tags(df: pd.DataFrame, tagged_fields: TaggedFields) → Result

Check for duplicates based on schema tags. In particular, look for items with the same name_field and product_url_field, and check uniqueness among fields tagged unique.
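
One way to read this description is that the tags are translated into a uniques-style list before the duplicate search runs. The sketch below shows that reading only. It assumes that TaggedFields behaves like a mapping from tag name to a list of field names and that the tag keys are `unique`, `name_field` and `product_url_field`; the helper `uniques_from_tags` is hypothetical and not part of arche.

```python
from typing import Dict, List, Union

# Assumption: tagged fields behave like a mapping of tag name -> field names.
TaggedFieldsSketch = Dict[str, List[str]]


def uniques_from_tags(
    tagged_fields: TaggedFieldsSketch,
) -> List[Union[str, List[str]]]:
    """Hypothetical helper: build duplicate keys from schema tags."""
    # Every field tagged as unique is checked on its own.
    uniques: List[Union[str, List[str]]] = list(tagged_fields.get("unique", []))
    name_fields = tagged_fields.get("name_field", [])
    url_fields = tagged_fields.get("product_url_field", [])
    # Name and URL are checked together: both must match for a duplicate.
    if name_fields and url_fields:
        uniques.append([name_fields[0], url_fields[0]])
    return uniques


example_tags = {
    "unique": ["id"],
    "name_field": ["name"],
    "product_url_field": ["url"],
}
keys = uniques_from_tags(example_tags)
```

With these example tags, `keys` holds the single column `id` followed by the `name` and `url` pair, which matches the shape that find_by accepts for uniques.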

arche.rules.duplicates.flatten(l: Any) → Generator[str, None, None]
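
The documentation gives no description for flatten, but its signature suggests that it yields strings from a nested structure such as a uniques list. The following is a sketch under that assumption; `flatten_sketch` is a hypothetical stand-in, not the library's code.

```python
from typing import Any, Generator


def flatten_sketch(l: Any) -> Generator[str, None, None]:
    """Hypothetical stand-in: yield strings from arbitrarily nested lists."""
    for element in l:
        if isinstance(element, list):
            # Recurse into nested lists of column names.
            yield from flatten_sketch(element)
        else:
            yield element


columns = list(flatten_sketch(["id", ["name", "url"]]))
```

Under this reading, a mixed list of columns and column lists collapses into one flat sequence of column names.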