In Short

[1]:
import arche
from arche import *
[2]:
a = Arche("381798/1/4", schema="https://raw.githubusercontent.com/scrapinghub/arche/master/docs/source/nbs/data/books.json", target="381798/1/3")
[3]:
a.source_items.df.head()


[3]:
                                                 title                                              price   category     description
https://app.scrapinghub.com/p/381798/1/4/item/0  The Black Maria                                    £52.15  Poetry       Praise for Aracelis Girmay:"[Girmay's] every l...
https://app.scrapinghub.com/p/381798/1/4/item/1  The Boys in the Boat: Nine Americans and Their...  £22.60  Default      For readers of Laura Hillenbrand's Seabiscuit ...
https://app.scrapinghub.com/p/381798/1/4/item/2  The Coming Woman: A Novel Based on the Life of...  £17.93  Default      "If you have a heart, if you have a soul, Kare...
https://app.scrapinghub.com/p/381798/1/4/item/3  The Dirty Little Secrets of Getting Your Dream...  £33.34  Business     Drawing on his extensive experience evaluating...
https://app.scrapinghub.com/p/381798/1/4/item/4  The Requiem Red                                    £22.65  Young Adult  Patient Twenty-nine.A monster roams the halls ...
[4]:
a.report_all()

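report_all() does not run every rule (see Rules for the rest), so a common next step is to call one of the remaining rules directly, for example to find items duplicated by title and price: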

[5]:
arche.rules.duplicates.find_by(a.source_items.df, ["title", "price"]).show()
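
As a rough illustration of what a duplicates-by-fields check does conceptually, the sketch below groups items by the values of the chosen fields and keeps only the groups with more than one item. It uses only the Python standard library; the function name find_duplicates_by and the sample items are hypothetical, and this is not Arche's implementation or the data from the job above.

```python
from collections import defaultdict

# Hypothetical stand-in items keyed by a made-up item id (not real job data).
items = {
    "item/0": {"title": "Book A", "price": "£10.00"},
    "item/1": {"title": "Book B", "price": "£12.50"},
    "item/2": {"title": "Book A", "price": "£10.00"},
    "item/3": {"title": "Book C", "price": "£12.50"},
}


def find_duplicates_by(items, fields):
    """Group item ids by their values in `fields`; keep groups of 2 or more."""
    groups = defaultdict(list)
    for item_id, item in items.items():
        # The tuple of field values acts as the duplicate key.
        key = tuple(item[field] for field in fields)
        groups[key].append(item_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Items must match on BOTH title and price to count as duplicates.
duplicates = find_duplicates_by(items, ["title", "price"])
for key, ids in duplicates.items():
    print(key, ids)
```

Checking several fields together is stricter than checking one: in the sample data, matching on price alone would also pair item/1 with item/3, even though their titles differ.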