In Short
[1]:
import arche
from arche import *
[2]:
a = Arche("381798/1/4", schema="https://raw.githubusercontent.com/scrapinghub/arche/master/docs/source/nbs/data/books.json", target="381798/1/3")
[3]:
a.source_items.df.head()
[3]:
 | title | price | category | description |
---|---|---|---|---|
https://app.scrapinghub.com/p/381798/1/4/item/0 | The Black Maria | £52.15 | Poetry | Praise for Aracelis Girmay:"[Girmay's] every l... |
https://app.scrapinghub.com/p/381798/1/4/item/1 | The Boys in the Boat: Nine Americans and Their... | £22.60 | Default | For readers of Laura Hillenbrand's Seabiscuit ... |
https://app.scrapinghub.com/p/381798/1/4/item/2 | The Coming Woman: A Novel Based on the Life of... | £17.93 | Default | "If you have a heart, if you have a soul, Kare... |
https://app.scrapinghub.com/p/381798/1/4/item/3 | The Dirty Little Secrets of Getting Your Dream... | £33.34 | Business | Drawing on his extensive experience evaluating... |
https://app.scrapinghub.com/p/381798/1/4/item/4 | The Requiem Red | £22.65 | Young Adult | Patient Twenty-nine.A monster roams the halls ... |
[4]:
a.report_all()
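`report_all()` does not run every rule (the rest are described in the rules documentation), so a common next step is to call one of them directly, for example to look for duplicates by title and price: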
[5]:
arche.rules.duplicates.find_by(a.source_items.df, ["title", "price"]).show()
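To make the idea behind a "duplicates by columns" check concrete, here is a minimal pure-Python sketch. It is not arche's implementation: the helper `find_duplicates_by` and the sample `books` data are hypothetical and only illustrate grouping items that share the same values in the chosen fields.

```python
from collections import defaultdict


def find_duplicates_by(rows, keys):
    """Group row indexes that share the same values for every field in `keys`.

    Hypothetical helper for illustration only; not part of arche.
    """
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        # Rows missing any of the keys cannot be compared, so skip them.
        if any(row.get(key) is None for key in keys):
            continue
        groups[tuple(row[key] for key in keys)].append(index)
    # Keep only the value combinations that occur more than once.
    return {combo: idxs for combo, idxs in groups.items() if len(idxs) > 1}


# Hypothetical sample data, not taken from the job above.
books = [
    {"title": "Book A", "price": "10.00"},
    {"title": "Book B", "price": "12.50"},
    {"title": "Book A", "price": "10.00"},
    {"title": "Book A", "price": "11.00"},
]

duplicates = find_duplicates_by(books, ["title", "price"])
print(duplicates)  # {('Book A', '10.00'): [0, 2]}
```

Checking on several fields together is stricter than checking on one: in the sample above, three items share a title, but only two of them also share a price.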