Automatically import prices from external grocery store APIs / Websites #512

Open
niwla23 opened this issue Oct 10, 2024 · 4 comments
Labels
✨ enhancement New feature or request

niwla23 commented Oct 10, 2024

I am new to OpenFoodFacts but had this idea:

Some German supermarkets have online shops where the prices are listed, for example Netto or REWE. I know that REWE even has an (internal but public) API to search by EAN.

This could be used to easily aggregate prices into the database. I am not sure how best to implement this; the following things would have to be considered:

  • Copyright (I guess that this would be okay under Fair Use or something)
  • When the data is fetched (probably a background job, maybe an external application)
  • How to not spam the APIs (maybe only download on demand and cache for a few days?)
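The last point in the list above (download on demand, cache for a few days, do not spam the retailer) could be sketched roughly as follows. This is only an illustration under assumptions: `CachedPriceLookup` and the injected `fetch_price` callable are hypothetical names, no real retailer API is called, and the one-second delay and three-day cache lifetime are placeholder values.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CachedPriceLookup:
    """On-demand price lookup with a TTL cache and a minimum delay between upstream calls."""

    def __init__(
        self,
        fetch_price: Callable[[str], Optional[float]],  # hypothetical lookup by EAN
        ttl_seconds: float = 3 * 24 * 3600,  # assumption: "a few days" = 3 days
        min_interval_seconds: float = 1.0,  # assumption: at most one request per second
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch_price = fetch_price
        self._ttl = ttl_seconds
        self._min_interval = min_interval_seconds
        self._clock = clock
        self._sleep = sleep
        # EAN -> (time the price was fetched, price or None if not found)
        self._cache: Dict[str, Tuple[float, Optional[float]]] = {}
        self._last_call: Optional[float] = None

    def get(self, ean: str) -> Optional[float]:
        now = self._clock()
        cached = self._cache.get(ean)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: no upstream request needed

        # Throttle: wait until min_interval has passed since the last upstream call.
        if self._last_call is not None:
            wait = self._min_interval - (now - self._last_call)
            if wait > 0:
                self._sleep(wait)

        price = self._fetch_price(ean)
        self._last_call = self._clock()
        self._cache[ean] = (self._last_call, price)
        return price
```

The clock and sleep functions are injected so the behaviour can be checked without real waiting. A background job would use the same class and simply call `get` for each EAN in turn.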

I would be happy to work on this feature if it is wanted and doable. Please let me know what you think (or if it is already being done and I did not find it).

This would be possible for these markets:

These only provide prices for special offers:

  • Kaufland
  • Lidl

niwla23 (Author) commented Oct 10, 2024

I think the easiest way to do this would be an external service that slowly loops through all products (or the top 80% of products, if such data is available). I have no idea how many products exist, but I guess it's a lot.

It could then send the prices to the main API server with some form of authentication so the prices do not need to be manually verified.
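A rough sketch of such an external service, assuming a bearer token as the "form of authentication". Everything here is hypothetical: `API_URL` is a placeholder and not a real Open Prices endpoint, the payload fields are made up for illustration, and `fetch_price` stands in for whatever retailer lookup would be used. The code only builds the requests; nothing is sent.

```python
import json
import urllib.request
from typing import Callable, Iterable, Iterator, Optional

API_URL = "https://example.org/api/prices"  # placeholder, NOT a real endpoint


def build_price_request(
    ean: str, price: float, currency: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request for one price."""
    # Assumption: the payload shape below is invented for this sketch.
    payload = json.dumps({"ean": ean, "price": price, "currency": currency})
    return urllib.request.Request(
        API_URL,
        data=payload.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def iter_price_requests(
    eans: Iterable[str],
    fetch_price: Callable[[str], Optional[float]],  # hypothetical retailer lookup
    token: str,
    currency: str = "EUR",
) -> Iterator[urllib.request.Request]:
    """Loop over products and yield one request per product that has a price."""
    for ean in eans:
        price = fetch_price(ean)
        if price is None:
            continue  # product not sold by this retailer, nothing to submit
        yield build_price_request(ean, price, currency, token)
```

Each yielded request could then be passed to `urllib.request.urlopen`, ideally with a pause between calls so the loop stays slow.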

raphodn (Member) commented Oct 11, 2024

At the moment we don't allow price scraping.

The 4 allowed price sources are:

  • price tag images in shops
  • receipt images
  • GDPR requests
  • shop imports

Why?
https://prices.openfoodfacts.org/about

For legal and technical reasons, we don't consider scraping prices from retailers' websites as a valid way to contribute to Open Prices. We want to make sure that the prices we collect are accurate and up-to-date, and receiving scraped prices from contributors doesn't allow us to do that.
Price scraping is a considered option in a future version of Open Prices, but it would be done by Open Prices itself so that we can have a proof of the price based on the HTML page.

See also this discussion : #311

In terms of license, the Open Prices database is ODbL.
Fair Use is a US-only notion I believe. Most likely does not apply to France/EU.

We definitely need an expert eye on this in any case...

raphodn (Member) commented Oct 21, 2024

We recently added the "online shop" possibility in the backend : #317
If you loop on each product, take a screenshot, upload the proof, and then add the price, it could be doable ^^
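The loop described above (screenshot, upload the proof, then add the price) could be sketched like this. It is only an outline under assumptions: `OnlineOffer`, `take_screenshot`, `upload_proof` and `add_price` are hypothetical names passed in by the caller, and no real browser or Open Prices API call is made here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class OnlineOffer:
    """One product as listed in an online shop (hypothetical structure)."""
    ean: str
    price: float
    url: str


def import_with_proof(
    offers: Iterable[OnlineOffer],
    take_screenshot: Callable[[str], bytes],  # e.g. a headless browser wrapper
    upload_proof: Callable[[bytes], str],  # returns an identifier for the proof
    add_price: Callable[[str, float, str], None],  # (ean, price, proof_id)
) -> List[str]:
    """For each offer: screenshot the page, upload it, then add the price."""
    proof_ids: List[str] = []
    for offer in offers:
        image = take_screenshot(offer.url)
        proof_id = upload_proof(image)
        # The price is only added once the proof upload has returned an id,
        # so every submitted price refers to an existing proof.
        add_price(offer.ean, offer.price, proof_id)
        proof_ids.append(proof_id)
    return proof_ids
```

Passing the three steps in as callables keeps the loop independent of how screenshots are taken or which endpoints are used.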

niwla23 (Author) commented Nov 4, 2024

> We recently added the "online shop" possibility in the backend : #317
> If you loop on each product, take a screenshot, upload the proof, and then add the price, it could be doable ^^

That would be an option, but for stores with a nice API available it would be inefficient. Taking screenshots would require running a headless browser, which would increase the processing power needed to get the prices by 10x-100x. Screenshots are also not really proof of anything: anyone with the skills to parse this data knows how to manipulate it before taking the screenshot. I see the point of needing a proof, but I am not sure this is the right way.

Projects
Status: Backlog