Quality facet - Nutrition - serving_size="serving" for data-quality #5163

Closed
TaciteOFF opened this issue Apr 19, 2021 · 11 comments
Labels
  • 🧽 Data quality - Measure - Quality facets: one of the facets available in Open Food Facts is /quality and allows us to spot products with bad data
  • 🧽 Data quality: https://wiki.openfoodfacts.org/Quality
  • ✨ Feature: features or enhancements to Open Food Facts server
  • good first issue: welcome to Open Food Facts. This issue should be approachable if you're new. Get in touch for help.
  • osd'22
  • portions
  • ⚖️ Quantity
  • ⏰ Stale: this issue hasn't seen activity in a while. You can try documenting more to unblock it.

Comments

@TaciteOFF
Contributor

TaciteOFF commented Apr 19, 2021

What

Hello,

It seems that a few users (or applications) sometimes mistakenly enter "serving" as the serving size.
A facet to detect and correct this could be useful.

Thanks



@TaciteOFF TaciteOFF added ✨ Feature Features or enhancements to Open Food Facts server 🧽 Data quality - Measure - Quality facets One of the facets available in Open Food Facts is /quality & allows us to spot products w/ bad data labels Apr 19, 2021
@teolemon
Member

teolemon commented Apr 23, 2021

@github-actions
Contributor

This issue is stale because it has been open 90 days with no activity.

@github-actions github-actions bot added the ⏰ Stale This issue hasn't seen activity in a while. You can try documenting more to unblock it. label Jul 23, 2021
@teolemon teolemon changed the title Nouvelle facette serving_size="serving" pour data-quality Quality facet - serving_size="serving" for data-quality Oct 11, 2021
@teolemon teolemon changed the title Quality facet - serving_size="serving" for data-quality Quality facet - Nutrition - serving_size="serving" for data-quality Oct 11, 2021
@stephanegigandet stephanegigandet added the good first issue Welcome to Open Food Facts. This issue should be approachable if you're new. Get in touch for help. label Mar 8, 2022
@stephanegigandet
Contributor

The corresponding code is in DataQualityFood.pm.

@CharlesNepote CharlesNepote added the 🧽 Data quality https://wiki.openfoodfacts.org/Quality label Sep 27, 2022
@CharlesNepote
Member

CharlesNepote commented Jan 20, 2023

This would increase the data quality error rate from 5.78% to ~6.16%, for an issue that has no impact on the Nutri-Score, Nova, etc. in more than 10% of cases at least.

So I would be in favour to:

  • 1. Delete all "serving" values in the serving_size field, because:
    • users who often see this value may think it is a valid one
    • it's easier to fill in an empty field than to correct a filled-in one
    • an empty field is no worse than a wrong one
  • 2. Create the data quality error facet to manage the new ones

Maybe 2 can be done before 1.

@benbenben2
Collaborator

benbenben2 commented Jan 24, 2023

Create the data quality error facet to manage the new ones
@CharlesNepote Do you want only the case where serving_size="serving"?
Or do you want to generalize to all strings? (Just thinking out loud: we always expect a number in this field, like 20g or 20ml, right?) If so, we could raise an error when there is no number in this field. What do you think?

I just checked with Mirabelle: this would also catch values such as "-" (6), "une tranche" (1), "une noisette" (1), "une pression par narine" (1), "une cuillere à café" (1), "une biscotte g" (1), "tbsp" (1), "servingg" (1), etc.
But the counts of those values (in parentheses) are negligible compared to "serving" (11,199).

@CharlesNepote
Member

@benbenben2 Yes, why not. There are 227 different values for this field. Most of them appear only once, but it would be nice to detect them.

The hyphen (-) should not be flagged. I think it would be good practice to always use it when we want to specify that there is no value (null).

@stephanegigandet can you confirm there is no reason not to have at least one number in this field?
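The rule being discussed (flag a serving_size that contains no digit at all, but never the "-" null marker) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the actual check belongs in DataQualityFood.pm (Perl), and the function name `serving_size_missing_digits` is hypothetical.

```python
import re
from typing import Optional

# Hypothetical illustration of the proposed rule, not the Open Food Facts code.
# "-" is the explicit "no value" marker suggested in this thread.
NULL_MARKER = "-"


def serving_size_missing_digits(serving_size: Optional[str]) -> bool:
    """Return True if serving_size is filled in but contains no digit.

    Missing/empty values and the "-" null marker are not flagged.
    """
    if serving_size is None:
        return False
    value = serving_size.strip()
    if value in ("", NULL_MARKER):
        return False
    # No digit anywhere in the value: e.g. "serving", "tbsp", "une tranche".
    return re.search(r"\d", value) is None
```

With this rule, "serving", "servingg", "tbsp" and "une tranche" would all be flagged, while "30 g" and "-" would not.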

@stephanegigandet
Contributor

It's a free form field, so we can have anything in it. We could have a warning when there isn't a number, but I wouldn't make it an error.

What would be interesting is a quality warning when we don't have a value for serving size, or we have a value that we can't convert to g/ml, AND we have nutrition facts indicated per serving. In that case, even though we have nutrition facts, we can't compute the Nutri-Score, etc.
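The warning @stephanegigandet describes combines two conditions: nutrition facts given per serving, and a serving size that is missing or not convertible to g/ml. A rough Python sketch follows. It assumes the per-serving basis is recorded in a field named `nutrition_data_per` with the value "serving", and its unit parser is a deliberately simplified, hypothetical stand-in for real serving-size parsing.

```python
import re
from typing import Optional

# Hypothetical, simplified parser: takes the first "<number> <unit>" where the
# unit is a mass or volume unit, and converts it to g or ml.
_QUANTITY_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(kg|g|ml|cl|l)\b", re.IGNORECASE)
_FACTORS = {"g": 1.0, "kg": 1000.0, "ml": 1.0, "cl": 10.0, "l": 1000.0}


def serving_size_to_g_or_ml(serving_size: Optional[str]) -> Optional[float]:
    """Return the serving size in g or ml, or None if it can't be converted."""
    if not serving_size:
        return None
    match = _QUANTITY_RE.search(serving_size)
    if match is None:
        return None
    number = float(match.group(1).replace(",", "."))  # accept "1,5" as 1.5
    return number * _FACTORS[match.group(2).lower()]


def needs_serving_size_warning(serving_size: Optional[str],
                               nutrition_data_per: Optional[str]) -> bool:
    """Warn when nutrition facts are per serving (assumed field/value) but the
    serving size is missing or unusable, so per-100 g values can't be derived."""
    return (nutrition_data_per == "serving"
            and serving_size_to_g_or_ml(serving_size) is None)
```

For example, a product with nutrition facts per serving and serving_size "1 portion (30 g)" would not trigger the warning, whereas serving_size "serving" would.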

@CharlesNepote
Member

@stephanegigandet

It's a free form field, so we can have anything in it. We could have a warning when there isn't a number, but I wouldn't make it an error.

Do you have examples where a value without a number would be relevant?
Looking at what users have already entered, the list I mentioned earlier contains:

  • "Non indiquée" (7), "Non précisé" (6), "Non précisée" (1), "Non spécifié" (1), "non communiqué" (1), "N.A" (1), "Not mentioned" (1), "No indica" (1), "nicht angegeben" (1), etc. => IMHO they should be converted to "-"
  • "tbsp" (1) or "cuillère à soupe" (1) (but we're not sure it means "one" tbsp, so I would say it's an error)
  • to me, all the other values appear to be either errors or values that can't be converted into a size in grams or ml.

So I would tend to conclude that:

  • some values should be converted to "-" (manually or automatically)
  • the other values are not relevant and, as such, should be "data quality errors"

So I would count all these values as "data quality errors" and, until we do it automatically, let people manually change "non communiqué" to "-".
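An automatic version of that conversion could look like the following hypothetical Python sketch. It covers only the "not indicated" variants listed above, matched case-insensitively; the set and function names are illustrative, not existing Open Food Facts code.

```python
from typing import Optional

# Hypothetical list built from the "not indicated" variants quoted in this thread.
NOT_INDICATED = {
    "non indiquée", "non précisé", "non précisée", "non spécifié",
    "non communiqué", "n.a", "not mentioned", "no indica", "nicht angegeben",
}


def normalize_serving_size(value: Optional[str]) -> Optional[str]:
    """Replace a known "not indicated" variant by the "-" null marker.

    Any other value (including real serving sizes) is returned unchanged.
    """
    if value is not None and value.strip().casefold() in NOT_INDICATED:
        return "-"
    return value
```

Keeping an explicit list, rather than guessing from wording, avoids rewriting values that merely look similar.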

What would be interesting is a quality warning when we don't have a value for serving size, or we have a value that we can't convert to g/ml, AND we have nutrition facts indicated per serving. In that case, even though we have nutrition facts, we can't compute the Nutri-Score, etc.

We could at least do this. But honestly, quality warnings are currently read by almost no one, and there is no easy way to mark a warning as "checked". So, for all the reasons given in this comment, I think it should be a data quality error.

@stephanegigandet
Contributor

I think it should be a data quality error.

OK.

CharlesNepote added a commit that referenced this issue Feb 10, 2023
teolemon added a commit that referenced this issue Feb 11, 2023
* Add en:Serving size is missing digits

See #5163 and #8057.

* ci: autolabel changes to the taxo

---------

Co-authored-by: Pierre Slamich
@CharlesNepote CharlesNepote moved this from To do to In progress in 🧽 Ensuring Data Quality Mar 3, 2023
@benbenben2
Collaborator

@CharlesNepote, is this solved by PR #8091?

@alexgarel
Member

I think so, let's close it :-)

Projects
Archived in project
Development

No branches or pull requests

6 participants