You are what you eat? Feeding foundation models a regionally diverse food dataset of World Wide Dishes
Jabez Magomere, Shu Ishida, Tejumade Afonja, Aya Salama, Daniel Kochin, Foutse Yuehgoh, Imane Hamzaoui, Raesetje Sefala, Aisha Aalagib, Elizaveta Semenova, Lauren Crais, Siobhan Mackenzie Hall
Official Website (used for data collection): https://worldwidedishes.com/
We present the World Wide Dishes dataset, which seeks to assess these disparities through a decentralised data collection effort. We gathered perspectives directly from people with a wide variety of backgrounds around the globe, with the aim of creating a dataset of their insights into foods relevant to their own cultural, regional, national, or ethnic lives.
The metadata of the World Wide Dishes dataset is available in the Croissant format:
The website includes our Data Protection Policy and FAQs developed to support contributors during the data collection process.
Please refer to the README.md in the webapp directory for instructions on how to run your own instance of the website.
In addition to the World Wide Dishes dataset, we present 30 dishes for five selected African countries and 30 dishes for the US as a baseline. An additional test suite was curated for regional parity.
- Dishes selected for the five African countries + the US
- US Test set CSV (the same set of dishes as the previous sheet, but with an additional regional label)
- Dishes selected for the five African countries + the US / US Test set (Excel Sheet)
conda create -n wwd python=3.10
conda activate wwd
pip install -r requirements.txt
Create a .env file in the root directory of the repository with the following settings:
WWD_CSV_PATH=./data/WorldWideDishes_2024_June_World_Wide_Dishes.csv
WWD_30_DISHES_CSV_PATH=./data/WorldWideDishes_2024_June_Selected_Countries.csv
These point to the World Wide Dishes dataset and the 30 dishes selected for the African countries and the US.
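As a sketch of how these settings might be consumed, the snippet below parses .env-style lines into the process environment using only the standard library (the python-dotenv package's load_dotenv() is the usual convenience for this). The file contents mirror the example above.

```python
import os

# Example .env contents, copied from this README.
env_text = """\
WWD_CSV_PATH=./data/WorldWideDishes_2024_June_World_Wide_Dishes.csv
WWD_30_DISHES_CSV_PATH=./data/WorldWideDishes_2024_June_Selected_Countries.csv
"""

# Minimal stdlib-only parser: skip blanks and comments, split on the
# first '=', and don't overwrite variables already set in the shell.
for line in env_text.splitlines():
    line = line.strip()
    if line and not line.startswith("#") and "=" in line:
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

print(os.environ["WWD_CSV_PATH"])
```

In the actual experiments you would read the file from disk rather than a string; `os.environ.setdefault` keeps any values already exported in your shell.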
If you want to conduct experiments that use OpenAI products such as GPT-3.5 (required for the LLM experiments) or DALL-E 2 and DALL-E 3 (required for the dish image generation), please obtain an OpenAI API key from here and set it as the environment variable OPENAI_API_KEY by adding it to the .env file. (Make sure you don't commit this file to Git!)
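A small guard like the following (require_openai_key is a hypothetical helper, not part of this repository) can catch a missing key before any experiment starts; the commented-out lines sketch what a GPT-3.5 call looks like with the openai Python client once the key is in place.

```python
import os

def require_openai_key() -> str:
    """Return the OpenAI API key from the environment, or fail loudly.

    Hypothetical helper: call this before the LLM / image-generation
    experiments so a missing key is caught early, not mid-run.
    """
    key = os.getenv("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set - add it to your .env file "
            "(and keep that file out of Git)."
        )
    return key

# With the key set, a GPT-3.5 request would look roughly like:
#   from openai import OpenAI
#   client = OpenAI()  # reads OPENAI_API_KEY from the environment
#   resp = client.chat.completions.create(
#       model="gpt-3.5-turbo",
#       messages=[{"role": "user", "content": "Name a Ghanaian dish."}],
#   )
```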
While the Llama 3 (8B) and Llama 3 (70B) models can be run locally after obtaining a licence through Hugging Face via the links provided, running these models locally is computationally expensive and time-consuming.
Groq offers a fast and reliable API service for open-source LLMs, including the Llama 3 models. As of June 2024, the Groq API is free to use.
Please obtain a Groq API key from here and set it as the environment variable GROQ_API_KEY by adding it to the .env file.
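Putting the pieces together, a complete .env file would look like the fragment below. The two key values are placeholders for illustration, not real credentials.

```shell
# Dataset paths (as given earlier in this README)
WWD_CSV_PATH=./data/WorldWideDishes_2024_June_World_Wide_Dishes.csv
WWD_30_DISHES_CSV_PATH=./data/WorldWideDishes_2024_June_Selected_Countries.csv

# API keys (placeholder values - substitute your own)
OPENAI_API_KEY=sk-...   # GPT-3.5 and DALL-E 2/3 experiments
GROQ_API_KEY=gsk_...    # Llama 3 experiments via Groq
```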
Please refer to the README.md in the llm_probing directory for instructions on how to run the experiments.
Please refer to the README.md in the gen_images directory for instructions on how to run the experiments.
Please refer to the README.md in the clip_probing directory for instructions on how to run the experiments.
Please refer to the README.md in the vqa directory for instructions on how to run the experiments.
Due to the high degree of inaccurate and culturally insensitive imagery, we will not be releasing the generated images, for safety reasons. Our terms of use also prohibit the generation of images for training models using the World Wide Dishes dataset.
For transparency and insight into the review conducted, we are releasing the text responses only:
In the World Wide Dishes dataset, the column uploaded_image_name contains paths to contributed dish images that are CC-licenced.
This is a subset of the images that were contributed to the data collection website.
We only include those images that we were personally able to verify as being owned by the contributor.
We have uploaded these images to a Google Drive folder for public access.
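As an illustrative sketch of working with this column, the snippet below keeps only the rows that include an uploaded image. Only the uploaded_image_name column name comes from this README; the sample rows and the other column names are made up for the example.

```python
import csv
import io

# Made-up sample rows in the rough shape of the dataset CSV; the real
# file is the one pointed to by WWD_CSV_PATH in your .env.
sample_csv = """\
dish_name,country,uploaded_image_name
Jollof rice,Nigeria,images/jollof_rice.jpg
Chapati,Kenya,
Injera,Ethiopia,images/injera.jpg
"""

# Keep only contributions with a non-empty image path.
with_images = [
    row for row in csv.DictReader(io.StringIO(sample_csv))
    if row["uploaded_image_name"].strip()
]
print([row["dish_name"] for row in with_images])
```

On the real dataset you would open the CSV file directly (`csv.DictReader(open(path, newline=""))`) or load it with pandas; the filtering logic is the same.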