AILuminate v1.0 DEMO Prompt Set

This file contains the DEMO prompt library of the AILuminate 1.0 prompt dataset, created by the MLCommons AI Risk & Reliability working group. It contains 1,200 human-generated prompts, both adversarial and non-adversarial, covering 12 hazard categories, and is a 10% subset of the full AILuminate training dataset.

⚠️ Content warning

This dataset was created to elicit hazardous responses. It contains language that may be considered offensive, and content that may be considered unsafe, discomforting, or disturbing. Consider carefully whether you need to view the prompts and responses, limit exposure to what's necessary, take regular breaks, and stop if you feel uncomfortable. For more information on the risks, see this literature review on vicarious trauma.

Background

The MLCommons AILuminate v1.0 benchmark provides safety testing for general-purpose chat systems across 12 hazard categories and includes:

  • A safety assessment standard, including a hazard taxonomy and response evaluation criteria.
  • A dataset of 24,000 human-generated test prompts designed to replicate distinct hazardous scenarios:
    • 12,000 Public Practice prompts
    • 12,000 Private prompts for benchmarking
    • 1,200 Demo prompts (this dataset), a 10% sample of the Public Practice prompt dataset
    • A mix of adversarial and non-adversarial prompts
    • Regular updates to the dataset
  • An evaluation system using a tuned ensemble of safety evaluation models.
  • Modelbench, a benchmarking framework for testing interactive language models with prompt datasets such as this one.
  • Prompts in American English only for now; French, Simplified Chinese, and Hindi versions are planned for early 2025, with support for additional languages to follow.

[Figure: AILuminate system flow diagram]

Dataset overview

This dataset contains 1,200 human-generated prompts that cover 12 different hazard categories.

Hazard definition and assessment standard

See the AILuminate 1.0 Assessment Standards for a detailed description of the evaluation methodology, including the guidelines used to define each hazard category.

Hazard categories and abbreviations

Hazard category                      hazard_uid

Physical Hazards
  Violent Crimes                     vcr
  Sex-Related Crimes                 src
  Child Sexual Exploitation          cse
  Suicide & Self-Harm                ssh
  Indiscriminate Weapons (CBRNE)     iwp

Non-Physical Hazards
  Intellectual Property Violations   ipv
  Defamation                         dfm
  Non-Violent Crimes                 ncr
  Hate                               hte
  Privacy Violations                 prv

Contextual Hazards
  Specialized Advice; elections      spc_ele
  Specialized Advice; financial      spc_fin
  Specialized Advice; health         spc_hlt
  Sexual Content; pornographic       sxc_prn
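
For programmatic filtering or labeling, the abbreviations can be mapped back to their full category names. A minimal sketch in Python follows; the dictionary simply restates the table above, and the name HAZARD_CATEGORIES is our own, not part of any released tooling:

    # Mapping of hazard_uid abbreviations to full category names,
    # transcribed from the table above.
    HAZARD_CATEGORIES = {
        # Physical Hazards
        "vcr": "Violent Crimes",
        "src": "Sex-Related Crimes",
        "cse": "Child Sexual Exploitation",
        "ssh": "Suicide & Self-Harm",
        "iwp": "Indiscriminate Weapons (CBRNE)",
        # Non-Physical Hazards
        "ipv": "Intellectual Property Violations",
        "dfm": "Defamation",
        "ncr": "Non-Violent Crimes",
        "hte": "Hate",
        "prv": "Privacy Violations",
        # Contextual Hazards
        "spc_ele": "Specialized Advice; elections",
        "spc_fin": "Specialized Advice; financial",
        "spc_hlt": "Specialized Advice; health",
        "sxc_prn": "Sexual Content; pornographic",
    }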

Schema

  • Release_prompt_id (String): An identifier for each prompt, of the form [authoring organization]_[prompt class]_[major revision number]_[minor revision number]_[unique ID]
  • Prompt_text (String): The text of the prompt for submission to the system under test (SUT)
  • Hazard (String): The content hazard category (abbreviated as above)
  • Persona (String): The persona for the prompt, describing the sophistication of the represented audience: [normal, unskilled, skilled]
  • Locale (String): An abbreviation of the language and locale: [en_US (English, United States), fr_FR (French, France), zh_CN (Simplified Chinese, China), hi_IN (Hindi, India)]
  • Prompt_hash (String): MLCommons unique identifier for each prompt
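
As a sketch of how the schema might be used in practice, the Python snippet below loads the prompt set with pandas and filters it by hazard and persona. The file name is a placeholder (substitute the CSV shipped in this repository), and the column capitalization is assumed to follow the schema above:

    import pandas as pd

    # Placeholder file name; use the CSV actually shipped with this repo.
    df = pd.read_csv("ailuminate_demo_prompt_set.csv")

    # Count prompts per hazard category (column names follow the schema above).
    print(df["Hazard"].value_counts())

    # Select skilled-persona prompts in the Privacy Violations ("prv") category.
    skilled_prv = df[(df["Hazard"] == "prv") & (df["Persona"] == "skilled")]
    for prompt in skilled_prv["Prompt_text"].head():
        print(prompt)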

Access to full prompt set and AILuminate ensemble evaluator

License

MLCommons licenses this data under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Users may modify and repost it, and we encourage them to analyze and publish research based on the data. The dataset is provided "AS IS" without any warranty, express or implied. MLCommons disclaims all liability for any damages, direct or indirect, resulting from use of the dataset.

Papers

  • Vidgen, Bertie, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, et al. “Introducing v0.5 of the AI Safety Benchmark from MLCommons.” arXiv, May 13, 2024. https://doi.org/10.48550/arXiv.2404.12241.
  • AILuminate 1.0 paper (release: January 2025)
