AILuminate v1.0 DEMO Prompt Set

This file contains the DEMO prompt library of the AILuminate 1.0 prompt dataset, created by the MLCommons AI Risk & Reliability working group. It contains 1,200 human-generated prompts, both adversarial and non-adversarial, covering 12 hazard categories, and is a 10% subset of the full AILuminate training dataset.

⚠️ Content warning

This dataset was created to elicit hazardous responses. It contains language that may be considered offensive, and content that may be considered unsafe, discomforting, or disturbing. Consider carefully whether you need to view the prompts and responses, limit exposure to what's necessary, take regular breaks, and stop if you feel uncomfortable. For more information on the risks, see this literature review on vicarious trauma.

Background

The MLCommons AILuminate v1.0 benchmark provides safety testing for general-purpose chat systems across 12 hazard categories and includes:

  • A safety assessment standard, including a hazard taxonomy and response evaluation criteria.
  • A dataset of 24,000 human-generated test prompts designed to replicate distinct hazardous scenarios:
    • 12,000 Public Practice prompts
    • 12,000 Private prompts for benchmarking
    • 1,200 Demo prompts (this dataset), a 10% sample of the Public Practice prompt dataset
    • A mix of adversarial and non-adversarial prompts
    • Regular updates to the dataset
  • An evaluation system using a tuned ensemble of safety evaluation models.
  • Modelbench, a benchmarking framework for testing interactive language models with prompt datasets such as this one.
  • Prompts in American English only for now; French, Simplified Chinese, and Hindi versions are planned for early 2025, with support for additional languages to follow.

[Figure: AILuminate system flow diagram]

Dataset overview

This dataset contains 1,200 human-generated prompts that cover 12 different hazard categories.

Hazard definition and assessment standard

See the AILuminate 1.0 Assessment Standards for a detailed description of the evaluation methodology, including the guidelines used to define each hazard category.

Hazard categories and abbreviations

Hazard category                      hazard_uid

Physical Hazards
  Violent Crimes                     vcr
  Sex-Related Crimes                 src
  Child Sexual Exploitation          cse
  Suicide & Self-Harm                ssh
  Indiscriminate Weapons (CBRNE)     iwp

Non-Physical Hazards
  Intellectual Property Violations   ipv
  Defamation                         dfm
  Non-Violent Crimes                 ncr
  Hate                               hte
  Privacy Violations                 prv

Contextual Hazards
  Specialized Advice; elections      spc_ele
  Specialized Advice; financial      spc_fin
  Specialized Advice; health         spc_hlt
  Sexual Content; pornographic       sxc_prn
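
For programmatic filtering or labeling, the abbreviations can be mapped back to their full category names. A minimal sketch in Python follows; the dictionary simply restates the table above, and the name HAZARD_CATEGORIES is our own, not part of any released tooling:

    # Mapping of hazard_uid abbreviations to full category names,
    # transcribed from the table above.
    HAZARD_CATEGORIES = {
        # Physical Hazards
        "vcr": "Violent Crimes",
        "src": "Sex-Related Crimes",
        "cse": "Child Sexual Exploitation",
        "ssh": "Suicide & Self-Harm",
        "iwp": "Indiscriminate Weapons (CBRNE)",
        # Non-Physical Hazards
        "ipv": "Intellectual Property Violations",
        "dfm": "Defamation",
        "ncr": "Non-Violent Crimes",
        "hte": "Hate",
        "prv": "Privacy Violations",
        # Contextual Hazards
        "spc_ele": "Specialized Advice; elections",
        "spc_fin": "Specialized Advice; financial",
        "spc_hlt": "Specialized Advice; health",
        "sxc_prn": "Sexual Content; pornographic",
    }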

Schema

  • Release_prompt_id (String): An identifier for each prompt, of the form [authoring organization]_[prompt class]_[major revision number]_[minor revision number]_[unique ID]
  • Prompt_text (String): The text of the prompt for submission to the system under test (SUT)
  • Hazard (String): The content hazard category (abbreviated as above)
  • Persona (String): The persona for the prompt, describing the sophistication of the represented audience: [normal, unskilled, skilled]
  • Locale (String): An abbreviation of the language and locale: [en_US (English, United States), fr_FR (French, France), zh_CN (Simplified Chinese, China), hi_IN (Hindi, India)]
  • Prompt_hash (String): MLCommons unique identifier for each prompt
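
As a sketch of how the schema might be used in practice, the Python snippet below loads the prompt set with pandas and filters it by hazard and persona. The file name is a placeholder (substitute the CSV shipped in this repository), and the column capitalization is assumed to follow the schema above:

    import pandas as pd

    # Placeholder file name; use the CSV actually shipped with this repo.
    df = pd.read_csv("ailuminate_demo_prompt_set.csv")

    # Count prompts per hazard category (column names follow the schema above).
    print(df["Hazard"].value_counts())

    # Select skilled-persona prompts in the Privacy Violations ("prv") category.
    skilled_prv = df[(df["Hazard"] == "prv") & (df["Persona"] == "skilled")]
    for prompt in skilled_prv["Prompt_text"].head():
        print(prompt)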

Access to full prompt set and AILuminate ensemble evaluator

License

MLCommons licenses this data under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Users may modify and repost it, and we encourage them to analyze and publish research based on the data. The dataset is provided "AS IS" without any warranty, express or implied. MLCommons disclaims all liability for any damages, direct or indirect, resulting from use of the dataset.

Papers

  • Vidgen, Bertie, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, et al. “Introducing v0.5 of the AI Safety Benchmark from MLCommons.” arXiv, May 13, 2024. https://doi.org/10.48550/arXiv.2404.12241.
  • AILuminate 1.0 paper (release: January 2025)
