This repository contains the code for the paper Blind Baselines Beat Membership Inference Attacks for Foundation Models.
Membership inference (MI) attacks try to determine if a data sample was used to train a machine learning model. For foundation models trained on unknown Web data, MI attacks are often used to detect copyrighted training materials, measure test set contamination, or audit machine unlearning. Unfortunately, we find that evaluations of MI attacks for foundation models are flawed, because they sample members and non-members from different distributions. For 9 published MI evaluation datasets, we show that blind attacks—that distinguish the member and non-member distributions without looking at any trained model—outperform state-of-the-art MI attacks. Existing evaluations thus tell us nothing about membership leakage of a foundation model’s training data.
At the root of the repository, run the following to install required dependencies:
pip install -r requirements.txt
All datasets except the Arxiv (1 month vs 1 month) dataset require no further setup, so you can continue to the next step. (For the arxiv1m_1m dataset, follow the extraction steps further below first.)
Run the run_attack.py script with the required command-line arguments using the command below.
python3 run_attack.py --dataset <dataset> --attack <attack>
where <dataset> is one of the datasets from the following list:
'wikimia', 'bookmia', 'temporal_wiki', 'temporal_arxiv', 'arxiv_1m', 'arxiv_1m_1m', 'multi_web', 'laion_mi', 'gutenberg'
and <attack> is one of the following attacks:
- date_detection: Applicable for the temporal datasets wikimia, temporal_wiki, temporal_arxiv, arxiv1m, and arxiv1m_1m. It infers membership based on dates extracted from the text.
- bag_of_words: Applicable for all datasets. It infers membership based on the bag-of-words representation of the text.
- greedy_selection: Applicable for all datasets, but works more efficiently on datasets with shorter text samples. Gives the best results on these datasets: temporal_wiki, arxiv1m_1m, multi_web, laion_mi.
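To make the idea behind date_detection concrete, here is a minimal sketch of a blind, date-based membership guess. It is not the code in run_attack.py: the function names, the year regex, and the `cutoff_year` parameter are all assumptions chosen for illustration. The sketch extracts the most recent year mentioned in a text and guesses "member" when that year falls before an assumed training cutoff.

```python
import re

# Matches standalone 4-digit years between 1900 and 2099.
YEAR_RE = re.compile(r"\b(19\d{2}|20\d{2})\b")


def latest_year(text):
    """Return the most recent year mentioned in the text, or None if there is none."""
    years = [int(y) for y in YEAR_RE.findall(text)]
    return max(years) if years else None


def predict_member(text, cutoff_year=2023):
    """Hypothetical blind rule: guess 'member' unless the text mentions
    a year at or after cutoff_year (an assumed training cutoff)."""
    year = latest_year(text)
    return year is None or year < cutoff_year


if __name__ == "__main__":
    print(predict_member("The article was first published in 2019."))
    print(predict_member("Results of the 2024 season were announced."))
```

Note that the rule never queries a model, which is what makes it a blind baseline: it only exploits the temporal shift between member and non-member samples.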
For example, to run the bag-of-words attack on the WikiMIA dataset, run the following command:
python3 run_attack.py --dataset WikiMIA --attack bag_of_words
To specify the FPR budget used to compute the TPR@x%FPR, use the --fpr_budget flag with the desired budget. For example, to compute the TPR@5%FPR, run the following command:
python3 run_attack.py --dataset WikiMIA --attack bag_of_words --fpr_budget 5
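For readers unfamiliar with the metric, the following sketch shows one common way to compute TPR@x%FPR from attack scores. It is an illustration under stated assumptions, not the repository's implementation: the function name `tpr_at_fpr` is hypothetical, higher scores are assumed to mean "more likely a member", and labels are assumed to be 1 for members and 0 for non-members.

```python
def tpr_at_fpr(scores, labels, fpr_budget=1.0):
    """Return the true positive rate (in percent) at the given FPR budget (in percent).

    Assumes higher scores indicate membership; labels are 1 (member) or 0 (non-member).
    """
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    if not neg or not pos:
        raise ValueError("need at least one member and one non-member")

    # Largest number of non-members we may wrongly flag within the budget.
    allowed = int(len(neg) * fpr_budget / 100.0)

    # Predict 'member' when score > threshold. Using the (allowed + 1)-th
    # highest non-member score keeps false positives at or below `allowed`.
    threshold = neg[allowed] if allowed < len(neg) else float("-inf")

    true_positives = sum(s > threshold for s in pos)
    return 100.0 * true_positives / len(pos)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    print(tpr_at_fpr(scores, labels, fpr_budget=50))
```

A small budget forces a high threshold, so the metric rewards attacks that identify some members with very few false alarms.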
To redo the hyper-parameter search, add the flag --hypersearch; otherwise, the bag-of-words attack uses the best default hyper-parameters. To plot the AUC ROC curve, add the flag --plot_roc.
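As a reference for what the AUC ROC number summarizes, the sketch below computes it directly from scores as the probability that a randomly chosen member is scored above a randomly chosen non-member (ties count as one half). This is a standard equivalent definition, not the plotting code behind --plot_roc; the function name `auc_roc` and the label convention (1 for members, 0 for non-members) are assumptions.

```python
def auc_roc(scores, labels):
    """Return the AUC ROC in [0, 1] via pairwise member/non-member comparisons.

    Assumes higher scores indicate membership; labels are 1 (member) or 0 (non-member).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one member and one non-member")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # member correctly ranked above non-member
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    print(auc_roc(scores, labels))
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration but slow for large evaluation sets.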
MI Dataset | Metric | Best Attack | Ours | Blind Attack Type |
---|---|---|---|---|
Temporal Shifted Datasets | | | | |
WikiMIA | TPR@5%FPR | 43.2 | 94.7 | bag_of_words |
WikiMIA | AUCROC | 83.9 | 99 | bag_of_words |
BookMIA | TPR@5%FPR | 33.6 | 64.5 | bag_of_words |
BookMIA | AUCROC | 88 | 91.4 | bag_of_words |
Temporal Wiki | TPR@1%FPR | | 36.5 | greedy_selection |
Temporal Wiki | AUCROC | 79.6 | 79.9 | greedy_selection |
Temporal Arxiv | TPR@1%FPR | | 9.1 | bag_of_words |
Temporal Arxiv | AUCROC | 74.5 | 75.3 | bag_of_words |
Arxiv (all vs 1 month) | TPR@1%FPR | 5.9 | 10.6 | date_detection |
Arxiv (all vs 1 month) | AUCROC | 67.8 | 72.3 | date_detection |
Arxiv (1 month vs 1 month) | TPR@1%FPR | 2.5 | 2.7 | greedy_selection |
Biased Replication | | | | |
Multi-Web | TPR@1%FPR | 40.3 | 93 | greedy_selection |
Multi-Web | AUCROC | 81.7 | 98 | bag_of_words |
LAION-MI | TPR@1%FPR | 2.5 | 8.9 | greedy_selection |
Gutenberg | TPR@1%FPR | 18.8 | 55.1 | greedy_selection |
Gutenberg | AUCROC | 85.6 | 96.1 | bag_of_words |
We handle the arxiv1m_1m dataset (Arxiv, 1 month vs 1 month) separately because it is too big to push to the repository. Here are the steps to extract the dataset:
- Download the whole arxiv dataset from here.
- Run the data extraction script, which saves the processed dataset in the arxiv1m_1m folder.
python3 data_script_1m_1m.py --path <path to the downloaded jsonl file>
- Run the attack on the dataset using the command below:
python3 run_attack.py --dataset arxiv1m_1m --attack greedy_selection
If you use our attacks in your work, please consider citing our paper:
@misc{das2024blindbaselinesbeatmembership,
title={Blind Baselines Beat Membership Inference Attacks for Foundation Models},
author={Debeshee Das and Jie Zhang and Florian Tramèr},
year={2024},
eprint={2406.16201},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2406.16201},
}