Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration

Requirements
Target Model Fine-tuning
Self-prompt Reference Model Fine-tuning
Run SPV-MIA

This repository is the official implementation of the paper "Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration". It implements the proposed Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA). The steps to reproduce it are described below.

Requirements

torch>=1.11.0
accelerate==0.20.3
transformers==4.34.0.dev0
trl==0.7.1
datasets==2.13.1
numpy>=1.23.4
scikit-learn>=1.1.3
pyyaml>=6.0
tqdm>=4.64.1

Dependencies can be installed with the following command:

pip install -r requirements.txt
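Several dependencies are pinned to exact versions (e.g. accelerate==0.20.3, trl==0.7.1), so it can help to check the active environment against requirements.txt after installing. The following is a minimal standard-library sketch. `parse_requirements` and `check_pins` are hypothetical helpers and are not part of this repository. Only `==` pins are compared, and only as plain strings. `>=` lower bounds are reported as installed without a comparison, because a proper check needs a version parser such as the `packaging` library.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Matches requirement lines such as "torch>=1.11.0" or "trl==0.7.1".
_REQ = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(==|>=)\s*([^\s;#]+)")


def parse_requirements(text):
    """Return (name, operator, version) tuples for '==' and '>=' lines."""
    reqs = []
    for line in text.splitlines():
        m = _REQ.match(line)
        if m:
            reqs.append(m.groups())
    return reqs


def check_pins(reqs):
    """Return (name, installed, operator, wanted, status) rows for each requirement."""
    rows = []
    for name, op, wanted in reqs:
        try:
            have = version(name)
        except PackageNotFoundError:
            rows.append((name, None, op, wanted, "MISSING"))
            continue
        if op == "==":
            # Exact pins: plain string comparison of version numbers.
            status = "ok" if have == wanted else "MISMATCH"
        else:
            # '>=' bounds need real version parsing; not compared here.
            status = "installed"
        rows.append((name, have, op, wanted, status))
    return rows
```

For example, `check_pins(parse_requirements(open("requirements.txt").read()))` lists every pinned package with its status.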

Target Model Fine-tuning

All large language models (LLMs) are built on top of transformers, the go-to library for state-of-the-art transformer models, so you can fine-tune any well-known LLM it supports, including LLaMA, the GPT series, and Falcon. We recommend training LLMs on multiple GPUs with accelerate, a library that lets the same PyTorch code run across any distributed configuration:

accelerate launch ./ft_llms/llms_finetune.py \
--output_dir ./ft_llms/*pretrained_model_name*/*dataset_name*/target/ \
--block_size 128 --eval_steps 100 --save_epochs 100 --log_steps 100 \
-d *dataset_name* -m *pretrained_model_name* --packing --use_dataset_cache \
-e 10 -b 4 -lr 1e-4 --gradient_accumulation_steps 1 \
--train_sta_idx=0 --train_end_idx=10000 --eval_sta_idx=0 --eval_end_idx=1000

Replace *pretrained_model_name* and *dataset_name* with the names of the pretrained LLM and the training dataset, for example decapoda-research/llama-7b-hf and ag_news.
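With --packing and --block_size 128, the fine-tuning script presumably packs training text into fixed-length token blocks instead of padding each example separately. The exact behaviour is defined in ./ft_llms/llms_finetune.py. The sketch below assumes the common concatenate-and-chunk scheme, which is similar in spirit to trl's ConstantLengthDataset. In this scheme, tokenized texts are joined, optionally separated by an EOS token, and cut into block_size-token chunks, and any incomplete tail is dropped. `pack_into_blocks` is a hypothetical helper.

```python
from itertools import chain


def pack_into_blocks(tokenized_texts, block_size=128, eos_id=None):
    """Concatenate token-id lists and cut them into equal-length blocks.

    Assumption: --packing follows the usual concatenate-and-chunk scheme;
    the trailing remainder shorter than block_size is discarded.
    """
    if eos_id is not None:
        # Mark document boundaries with an end-of-sequence token.
        tokenized_texts = [list(ids) + [eos_id] for ids in tokenized_texts]
    stream = list(chain.from_iterable(tokenized_texts))
    n_full = len(stream) // block_size * block_size
    return [stream[i:i + block_size] for i in range(0, n_full, block_size)]
```

For example, `pack_into_blocks([[1, 2, 3], [4, 5], [6, 7, 8, 9]], block_size=4)` returns `[[1, 2, 3, 4], [5, 6, 7, 8]]`. Token 9 is dropped because it does not fill a complete block.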

Recommended pretrained models

GPT-2 (https://huggingface.co/gpt2)
GPT-J (https://huggingface.co/EleutherAI/gpt-j-6b)
Falcon (https://huggingface.co/tiiuae/falcon-7b)
LLaMA (https://huggingface.co/decapoda-research/llama-7b-hf) ¹

Recommended datasets

Ag News (https://huggingface.co/datasets/ag_news)
Wikitext-103 (https://huggingface.co/datasets/wikitext) ²
Xsum (https://huggingface.co/datasets/xsum)

Self-prompt Reference Model Fine-tuning

Before fine-tuning the self-prompt reference model, sample the reference dataset from the fine-tuned target LLM using our proposed self-prompt approach:

accelerate launch refer_data_generate.py \
-tm *fine_tuned_model* \
-m *pretrained_model_name* -d *dataset_name*

Replace *fine_tuned_model* with the directory of the fine-tuned target model (i.e., the output directory of the Target Model Fine-tuning phase).
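In the self-prompt approach, the fine-tuned target model itself generates the reference data. The loop below is a model-agnostic sketch of that idea. It assumes that prompts are the first few words of in-domain seed texts. `build_reference_corpus`, `seed_texts`, `prompt_words`, and `generate` are hypothetical names, not options of refer_data_generate.py. `generate` stands in for any callable that wraps the target model, for instance one that decodes the output of transformers' `model.generate`.

```python
import random


def build_reference_corpus(generate, seed_texts, n_samples, prompt_words=8, rng=None):
    """Collect target-model generations from short prompts as reference data.

    generate: hypothetical callable(prompt: str) -> str wrapping the fine-tuned
        target model; for decoder-only models the decoded generate() output
        already includes the prompt, so it is stored as-is.
    seed_texts: in-domain texts whose first `prompt_words` words become prompts
        (an assumption made for this sketch).
    """
    rng = rng or random.Random(0)
    corpus = []
    for _ in range(n_samples):
        words = rng.choice(seed_texts).split()
        prompt = " ".join(words[:prompt_words])
        corpus.append(generate(prompt))
    return corpus
```

The resulting texts would then serve as the training data for the reference model fine-tuned in the next step.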

Then fine-tune the self-prompt reference model in the same manner as the target model, but with the --refer flag, fewer training epochs (2 instead of 10), and a lower learning rate (5e-5 instead of 1e-4):

accelerate launch ./ft_llms/llms_finetune.py --refer \
--output_dir ./ft_llms/*pretrained_model_name*/*dataset_name*/refer/ \
--block_size 128 --eval_steps 100 --save_epochs 100 --log_steps 100 \
-d *dataset_name* -m *pretrained_model_name* --packing --use_dataset_cache \
-e 2 -b 4 -lr 5e-5 --gradient_accumulation_steps 1 \
--train_sta_idx=0 --train_end_idx=10000 --eval_sta_idx=0 --eval_end_idx=1000
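The target and reference runs differ only in a few flags: --refer, the output subdirectory, -e, and -lr. Because of this, both fine-tuning commands in this README can be assembled by one helper. This is a minimal sketch. `finetune_command` is a hypothetical helper, and every flag value is copied from the commands above. `dataset_config` appends --dataset_config_name, which the Wikitext footnote requires.

```python
def finetune_command(model, dataset, refer=False, dataset_config=None):
    """Build the accelerate argv for the target (default) or reference run.

    All flag values mirror the README commands; only the role-specific ones
    (--refer, output subdirectory, epochs, learning rate) change.
    """
    role = "refer" if refer else "target"
    epochs, lr = ("2", "5e-5") if refer else ("10", "1e-4")
    argv = ["accelerate", "launch", "./ft_llms/llms_finetune.py"]
    if refer:
        argv.append("--refer")
    argv += [
        "--output_dir", f"./ft_llms/{model}/{dataset}/{role}/",
        "--block_size", "128", "--eval_steps", "100",
        "--save_epochs", "100", "--log_steps", "100",
        "-d", dataset, "-m", model, "--packing", "--use_dataset_cache",
        "-e", epochs, "-b", "4", "-lr", lr, "--gradient_accumulation_steps", "1",
        "--train_sta_idx=0", "--train_end_idx=10000",
        "--eval_sta_idx=0", "--eval_end_idx=1000",
    ]
    if dataset_config:
        # e.g. "wikitext-2-raw-v1" for Wikitext, as noted in the footnote.
        argv += ["--dataset_config_name", dataset_config]
    return argv
```

For example, `shlex.join(finetune_command("gpt2", "wikitext", refer=True, dataset_config="wikitext-2-raw-v1"))` produces a shell-ready string. Alternatively, passing the list to `subprocess.run` launches the job directly.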

Run SPV-MIA

After completing the steps above, run SPV-MIA against the target model with:

python attack.py
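attack.py implements the full SPV-MIA pipeline. The sketch below illustrates only the reference-calibration idea behind it, using a generic, swapped-in score. Each sample is scored by its mean token log-likelihood under the target model minus its mean token log-likelihood under the self-prompt reference model. Higher values suggest membership. This is not the paper's probabilistic-variation score. The per-token log-probabilities are assumed inputs, for example gathered from each model's logits. The attack is evaluated by ROC AUC with scikit-learn, which is already listed in the requirements. `calibrated_scores` and `attack_auc` are hypothetical helpers.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def calibrated_scores(target_logprobs, refer_logprobs):
    """Generic reference-calibrated score (not the exact SPV-MIA statistic).

    Each argument is a list of 1-D arrays, one per sample, holding per-token
    log-probabilities; the score is mean(target) - mean(reference).
    """
    tgt = np.array([np.mean(lp) for lp in target_logprobs])
    ref = np.array([np.mean(lp) for lp in refer_logprobs])
    return tgt - ref


def attack_auc(scores, is_member):
    """ROC AUC of membership scores; labels are 1 = member, 0 = non-member."""
    return roc_auc_score(is_member, scores)
```

Subtracting the reference model's likelihood discounts samples that are simply easy to predict. A sample then scores high only if the target model fits it noticeably better than a model that never saw it.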

Footnotes

¹ The third-party repo decapoda-research/llama-7b-hf appears to have been deleted for unknown reasons; use the forked repos luodian/llama-7b-hf or baffo32/decapoda-research-llama-7B-hf as alternatives.
² Add the extra argument --dataset_config_name wikitext-2-raw-v1 to specify this dataset.

Repository: tsinghua-fib-lab/ANeurIPS2024_SPV-MIA