TrojLLM [Paper]

This repository contains code for our NeurIPS 2023 paper "TrojLLM: A Black-box Trojan Prompt Attack on Large Language Models". In this paper, we propose TrojLLM, an automatic and black-box framework to effectively generate universal and stealthy triggers and inserts trojans into the hard prompts of LLM-based APIs.

Overview

The workflow of TrojLLM.

Environment Setup

Our codebase requires the following Python and PyTorch versions:
Python --> 3.11.3
PyTorch --> 2.0.1

Usage

We have split the code into three parts:

PromptSeed/ : Prompt Seed Tuning
Trigger/ : Universal Trigger Optimization
ProgressiveTuning/ : Progressive Prompt Poisoning

These three parts correspond to the three methods we proposed in our paper. Please refer to the corresponding folder for more details.

Citation

If you find TrojLLM useful or relevant to your project and research, please kindly cite our paper:

@article{xue2024trojllm,
  title={Trojllm: A black-box trojan prompt attack on large language models},
  author={Xue, Jiaqi and Zheng, Mengxin and Hua, Ting and Shen, Yilin and Liu, Yepeng and B{\"o}l{\"o}ni, Ladislau and Lou, Qian},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
ProgressiveTuning		ProgressiveTuning
PromptSeed		PromptSeed
Trigger		Trigger
figures		figures
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
heatmap.py		heatmap.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TrojLLM [Paper]

Overview

Environment Setup

Usage

Citation

About

Releases

Packages

Contributors 2

Languages

License

UCF-ML-Research/TrojLLM

Folders and files

Latest commit

History

Repository files navigation

TrojLLM [Paper]

Overview

Environment Setup

Usage

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages