Skip to content

UCF-ML-Research/TrojLLM

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TrojLLM [Paper]

This repository contains code for our NeurIPS 2023 paper "TrojLLM: A Black-box Trojan Prompt Attack on Large Language Models". In this paper, we propose TrojLLM, an automatic and black-box framework to effectively generate universal and stealthy triggers and inserts trojans into the hard prompts of LLM-based APIs.

Overview

The workflow of TrojLLM. detector

Environment Setup

Our codebase requires the following Python and PyTorch versions:
Python --> 3.11.3
PyTorch --> 2.0.1

Usage

We have split the code into three parts:

  1. PromptSeed/ : Prompt Seed Tuning
  2. Trigger/ : Universal Trigger Optimization
  3. ProgressiveTuning/ : Progressive Prompt Poisoning

These three parts correspond to the three methods we proposed in our paper. Please refer to the corresponding folder for more details.

Citation

If you find TrojLLM useful or relevant to your project and research, please kindly cite our paper:

@article{xue2024trojllm,
  title={Trojllm: A black-box trojan prompt attack on large language models},
  author={Xue, Jiaqi and Zheng, Mengxin and Hua, Ting and Shen, Yilin and Liu, Yepeng and B{\"o}l{\"o}ni, Ladislau and Lou, Qian},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages