Commit

Update README.md
javirandor authored Apr 24, 2024
1 parent 87e5342 commit bf4e970
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion README.md
@@ -8,7 +8,8 @@ This repository is a detached fork from [Safe-RLHF](https://github.com/PKU-Align

![Poisoned RLHF](imgs/poisoning_rlhf.png)

-You might also want to check our competitition ["Find the Trojan: Universal Backdoor Detection in Aligned LLMs"](https://github.com/ethz-spylab/rlhf_trojan_competition), where your task is to find which are the trojans we have injected in several models!
+> [!Note]
+> You might also want to check our competitition ["Find the Trojan: Universal Backdoor Detection in Aligned LLMs"](https://github.com/ethz-spylab/rlhf_trojan_competition), where participants tried to find trojans in several models! All models have been open-sourced and can be used in future research.
## Abstract

@@ -30,6 +31,9 @@ jailbreak backdoors.

We opensource datasets and a set of models used in our work. They are all hosted in HuggingFace and need you to accept the conditions before downloading.

+> [!Note]
+> You can also use the 5 models and datasets from our competition. More details [here](https://github.com/ethz-spylab/rlhf_trojan_competition).
**Models**

| Model name | HuggingFace URL |