Pull request morganstanley#10: add code for paper uttack

Merge in MSML_PAPERS/papers from ~SONGZZ/papers:main to main

* commit 'fda3e6d3652a112669fb996aff4bd595a9fe2897': add code for paper uttack
Showing 21 changed files with 3,710 additions and 0 deletions.
papers/Existence_Trojaned_Twin_Model_UTTAttack/README.md (145 additions, 0 deletions)
@@ -0,0 +1,145 @@
# On the Existence of a Trojaned Twin Model

Authors: Songzhu Zheng*, Yikai Zhang*, Lu Pang*, Weimin Lyu, Mayank Goswami, Anderson Schneider, Yuriy Nevmyvaka, Haibin Ling, Chao Chen (* equal contribution)
## Abstract

We study the Trojan Attack problem, where malicious attackers sabotage deep neural network models with poisoned training data. In most existing works, the effectiveness of the attack is largely overlooked; many attacks can be ineffective or inefficient for certain training schemes, e.g., adversarial training.
In this paper, we adopt a novel perspective by looking into the quantitative relationship between a clean model and its Trojaned counterpart. We formulate a successful attack using classic machine learning language, namely a universal Trojan trigger intrinsic to the data distribution. Theoretically, we prove that, under mild assumptions, there exists a Trojaned model, named the Trojaned Twin, that is very close to the clean model in the output space. Practically, we show that these results have powerful implications, since the Trojaned twin model has enhanced attack efficacy and strong resiliency against detection. Empirically, we illustrate the consistent attack efficacy of the proposed method across different training schemes, including the challenging adversarial training scheme. Furthermore, we show that this Trojaned twin model is robust against SoTA detection methods.

![pipeline_demo](./images/demo.png)
## Publications

Published at [[ICLR 2023 Workshop BANDS]](https://openreview.net/pdf?id=kwICnhvbyG)
## Data

* CIFAR10: [[Download]](https://www.cs.toronto.edu/~kriz/cifar.html)
* GTSRB: [[Download]](https://benchmark.ini.rub.de/gtsrb_news.html)
* ImageNet: [[Download]](https://www.image-net.org/download.php)
* PASCAL: [[Download]](https://pjreddie.com/projects/pascal-voc-dataset-mirror/)

Put the downloaded datasets in the folder `./data` to run the real-world experiments. Note: for CIFAR10 and GTSRB, set `download=True` in the dataloader to download the dataset automatically.

To generate the downsampled 10-class ImageNet dataset, change the data folder path in `./data/ImageNet.py`, then run:
```sh
python ./data/ImageNet.py
```
The generated downsampled `.h5` file will be stored in `./data` by default.
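For orientation, here is a minimal sketch of what such a downsampling step could look like. It is an illustration only, not the repository's `./data/ImageNet.py`: the source folder layout, choice of classes, 32x32 target size, output filename, and dataset keys are all assumptions.

```python
# Hypothetical sketch: build a downsampled 10-class ImageNet subset as an .h5 file.
# Folder layout, chosen classes, image size, and dataset keys are assumptions.
import os
import h5py
import numpy as np
from PIL import Image

SRC_DIR = "/path/to/imagenet/train"          # assumed: one subfolder per class
CLASSES = sorted(os.listdir(SRC_DIR))[:10]   # assumed: take the first 10 classes
OUT_FILE = "./data/imagenet10_downsampled.h5"

images, labels = [], []
for label, cls in enumerate(CLASSES):
    cls_dir = os.path.join(SRC_DIR, cls)
    for fname in os.listdir(cls_dir):
        img = Image.open(os.path.join(cls_dir, fname)).convert("RGB").resize((32, 32))
        images.append(np.asarray(img, dtype=np.uint8))
        labels.append(label)

with h5py.File(OUT_FILE, "w") as f:
    f.create_dataset("images", data=np.stack(images))
    f.create_dataset("labels", data=np.asarray(labels))
```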
## Code

### Layout

    ├── README.md
    ├── experiment_configuration.yml    # default configuration
    ├── run_attack.py                   # entry point
    ├── trainer.py                      # training infrastructure
    ├── network.py                      # network architecture definitions
    ├── attacker
    │   ├── attacker.py                 # base attacker class
    │   ├── badnet.py                   # BadNet baseline (Gu et al., 2017)
    │   ├── sig.py                      # SIG baseline (Barni et al., 2019)
    │   ├── ref.py                      # Reflection baseline (Liu et al., 2020)
    │   ├── warp.py                     # WaNet baseline (Nguyen and Tran, 2020)
    │   ├── imc.py                      # IMC baseline (Pang et al., 2020)
    │   └── utt.py                      # UTT attack (our method)
    ├── data
    │   ├── CIFAR.py                    # CIFAR10 dataset class
    │   ├── GTSRB.py                    # GTSRB dataset class
    │   ├── ImageNet.py                 # ImageNet dataset class
    │   ├── PASCAL.py                   # PASCAL dataset class
    │   ├── data_builder.py             # unified dataset class
    │   └── data_utils.py               # dataset-building helper functions
    ├── images
    │   └── demo.png                    # README demo image
    └── requirements.txt                # environment setup file
### Setup

We conduct all our experiments using Python 3.10. We execute our program on Red Hat Enterprise Linux Server 7.9 (Maipo) and use an NVIDIA V100 GPU with CUDA 12.3.

The environment setup for this project is listed in `requirements.txt`. To install, run:

```sh
python -m venv utt_attack
source ./utt_attack/bin/activate
pip install -r ./requirements.txt
```
### Execution

The default experiment configuration can be found in `experiment_configuration.yml`. Arguments can also be modified through the command line:

```sh
python run_attack.py
    [--method]
    [--dataset]
    [--network]
    [--inject_ratio]
    [--budget]
    [--surrogate]
    [--surrogate_ckpt]
    [--xi]
    [--gpus]
    [--savedir]
    [--logdir]
    [--seed]
```
* Supported methods are {badnet, sig, ref, warp, imc, utt}.
* Supported datasets are {cifar10, gtsrb, imagenet}.
* Supported networks are {resnet18, resnet34, vgg16, vgg19, densenet121, inceptionv3}.
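As a rough illustration of how YAML defaults and command-line overrides can be combined, consider the sketch below. This is a hypothetical helper, not the actual logic in `run_attack.py`; only the config keys `args`, `attack`, and so on mirror those read by the attacker classes in this repository.

```python
# Hypothetical sketch of merging experiment_configuration.yml with CLI overrides;
# run_attack.py may implement this differently.
import argparse
import yaml

parser = argparse.ArgumentParser()
parser.add_argument('--method', default=None)
parser.add_argument('--dataset', default=None)
parser.add_argument('--network', default=None)
parser.add_argument('--inject_ratio', type=float, default=None)
parser.add_argument('--budget', type=float, default=None)
args = parser.parse_args()

with open('experiment_configuration.yml') as f:
    config = yaml.safe_load(f)

# Command-line values, when given, take precedence over the YAML defaults.
config['args'] = {k: v for k, v in vars(args).items() if v is not None}
```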
For example, to test the BadNet attack on CIFAR10 with a 20% injection ratio, trigger size 5, and ResNet18 as the victim network, run the following command:
```sh
python run_attack.py --method badnet --dataset cifar10 --network resnet18 --budget 5 --inject_ratio 0.2
```
Another example: to test UTT on GTSRB with a 1% injection ratio, trigger size 2, ResNet18 as the surrogate model, VGG16 as the victim network, and the attack strength upscaled at test time by a factor of $\xi=2$, run the following commands:
```sh
# Step I: train a clean surrogate model (the surrogate network can differ from the victim)
python run_attack.py --dataset gtsrb --network resnet18 --inject_ratio 0 --budget 0 --ckptdir ./clean_models
# Step II: attack and test performance
python run_attack.py --method utt --dataset gtsrb --network vgg16 --inject_ratio 0.01 --budget 2 --surrogate resnet18 --xi 2 --surrogate_ckpt ./clean_models/gtsrb_resnet18_badnet_77_True_True_False_False_240524155809.pth
```
Results will be saved to `./result` unless another directory is specified.

Note: the methods `ref` and `utt` require a surrogate model as input.
## Citations

If you find this code useful in your research, please cite:

```
@article{trojaned_twin_model,
  title={On the Existence of a Trojaned Twin Model},
  author={Zheng, Songzhu and Zhang, Yikai and Pang, Lu and Lyu, Weimin and Goswami, Mayank and Schneider, Anderson and Nevmyvaka, Yuriy and Ling, Haibin and Chen, Chao},
  journal={ICLR 2023 Workshop BANDS},
  year={2023}
}
```
## License

All source files in this repository, unless explicitly mentioned otherwise, are released under the Apache 2.0 license, the text of which can be found in the LICENSE file.

## Contact

Author: [[email protected]](mailto:[email protected])

Morgan Stanley Machine Learning Research: [[email protected]](mailto:[email protected])
papers/Existence_Trojaned_Twin_Model_UTTAttack/attacker/attacker.py (115 additions, 0 deletions)
@@ -0,0 +1,115 @@
from typing import Dict, Tuple
import os

import torch
import numpy as np
from PIL import Image
import pickle as pkl


class Attacker():

    def __init__(self,
                 config: Dict) -> None:

        self.budget = config['args']['budget'] if config['args']['budget'] else config['attack']['BUDGET']
        self.troj_fraction = config['args']['inject_ratio'] if config['args']['inject_ratio'] else config['attack']['INJECT_RATIO']
        self.target_source_pair = config['attack']['SOURCE_TARGET_PAIR']
        self.lamda = config['attack']['LAMBDA']  # trigger transparency
        self.config = config

        self.argsdataset = self.config['args']['dataset']
        self.argsnetwork = self.config['args']['network']
        self.argsmethod = self.config['args']['method']
        self.argsseed = self.config['args']['seed']

        self.dynamic = False

        self.use_clip = self.config['train']['USE_CLIP']
        self.use_transform = self.config['train']['USE_TRANSFORM']

    def inject_trojan_static(self,
                             dataset: torch.utils.data.Dataset,
                             xi: float = 1,
                             mode: str = 'train',
                             **kwargs) -> None:

        # we can only add the trigger to images before transformation
        dataset.use_transform = False
        if mode == 'train':
            poison_rate = self.troj_fraction
        else:
            poison_rate = 1

        if not hasattr(self, 'trigger'):
            self._generate_trigger()

        dataloader = torch.utils.data.DataLoader(dataset, batch_size=1)

        imgs_troj, labels_clean, labels_troj = [], [], []

        for s in self.target_source_pair:

            count = 0
            for b, (ind, img, labels_c, _) in enumerate(dataloader):

                if int(labels_c) == s:
                    # poison at most poison_rate * (samples per class) images from source class s
                    if count < int(poison_rate*len(dataset)//self.config['dataset'][self.argsdataset]['NUM_CLASSES']):
                        img_troj = self._add_trigger(img.squeeze().permute(1, 2, 0).numpy(), label=s, xi=xi)

                        if self.use_clip:
                            img_troj = np.clip(img_troj, 0, 1)

                        if len(img_troj.shape) != 4:
                            img_troj = np.expand_dims(img_troj, axis=0)

                        imgs_troj.append(img_troj)
                        labels_clean.append(int(labels_c))
                        labels_troj.append(self.target_source_pair[int(labels_c)])
                        count += 1

        imgs_troj = [Image.fromarray(np.uint8(imgs_troj[i].squeeze()*255)) for i in range(len(imgs_troj))]
        labels_clean = np.array(labels_clean)
        labels_troj = np.array(labels_troj)

        print(f"Clean Data Num {len(dataset)}")
        print(f"Troj Data Num {len(imgs_troj)}")

        dataset.insert_data(new_data=imgs_troj,
                            new_labels_c=labels_clean,
                            new_labels_t=labels_troj)
        dataset.use_transform = self.use_transform  # for training

        # for label-consistent attacks, reset the source-target pair for testing injection
        self.target_source_pair = self.config['attack']['SOURCE_TARGET_PAIR']
        for s, t in self.target_source_pair.items():
            if t in self.trigger:
                self.trigger[s] = self.trigger[t]

    def inject_trojan_dynamic(self,
                              img: torch.tensor,
                              imgs_ind,
                              **kwargs) -> Tuple[torch.tensor, torch.tensor, torch.tensor]:
        raise NotImplementedError

    def _generate_trigger(self) -> np.ndarray:
        raise NotImplementedError

    def _add_trigger(self) -> np.ndarray:
        raise NotImplementedError

    def save_trigger(self, path: str) -> None:
        os.makedirs(path, exist_ok=True)
        if hasattr(self, 'trigger'):
            for k in self.trigger:
                if len(self.trigger[k]):
                    trigger_file = f"{self.argsdataset}_{self.argsnetwork}_{self.argsmethod}_source{k}_size{self.budget}_seed{self.argsseed}.pkl"
                    with open(os.path.join(path, trigger_file), 'wb') as f:
                        pkl.dump(self.trigger, f)
        else:
            raise AttributeError("Triggers haven't been generated !")
papers/Existence_Trojaned_Twin_Model_UTTAttack/attacker/badnet.py (47 additions, 0 deletions)
@@ -0,0 +1,47 @@
from collections import defaultdict

import numpy as np

from .attacker import Attacker


class BadNet(Attacker):

    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)

        self.trigger_w = int(self.config['attack']['badnet']['TRIGGER_SHAPE'])

    def _add_trigger(self,
                     img: np.ndarray,
                     label: int,
                     xi: float,
                     **kwargs) -> np.ndarray:

        # stamp the trigger into a randomly chosen corner of the image
        pos = np.random.choice(['topleft', 'topright', 'bottomleft', 'bottomright'], 1, replace=False)

        trigger_w = min(self.trigger_w, min(img.shape[0], img.shape[1]))
        if pos == 'topleft':
            h_s, h_e = 0, trigger_w
            w_s, w_e = 0, trigger_w
        elif pos == 'topright':
            h_s, h_e = 0, trigger_w
            w_s, w_e = img.shape[1]-trigger_w, img.shape[1]
        elif pos == 'bottomleft':
            h_s, h_e = img.shape[0]-trigger_w, img.shape[0]
            w_s, w_e = 0, trigger_w
        else:  # pos == 'bottomright'
            h_s, h_e = img.shape[0]-trigger_w, img.shape[0]
            w_s, w_e = img.shape[1]-trigger_w, img.shape[1]

        self.content = np.zeros(img.shape, dtype=np.float32)
        self.content[h_s:h_e, w_s:w_e] = self.trigger[label]

        # blend the (possibly upscaled) trigger into the image
        return (1-self.lamda)*img + self.lamda*xi*self.content

    def _generate_trigger(self) -> None:
        # random-pattern trigger, one per source class
        self.trigger = defaultdict(np.ndarray)
        for k in self.config['attack']['SOURCE_TARGET_PAIR']:
            self.trigger[k] = np.random.uniform(0, 1, 3*self.trigger_w**2).reshape(self.trigger_w, self.trigger_w, 3)
            self.trigger[k] *= self.budget/(np.linalg.norm(self.trigger[k].reshape(3, -1), ord='fro')+1e-4)  # L2-norm (budget) constraint
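For readers who want to exercise the BadNet attacker in isolation, a hypothetical usage sketch follows. It is not part of the repository: the config values are illustrative assumptions, and only the keys mirror those read in `Attacker.__init__` and `BadNet.__init__` above.

```python
# Hypothetical usage sketch: stamp a BadNet trigger onto a random image.
# Config values are assumptions; keys follow Attacker.__init__ / BadNet.__init__.
import numpy as np
from attacker.badnet import BadNet

config = {
    'args':   {'budget': 5, 'inject_ratio': 0.1, 'dataset': 'cifar10',
               'network': 'resnet18', 'method': 'badnet', 'seed': 77},
    'attack': {'LAMBDA': 1.0,
               'SOURCE_TARGET_PAIR': {0: 1},   # poison source class 0 towards target label 1
               'badnet': {'TRIGGER_SHAPE': 5}},
    'train':  {'USE_CLIP': True, 'USE_TRANSFORM': False},
}

attacker = BadNet(config=config)
attacker._generate_trigger()                   # one random 5x5x3 patch per source class
img = np.random.uniform(0, 1, (32, 32, 3)).astype(np.float32)
poisoned = attacker._add_trigger(img, label=0, xi=1.0)
print(poisoned.shape, poisoned.min(), poisoned.max())
```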