Commit

Pull request morganstanley#10: add code for paper uttack
Merge in MSML_PAPERS/papers from ~SONGZZ/papers:main to main

* commit 'fda3e6d3652a112669fb996aff4bd595a9fe2897':
  add code for paper uttack
Songzhu Zheng authored and yikaims committed Jun 5, 2024
2 parents 779e8ec + fda3e6d commit 3202fe9
Showing 21 changed files with 3,710 additions and 0 deletions.
145 changes: 145 additions & 0 deletions papers/Existence_Trojaned_Twin_Model_UTTAttack/README.md
@@ -0,0 +1,145 @@
# On the Existence of a Trojaned Twin Model

Authors: Songzhu Zheng*, Yikai Zhang*, Lu Pang*, Weimin Lyu, Mayank Goswami, Anderson Schneider, Yuriy Nevmyvaka, Haibin Ling, Chao Chen (* equal contribution)


## Abstract

We study the Trojan Attack problem, where malicious attackers sabotage deep neural network models with poisoned training data. In most existing works, the effectiveness of the attack is largely overlooked; many attacks can be ineffective or inefficient for certain training schemes, e.g., adversarial training.
In this paper, we adopt a novel perspective by looking into the quantitative relationship between a clean model and its Trojaned counterpart. We formulate a successful attack using classic machine learning language, namely a universal Trojan trigger intrinsic to the data distribution. Theoretically, we prove that, under mild assumptions, there exists a Trojaned model, named the Trojaned Twin, that is very close to the clean model in the output space. Practically, we show that these results have powerful implications since the Trojaned twin model has enhanced attack efficacy and strong resiliency against detection. Empirically, we illustrate the consistent attack efficacy of the proposed method across different training schemes, including the challenging adversarial training scheme. Furthermore, we show that this Trojaned twin model is robust against SoTA detection methods.

![pipeline_demo](./images/demo.png)


## Publications

Published at [[ICLR 2023 Workshop BANDS]](https://openreview.net/pdf?id=kwICnhvbyG)


## Data

* CIFAR10: [[Download]](https://www.cs.toronto.edu/~kriz/cifar.html)
* GTSRB: [[Download]](https://benchmark.ini.rub.de/gtsrb_news.html)
* ImageNet: [[Download]](https://www.image-net.org/download.php)
* PASCAL VOC: [[Download]](https://pjreddie.com/projects/pascal-voc-dataset-mirror/)

Put the downloaded datasets in the folder `./data` to run the real-world experiments. Note: for CIFAR10 and GTSRB, set `download=True` in the dataloader to download the dataset automatically.
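
For reference, a minimal sketch of what the `download=True` flag looks like with the standard torchvision datasets; the repository's own wrappers in `./data/CIFAR.py` and `./data/GTSRB.py` may expose it differently:

```python
# Illustration only: automatic download via torchvision; the repo's dataset
# classes in ./data may wrap these loaders with a different interface.
from torchvision import datasets

cifar10_train = datasets.CIFAR10(root='./data', train=True, download=True)
gtsrb_train = datasets.GTSRB(root='./data', split='train', download=True)
```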

To generate the downsampled 10-class ImageNet dataset, change the data folder path in `./data/ImageNet.py`, then run:
```sh
python ./data/ImageNet.py
```
The generated downsampled `.h5` file is stored in `./data` by default.
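
For a quick sanity check, the generated file can be inspected with `h5py` (a sketch; the file name below is an assumption, check what `ImageNet.py` actually writes):

```python
# Sketch with a hypothetical file name: inspect the generated .h5 file.
import h5py

with h5py.File('./data/imagenet_downsampled.h5', 'r') as f:  # path is an assumption
    print(list(f.keys()))  # shows which datasets ImageNet.py actually wrote
```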

## Code

### Layout

```
├── README.md
├── experiment_configuration.yml   # default configuration
├── run_attack.py                  # entry point
├── trainer.py                     # training infrastructure
├── network.py                     # network architecture definition
├── attacker
│   ├── attacker.py                # base attacker class
│   ├── badnet.py                  # BadNet baseline (Gu et al., 2017)
│   ├── sig.py                     # SIG baseline (Barni et al., 2019)
│   ├── ref.py                     # Reflection baseline (Liu et al., 2020)
│   ├── warp.py                    # WaNet baseline (Nguyen and Tran, 2020)
│   ├── imc.py                     # IMC baseline (Pang et al., 2020)
│   └── utt.py                     # UTT attack (our method)
├── data
│   ├── CIFAR.py                   # CIFAR10 dataset class
│   ├── GTSRB.py                   # GTSRB dataset class
│   ├── ImageNet.py                # ImageNet dataset class
│   ├── PASCAL.py                  # PASCAL dataset class
│   ├── data_builder.py            # unified dataset class
│   └── data_utils.py              # dataset building helper functions
├── images
│   └── demo.png                   # README demo image
└── requirements.txt               # environment setup file
```


### Setup

We conduct all our experiments with Python 3.10. We run our programs on Red Hat Enterprise Linux Server 7.9 (Maipo) with an NVIDIA V100 GPU and CUDA 12.3.

The environment dependencies are listed in `requirements.txt`. To install, run:

```sh
python -m venv utt_attack
source ./utt_attack/bin/activate
pip install -r ./requirements.txt
```

### Execution

The default experiment configuration can be found in `experiment_configuration.yml`. Arguments can also be overridden on the command line; a minimal sketch of how the defaults and these overrides might be merged appears after the argument list below:

```sh
python run_attack.py
[--method]
[--dataset]
[--network]
[--inject_ratio]
[--budget]
[--surrogate]
[--surrogate_ckpt]
[--xi]
[--gpus]
[--savedir]
[--logdir]
[--seed]
```

* Supported methods are {badnet, sig, ref, warp, imc, utt}.
* Supported datasets are {cifar10, gtsrb, imagenet}.
* Supported networks are {resnet18, resnet34, vgg16, vgg19, densenet121, inceptionv3}.
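
The following is a minimal sketch, not the repository's actual `run_attack.py`, of how the YAML defaults and the command-line flags listed above might be merged; only the flag names come from the list above, the merging logic is an assumption:

```python
# Minimal sketch (not run_attack.py): YAML defaults overridden by CLI flags.
import argparse
import yaml

parser = argparse.ArgumentParser()
for flag in ['--method', '--dataset', '--network', '--surrogate',
             '--surrogate_ckpt', '--gpus', '--savedir', '--logdir']:
    parser.add_argument(flag, default=None)
for flag in ['--inject_ratio', '--budget', '--xi']:
    parser.add_argument(flag, type=float, default=None)
parser.add_argument('--seed', type=int, default=None)
args = parser.parse_args()

with open('experiment_configuration.yml') as f:
    config = yaml.safe_load(f)  # defaults

# any flag given on the command line overrides its YAML default
for key, value in vars(args).items():
    if value is not None:
        config[key] = value
```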

For example, to test the BadNet attack on CIFAR10 with a 20\% injection ratio, trigger size 5, and ResNet18 as the victim network, run the following command:
```sh
python run_attack.py --method badnet --dataset cifar10 --network resnet18 --budget 5 --inject_ratio 0.2
```

As another example, to test UTT on GTSRB with a 1\% injection ratio, trigger size 2, ResNet18 as the surrogate model, VGG16 as the victim network, and the attack strength upscaled at test time by a factor of $\xi=2$, run the following commands:
```sh
# Step I: train clean surrogate model (surrogate network can be different from victim)
python run_attack.py --dataset gtsrb --network resnet18 --inject_ratio 0 --budget 0 --ckptdir ./clean_models
# Step II: attack and test performance
python run_attack.py --method utt --dataset gtsrb --network vgg16 --inject_ratio 0.01 --budget 2 --surrogate resnet18 --xi 2 --surrogate_ckpt ./clean_models/gtsrb_resnet18_badnet_77_True_True_False_False_240524155809.pth
```
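
For intuition on the $\xi$ factor: at injection time the trigger is blended into the image, and $\xi$ amplifies the trigger term at test time. The `BadNet._add_trigger` implementation in this commit, for example, applies it as follows (`lamda` is the trigger transparency read from the configuration):

```python
# How xi scales the trigger at injection time (mirrors BadNet._add_trigger).
def blend(img, trigger_patch, lamda, xi):
    return (1 - lamda) * img + lamda * xi * trigger_patch
```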

Results are saved to `./result` unless a different directory is specified.

Note: the `ref` and `utt` methods require a surrogate model as input.


## Citations

If you find this code useful in your research, please cite:

```
@article{zheng2023trojaned_twin,
  title={On the Existence of a Trojaned Twin Model},
  author={Zheng, Songzhu and Zhang, Yikai and Pang, Lu and Lyu, Weimin and Goswami, Mayank and Schneider, Anderson and Nevmyvaka, Yuriy and Ling, Haibin and Chen, Chao},
  journal={ICLR 2023 Workshop BANDS},
  year={2023}
}
```

## License

All source files in this repository, unless explicitly mentioned
otherwise, are released under the Apache 2.0 license, the text of
which can be found in the LICENSE file.


## Contact

author: [[email protected]](mailto:[email protected])

Morgan Stanley Machine Learning Research: [[email protected]](mailto:[email protected])
115 changes: 115 additions & 0 deletions papers/Existence_Trojaned_Twin_Model_UTTAttack/attacker/attacker.py
@@ -0,0 +1,115 @@
from typing import Dict, Tuple
import os

import torch
import numpy as np
from PIL import Image
import pickle as pkl

class Attacker():
def __init__(self,
config: Dict) -> None:

self.budget = config['args']['budget'] if config['args']['budget'] else config['attack']['BUDGET']
self.troj_fraction = config['args']['inject_ratio'] if config['args']['inject_ratio'] else config['attack']['INJECT_RATIO']
self.target_source_pair = config['attack']['SOURCE_TARGET_PAIR']
self.lamda = config['attack']['LAMBDA'] # transparency
self.config = config

self.argsdataset = self.config['args']['dataset']
self.argsnetwork = self.config['args']['network']
self.argsmethod = self.config['args']['method']
self.argsseed = self.config['args']['seed']

self.dynamic = False

self.use_clip = self.config['train']['USE_CLIP']
self.use_transform = self.config['train']['USE_TRANSFORM']


def inject_trojan_static(self,
dataset: torch.utils.data.Dataset,
xi: float = 1,
mode='train',
**kwargs) -> None:

# we can only add trigger on image before transformation
dataset.use_transform = False
if mode=='train':
poison_rate = self.troj_fraction
else:
poison_rate = 1

if not hasattr(self, 'trigger'):
self._generate_trigger()

dataloader = torch.utils.data.DataLoader(dataset, batch_size=1)

imgs_troj, labels_clean, labels_troj = [], [], []

for s in self.target_source_pair:

count = 0
for b, (ind, img, labels_c, _) in enumerate(dataloader):

if int(labels_c) == s:
if count < int(poison_rate*len(dataset)//self.config['dataset'][self.argsdataset]['NUM_CLASSES']):
img_troj = self._add_trigger(img.squeeze().permute(1,2,0).numpy(), label=s, xi=xi)

if self.use_clip:
img_troj = np.clip(img_troj, 0, 1)

if len(img_troj.shape)!=4:
img_troj = np.expand_dims(img_troj, axis=0)

imgs_troj.append(img_troj)
labels_clean.append(int(labels_c))
labels_troj.append(self.target_source_pair[int(labels_c)])
count += 1

imgs_troj = [Image.fromarray(np.uint8(imgs_troj[i].squeeze()*255)) for i in range(len(imgs_troj))]
labels_clean = np.array(labels_clean)
labels_troj = np.array(labels_troj)

print(f"Clean Data Num {len(dataset)}")
print(f"Troj Data Num {len(imgs_troj)}")

dataset.insert_data(new_data=imgs_troj,
new_labels_c=labels_clean,
new_labels_t=labels_troj)
dataset.use_transform = self.use_transform # for training

# for label consistent attack, reset the source-target pair for testing injection
self.target_source_pair = self.config['attack']['SOURCE_TARGET_PAIR']
for s, t in self.target_source_pair.items():
if t in self.trigger:
self.trigger[s] = self.trigger[t]


def inject_trojan_dynamic(self,
img: torch.tensor,
imgs_ind,
**kwargs) -> Tuple[torch.tensor, torch.tensor, torch.tensor]:
raise NotImplementedError


def _generate_trigger(self) -> np.ndarray:
raise NotImplementedError


def _add_trigger(self) -> np.ndarray:
raise NotImplementedError


def save_trigger(self, path: str) -> None:
os.makedirs(path, exist_ok=True)
if hasattr(self, 'trigger'):
for k in self.trigger:
if len(self.trigger[k]):
trigger_file = f"{self.argsdataset}_{self.argsnetwork}_{self.argsmethod}_source{k}_size{self.budget}_seed{self.argsseed}.pkl"
with open(os.path.join(path, trigger_file), 'wb') as f:
pkl.dump(self.trigger, f)
f.close()
else:
raise AttributeError("Triggers haven't been generated !")
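
Illustration (not part of attacker.py): the base class above defines the contract a concrete attack must satisfy, namely override `_generate_trigger` and `_add_trigger`, exactly as BadNet does in the next file. A minimal hypothetical subclass might look like this (the config keys follow the base class; everything else is an assumption):

# --- Sketch only, not a file in this commit ---
import numpy as np
from .attacker import Attacker

class SolidPatch(Attacker):
    """Toy attack: a constant white patch blended into the top-left corner."""

    def _generate_trigger(self) -> None:
        # one trigger per source class in the source->target mapping
        self.trigger = {s: np.ones((3, 3, 3), dtype=np.float32)
                        for s in self.config['attack']['SOURCE_TARGET_PAIR']}

    def _add_trigger(self, img: np.ndarray, label: int, xi: float, **kwargs) -> np.ndarray:
        patched = img.copy()
        patched[:3, :3, :] = (1 - self.lamda) * patched[:3, :3, :] + self.lamda * xi * self.trigger[label]
        return patched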

47 changes: 47 additions & 0 deletions papers/Existence_Trojaned_Twin_Model_UTTAttack/attacker/badnet.py
@@ -0,0 +1,47 @@
from collections import defaultdict

import numpy as np

from .attacker import Attacker

class BadNet(Attacker):

def __init__(self, **kwargs) -> None:
super().__init__(**kwargs)

self.trigger_w = int(self.config['attack']['badnet']['TRIGGER_SHAPE'])

def _add_trigger(self,
img: np.ndarray,
label: int,
xi: float,
**kwargs) -> np.ndarray:

        # randomly pick one of the four corners to place the trigger patch
        pos = np.random.choice(['topleft', 'topright', 'bottomleft', 'bottomright'])

        trigger_w = min(self.trigger_w, min(img.shape[0], img.shape[1]))
        if pos == 'topleft':
            h_s, h_e = 0, trigger_w
            w_s, w_e = 0, trigger_w
        elif pos == 'topright':
            h_s, h_e = 0, trigger_w
            w_s, w_e = img.shape[1]-trigger_w, img.shape[1]
        elif pos == 'bottomleft':
            h_s, h_e = img.shape[0]-trigger_w, img.shape[0]
            w_s, w_e = 0, trigger_w
        else:  # pos == 'bottomright'
            h_s, h_e = img.shape[0]-trigger_w, img.shape[0]
            w_s, w_e = img.shape[1]-trigger_w, img.shape[1]

self.content = np.zeros(img.shape, dtype=np.float32)
self.content[h_s:h_e, w_s:w_e] = self.trigger[label]

return (1-self.lamda)*img + self.lamda*xi*self.content

def _generate_trigger(self) -> None:
# random pattern trigger
self.trigger = defaultdict(np.ndarray)
for k in self.config['attack']['SOURCE_TARGET_PAIR']:
self.trigger[k] = np.random.uniform(0, 1, 3*self.trigger_w**2).reshape(self.trigger_w, self.trigger_w, 3)
            self.trigger[k] *= self.budget/(np.linalg.norm(self.trigger[k].reshape(3, -1), ord='fro')+1e-4)  # rescale so the trigger's L2 norm is approximately self.budget
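
For reference (not part of the commit), a quick check of what the rescaling in `_generate_trigger` achieves: dividing by the Frobenius norm of the reshaped trigger and multiplying by `budget` makes the trigger's overall L2 norm approximately equal to the budget.

# Sketch: verify the norm constraint used in BadNet._generate_trigger above.
import numpy as np

trigger_w, budget = 5, 2.0
trigger = np.random.uniform(0, 1, (trigger_w, trigger_w, 3))
trigger *= budget / (np.linalg.norm(trigger.reshape(3, -1), ord='fro') + 1e-4)
print(np.linalg.norm(trigger))  # ~= budget (up to the 1e-4 stabiliser)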
