Pull request morganstanley#10: add code for paper uttack

Merge in MSML_PAPERS/papers from ~SONGZZ/papers:main to main

* commit 'fda3e6d3652a112669fb996aff4bd595a9fe2897': add code for paper uttack
Showing 21 changed files with 3,710 additions and 0 deletions.
papers/Existence_Trojaned_Twin_Model_UTTAttack/README.md (145 additions, 0 deletions)
@@ -0,0 +1,145 @@
# On the Existence of a Trojaned Twin Model

Authors: Songzhu Zheng*, Yikai Zhang*, Lu Pang*, Weimin Lyu, Mayank Goswami, Anderson Schneider, Yuriy Nevmyvaka, Haibin Ling, Chao Chen (* equal contribution)
## Abstract

We study the Trojan Attack problem, where malicious attackers sabotage deep neural network models with poisoned training data. In most existing works, the effectiveness of the attack is largely overlooked; many attacks can be ineffective or inefficient for certain training schemes, e.g., adversarial training.
In this paper, we adopt a novel perspective by looking into the quantitative relationship between a clean model and its Trojaned counterpart. We formulate a successful attack using classic machine learning language, namely a universal Trojan trigger intrinsic to the data distribution. Theoretically, we prove that, under mild assumptions, there exists a Trojaned model, named the Trojaned Twin, that is very close to the clean model in the output space. Practically, we show that these results have powerful implications, since the Trojaned twin model has enhanced attack efficacy and strong resiliency against detection. Empirically, we illustrate the consistent attack efficacy of the proposed method across different training schemes, including the challenging adversarial training scheme. Furthermore, we show that this Trojaned twin model is robust against SoTA detection methods.

![pipeline_demo](./images/demo.png)
## Publications

Published at [[ICLR 2023 Workshop BANDS]](https://openreview.net/pdf?id=kwICnhvbyG)
## Data

* CIFAR10: [[Download]](https://www.cs.toronto.edu/~kriz/cifar.html)
* GTSRB: [[Download]](https://benchmark.ini.rub.de/gtsrb_news.html)
* ImageNet: [[Download]](https://www.image-net.org/download.php)
* PASCAL: [[Download]](https://pjreddie.com/projects/pascal-voc-dataset-mirror/)

Put the downloaded datasets in the folder `./data` to run the real-world experiments. Note: for CIFAR10 and GTSRB, set `download=True` in the dataloader to download the dataset automatically.

To generate the downsampled 10-class ImageNet dataset, change the data folder path in `./data/ImageNet.py`, then run:
```sh
python ./data/ImageNet.py
```
The generated downsampled `.h5` file will be stored in `./data` by default.
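For orientation, here is a minimal sketch of what such a downsampling step could look like. It is an illustration only, not the repository's `./data/ImageNet.py`: the source folder layout, choice of classes, 32x32 target size, output filename, and dataset keys are all assumptions.

```python
# Hypothetical sketch: build a downsampled 10-class ImageNet subset as an .h5 file.
# Folder layout, chosen classes, image size, and dataset keys are assumptions.
import os
import h5py
import numpy as np
from PIL import Image

SRC_DIR = "/path/to/imagenet/train"          # assumed: one subfolder per class
CLASSES = sorted(os.listdir(SRC_DIR))[:10]   # assumed: take the first 10 classes
OUT_FILE = "./data/imagenet10_downsampled.h5"

images, labels = [], []
for label, cls in enumerate(CLASSES):
    cls_dir = os.path.join(SRC_DIR, cls)
    for fname in os.listdir(cls_dir):
        img = Image.open(os.path.join(cls_dir, fname)).convert("RGB").resize((32, 32))
        images.append(np.asarray(img, dtype=np.uint8))
        labels.append(label)

with h5py.File(OUT_FILE, "w") as f:
    f.create_dataset("images", data=np.stack(images))
    f.create_dataset("labels", data=np.asarray(labels))
```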
## Code

### Layout

    ├── README.md
    ├── experiment_configuration.yml    # default configuration
    ├── run_attack.py                   # entry point
    ├── trainer.py                      # training infrastructure
    ├── network.py                      # network architecture definitions
    ├── attacker
    │   ├── attacker.py                 # base attacker class
    │   ├── badnet.py                   # BadNet baseline (Gu et al., 2017)
    │   ├── sig.py                      # SIG baseline (Barni et al., 2019)
    │   ├── ref.py                      # Reflection baseline (Liu et al., 2020)
    │   ├── warp.py                     # WaNet baseline (Nguyen and Tran, 2020)
    │   ├── imc.py                      # IMC baseline (Pang et al., 2020)
    │   └── utt.py                      # UTT attack (our method)
    ├── data
    │   ├── CIFAR.py                    # CIFAR10 dataset class
    │   ├── GTSRB.py                    # GTSRB dataset class
    │   ├── ImageNet.py                 # ImageNet dataset class
    │   ├── PASCAL.py                   # PASCAL dataset class
    │   ├── data_builder.py             # unified dataset class
    │   └── data_utils.py               # dataset-building helper functions
    ├── images
    │   └── demo.png                    # README demo image
    └── requirements.txt                # environment setup file
### Setup

We conduct all our experiments using Python 3.10. We execute our program on Red Hat Enterprise Linux Server 7.9 (Maipo) and use an NVIDIA V100 GPU with CUDA 12.3.

The environment setup for this project is listed in `requirements.txt`. To install, run:

```sh
python -m venv utt_attack
source ./utt_attack/bin/activate
pip install -r ./requirements.txt
```
### Execution

The default experiment configuration can be found in `experiment_configuration.yml`. Arguments can also be modified through the command line:

```sh
python run_attack.py
    [--method]
    [--dataset]
    [--network]
    [--inject_ratio]
    [--budget]
    [--surrogate]
    [--surrogate_ckpt]
    [--xi]
    [--gpus]
    [--savedir]
    [--logdir]
    [--seed]
```
* Supported methods are {badnet, sig, ref, warp, imc, utt}.
* Supported datasets are {cifar10, gtsrb, imagenet}.
* Supported networks are {resnet18, resnet34, vgg16, vgg19, densenet121, inceptionv3}.
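As a rough illustration of how YAML defaults and command-line overrides can be combined, consider the sketch below. This is a hypothetical helper, not the actual logic in `run_attack.py`; only the config keys `args`, `attack`, and so on mirror those read by the attacker classes in this repository.

```python
# Hypothetical sketch of merging experiment_configuration.yml with CLI overrides;
# run_attack.py may implement this differently.
import argparse
import yaml

parser = argparse.ArgumentParser()
parser.add_argument('--method', default=None)
parser.add_argument('--dataset', default=None)
parser.add_argument('--network', default=None)
parser.add_argument('--inject_ratio', type=float, default=None)
parser.add_argument('--budget', type=float, default=None)
args = parser.parse_args()

with open('experiment_configuration.yml') as f:
    config = yaml.safe_load(f)

# Command-line values, when given, take precedence over the YAML defaults.
config['args'] = {k: v for k, v in vars(args).items() if v is not None}
```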
For example, to test the BadNet attack on CIFAR10 with a 20% injection ratio, trigger size 5, and ResNet18 as the victim network, run the following command:
```sh
python run_attack.py --method badnet --dataset cifar10 --network resnet18 --budget 5 --inject_ratio 0.2
```
Another example: to test UTT on GTSRB with a 1% injection ratio, trigger size 2, ResNet18 as the surrogate model, VGG16 as the victim network, and the attack strength upscaled at test time by a factor of $\xi=2$, run the following commands:
```sh
# Step I: train a clean surrogate model (the surrogate network can differ from the victim)
python run_attack.py --dataset gtsrb --network resnet18 --inject_ratio 0 --budget 0 --ckptdir ./clean_models
# Step II: attack and test performance
python run_attack.py --method utt --dataset gtsrb --network vgg16 --inject_ratio 0.01 --budget 2 --surrogate resnet18 --xi 2 --surrogate_ckpt ./clean_models/gtsrb_resnet18_badnet_77_True_True_False_False_240524155809.pth
```
Results will be saved to `./result` unless another directory is specified.

Note: the methods `ref` and `utt` require a surrogate model as input.
## Citations

If you find this code useful in your research, please cite:

```
@article{trojaned_twin_model,
  title={On the Existence of a Trojaned Twin Model},
  author={Zheng, Songzhu and Zhang, Yikai and Pang, Lu and Lyu, Weimin and Goswami, Mayank and Schneider, Anderson and Nevmyvaka, Yuriy and Ling, Haibin and Chen, Chao},
  journal={ICLR 2023 Workshop BANDS},
  year={2023}
}
```
## License

All source files in this repository, unless explicitly mentioned otherwise, are released under the Apache 2.0 license, the text of which can be found in the LICENSE file.

## Contact

Author: [[email protected]](mailto:[email protected])

Morgan Stanley Machine Learning Research: [[email protected]](mailto:[email protected])
papers/Existence_Trojaned_Twin_Model_UTTAttack/attacker/attacker.py (115 additions, 0 deletions)
@@ -0,0 +1,115 @@
from typing import Dict, Tuple
import os

import torch
import numpy as np
from PIL import Image
import pickle as pkl


class Attacker():

    def __init__(self,
                 config: Dict) -> None:

        self.budget = config['args']['budget'] if config['args']['budget'] else config['attack']['BUDGET']
        self.troj_fraction = config['args']['inject_ratio'] if config['args']['inject_ratio'] else config['attack']['INJECT_RATIO']
        self.target_source_pair = config['attack']['SOURCE_TARGET_PAIR']
        self.lamda = config['attack']['LAMBDA']  # trigger transparency
        self.config = config

        self.argsdataset = self.config['args']['dataset']
        self.argsnetwork = self.config['args']['network']
        self.argsmethod = self.config['args']['method']
        self.argsseed = self.config['args']['seed']

        self.dynamic = False

        self.use_clip = self.config['train']['USE_CLIP']
        self.use_transform = self.config['train']['USE_TRANSFORM']

    def inject_trojan_static(self,
                             dataset: torch.utils.data.Dataset,
                             xi: float = 1,
                             mode: str = 'train',
                             **kwargs) -> None:

        # we can only add the trigger to images before transformation
        dataset.use_transform = False
        if mode == 'train':
            poison_rate = self.troj_fraction
        else:
            poison_rate = 1

        if not hasattr(self, 'trigger'):
            self._generate_trigger()

        dataloader = torch.utils.data.DataLoader(dataset, batch_size=1)

        imgs_troj, labels_clean, labels_troj = [], [], []

        for s in self.target_source_pair:

            count = 0
            for b, (ind, img, labels_c, _) in enumerate(dataloader):

                if int(labels_c) == s:
                    # poison at most poison_rate * (samples per class) images from source class s
                    if count < int(poison_rate*len(dataset)//self.config['dataset'][self.argsdataset]['NUM_CLASSES']):
                        img_troj = self._add_trigger(img.squeeze().permute(1, 2, 0).numpy(), label=s, xi=xi)

                        if self.use_clip:
                            img_troj = np.clip(img_troj, 0, 1)

                        if len(img_troj.shape) != 4:
                            img_troj = np.expand_dims(img_troj, axis=0)

                        imgs_troj.append(img_troj)
                        labels_clean.append(int(labels_c))
                        labels_troj.append(self.target_source_pair[int(labels_c)])
                        count += 1

        imgs_troj = [Image.fromarray(np.uint8(imgs_troj[i].squeeze()*255)) for i in range(len(imgs_troj))]
        labels_clean = np.array(labels_clean)
        labels_troj = np.array(labels_troj)

        print(f"Clean Data Num {len(dataset)}")
        print(f"Troj Data Num {len(imgs_troj)}")

        dataset.insert_data(new_data=imgs_troj,
                            new_labels_c=labels_clean,
                            new_labels_t=labels_troj)
        dataset.use_transform = self.use_transform  # for training

        # for label-consistent attacks, reset the source-target pair for testing injection
        self.target_source_pair = self.config['attack']['SOURCE_TARGET_PAIR']
        for s, t in self.target_source_pair.items():
            if t in self.trigger:
                self.trigger[s] = self.trigger[t]

    def inject_trojan_dynamic(self,
                              img: torch.tensor,
                              imgs_ind,
                              **kwargs) -> Tuple[torch.tensor, torch.tensor, torch.tensor]:
        raise NotImplementedError

    def _generate_trigger(self) -> np.ndarray:
        raise NotImplementedError

    def _add_trigger(self) -> np.ndarray:
        raise NotImplementedError

    def save_trigger(self, path: str) -> None:
        os.makedirs(path, exist_ok=True)
        if hasattr(self, 'trigger'):
            for k in self.trigger:
                if len(self.trigger[k]):
                    trigger_file = f"{self.argsdataset}_{self.argsnetwork}_{self.argsmethod}_source{k}_size{self.budget}_seed{self.argsseed}.pkl"
                    with open(os.path.join(path, trigger_file), 'wb') as f:
                        pkl.dump(self.trigger, f)
        else:
            raise AttributeError("Triggers haven't been generated !")
papers/Existence_Trojaned_Twin_Model_UTTAttack/attacker/badnet.py (47 additions, 0 deletions)
@@ -0,0 +1,47 @@
from collections import defaultdict

import numpy as np

from .attacker import Attacker


class BadNet(Attacker):

    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)

        self.trigger_w = int(self.config['attack']['badnet']['TRIGGER_SHAPE'])

    def _add_trigger(self,
                     img: np.ndarray,
                     label: int,
                     xi: float,
                     **kwargs) -> np.ndarray:

        # stamp the trigger into a randomly chosen corner of the image
        pos = np.random.choice(['topleft', 'topright', 'bottomleft', 'bottomright'], 1, replace=False)

        trigger_w = min(self.trigger_w, min(img.shape[0], img.shape[1]))
        if pos == 'topleft':
            h_s, h_e = 0, trigger_w
            w_s, w_e = 0, trigger_w
        elif pos == 'topright':
            h_s, h_e = 0, trigger_w
            w_s, w_e = img.shape[1]-trigger_w, img.shape[1]
        elif pos == 'bottomleft':
            h_s, h_e = img.shape[0]-trigger_w, img.shape[0]
            w_s, w_e = 0, trigger_w
        else:  # pos == 'bottomright'
            h_s, h_e = img.shape[0]-trigger_w, img.shape[0]
            w_s, w_e = img.shape[1]-trigger_w, img.shape[1]

        self.content = np.zeros(img.shape, dtype=np.float32)
        self.content[h_s:h_e, w_s:w_e] = self.trigger[label]

        # blend the (possibly upscaled) trigger into the image
        return (1-self.lamda)*img + self.lamda*xi*self.content

    def _generate_trigger(self) -> None:
        # random-pattern trigger, one per source class
        self.trigger = defaultdict(np.ndarray)
        for k in self.config['attack']['SOURCE_TARGET_PAIR']:
            self.trigger[k] = np.random.uniform(0, 1, 3*self.trigger_w**2).reshape(self.trigger_w, self.trigger_w, 3)
            self.trigger[k] *= self.budget/(np.linalg.norm(self.trigger[k].reshape(3, -1), ord='fro')+1e-4)  # L2-norm (budget) constraint
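For readers who want to exercise the BadNet attacker in isolation, a hypothetical usage sketch follows. It is not part of the repository: the config values are illustrative assumptions, and only the keys mirror those read in `Attacker.__init__` and `BadNet.__init__` above.

```python
# Hypothetical usage sketch: stamp a BadNet trigger onto a random image.
# Config values are assumptions; keys follow Attacker.__init__ / BadNet.__init__.
import numpy as np
from attacker.badnet import BadNet

config = {
    'args':   {'budget': 5, 'inject_ratio': 0.1, 'dataset': 'cifar10',
               'network': 'resnet18', 'method': 'badnet', 'seed': 77},
    'attack': {'LAMBDA': 1.0,
               'SOURCE_TARGET_PAIR': {0: 1},   # poison source class 0 towards target label 1
               'badnet': {'TRIGGER_SHAPE': 5}},
    'train':  {'USE_CLIP': True, 'USE_TRANSFORM': False},
}

attacker = BadNet(config=config)
attacker._generate_trigger()                   # one random 5x5x3 patch per source class
img = np.random.uniform(0, 1, (32, 32, 3)).astype(np.float32)
poisoned = attacker._add_trigger(img, label=0, xi=1.0)
print(poisoned.shape, poisoned.min(), poisoned.max())
```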