This repository contains the source code of the paper CryptoTL: Private, efficient and secure transfer learning [1].
[1] https://arxiv.org/pdf/2205.11935.pdf
The required python3 libraries can be found in `requirements.txt`. Use the following command to install them all:

```bash
pip3 install -r requirements.txt
```
SEAL (version 4.0.0) is included as a submodule in this repository. Use the following commands (from the root directory) to pull and build SEAL:

```bash
git submodule update --init --recursive
cd cpp/SEAL
mkdir build
cd build
cmake .. # replace with the following to use Clang: CC=clang CXX=clang++ cmake ..
make -j 6
cd ../../../
```
If you want to enable Intel HEXL (a speedup on processors supporting the Intel AVX512-IFMA52 instruction set), replace the cmake command with the following:

```bash
cmake -DSEAL_USE_INTEL_HEXL=On .. # replace with the following to use Clang: CC=clang CXX=clang++ cmake -DSEAL_USE_INTEL_HEXL=On ..
```
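Before enabling HEXL, you can check whether your CPU actually advertises the required instruction set. A quick sketch for Linux (the kernel reports AVX512-IFMA support via the `avx512ifma` flag in `/proc/cpuinfo`; on other operating systems use the platform's CPU feature tooling instead):

```shell
# Look for the AVX512-IFMA flag in the kernel's CPU feature list (Linux only)
if grep -q -m1 'avx512ifma' /proc/cpuinfo; then
  echo "AVX512-IFMA supported: Intel HEXL will help"
else
  echo "AVX512-IFMA not supported: build SEAL without HEXL"
fi
```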
The (preprocessed) datasets we used in our experiments are attached to GitHub releases (e.g., here); please download and extract them into the `datasets` folder.
| Dataset |
| --- |
| IMDB-train-SBERT.csv |
| IMDB-test-SBERT.csv |
| SBERT-youtube-combined.csv |
| Twitter-train-SBERT.csv |
| Twitter-test-SBERT.csv |
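Once downloaded, a quick way to sanity-check an extracted file is to load it and report its dimensions. A minimal sketch, assuming the files are standard comma-separated CSVs (the `inspect_dataset` helper and the example path are illustrative, not part of the repository):

```python
import pandas as pd

def inspect_dataset(path):
    """Load a preprocessed dataset CSV and report its dimensions."""
    df = pd.read_csv(path)
    print(f"{path}: {df.shape[0]} rows, {df.shape[1]} columns")
    return df

# Example (after downloading and extracting the release archive):
# inspect_dataset("datasets/IMDB-train-SBERT.csv")
```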
The folder `weights` contains the models that were trained to produce Table 3 in the paper.
| Weights |
| --- |
| cryptotl-imdb-to-imdb.bin |
| cryptotl-imdb-to-twitter.bin |
| cryptotl-imdb-to-yelp.bin |
| cryptotl-imdb-to-youtube.bin |
| cryptotl-twitter-to-imdb.bin |
| cryptotl-twitter-to-twitter.bin |
| cryptotl-twitter-to-yelp.bin |
| cryptotl-twitter-to-youtube.bin |
| cryptotl-youtube-to-imdb.bin |
| cryptotl-youtube-to-twitter.bin |
| cryptotl-youtube-to-yelp.bin |
| cryptotl-youtube-to-youtube.bin |
| ----------------------------------- |
| cnn-full-imdb.bin |
| cnn-full-twitter.bin |
| cnn-full-youtube.bin |
| ----------------------------------- |
| cnn-upper-imdb.bin |
| cnn-upper-twitter.bin |
| cnn-upper-youtube.bin |
train_cryptotl.py:
Option | Short | Value | Information | Default |
---|---|---|---|---|
Epochs | -e | integer | Number of epochs to train | 300 |
Splits | -s | integer | Splits for the K-fold Cross Validator | 10 |
Source Dataset Train | -d | Path | Input path of the used source training dataset | "" |
Target Dataset Train | -t | Path | Input path of the used target training dataset | "" |
Source Dataset Test | -k | Path | Input path of the used source test dataset | "" |
Target Dataset Test | -m | Path | Input path of the used target test dataset | "" |
Lower Model | -l | Path | Output path for the lower model | "lower_model" |
Upper Model | -u | Path | Output path for the upper model | "upper_model" |
Accurate HE | -a | Flag | Optional: If present, use more accurate HE parameters | false |
Example:

```bash
python3 ./train_cryptotl.py -e 300 -s 10 -d ./datasets/IMDB-train-SBERT.csv -t ./datasets/Twitter-train-SBERT.csv -k ./datasets/IMDB-test-SBERT.csv -m ./datasets/Twitter-test-SBERT.csv
```
classify_cryptotl.py:
Option | Short | Value | Information | Default |
---|---|---|---|---|
Dataset | -d | Path | Input path of the dataset to classify | "" |
Model | -m | Path | Input path for the full model | "" |
Lower Model | -l | Path | Input path for the lower model, if -m not specified | "lower_model" |
Upper Model | -u | Path | Input path for the upper model, if -m not specified | "upper_model" |
Accurate HE | -a | Flag | Optional: If present, use more accurate HE parameters | false |
Example:

```bash
python3 ./classify_cryptotl.py -d ./datasets/Twitter-test-SBERT.csv
```
python/tl_plain.py:
Option | Short | Value | Information | Default |
---|---|---|---|---|
Epochs | -e | integer | Number of epochs to train | 300 |
Splits | -s | integer | Splits for the K-fold Cross Validator | 10 |
Source Dataset | -d | Path | Input path of the used source training dataset | "" |
Target Dataset Train | -t | Path | Input path of the used target training dataset | "" |
Target Dataset Test | -l | Path | Input path of the used target test dataset | "" |
Model | -m | Path | Input/Output path for the full model | "" |
Classify | -c | Flag | Optional: If present, load model instead of training | false |
Example:

```bash
python3 python/tl_plain.py -e 300 -s 10 -d ./datasets/IMDB-train-SBERT.csv -t ./datasets/Twitter-train-SBERT.csv -l ./datasets/Twitter-test-SBERT.csv # training
python3 python/tl_plain.py -l ./datasets/Twitter-test-SBERT.csv -m model -c # classify
```
To simplify building (without Intel HEXL), we have prepared a Dockerfile which installs all dependencies, builds SEAL, and downloads the preprocessed datasets. To use it, execute the following commands:

```bash
docker build -t cryptotl .
docker run -it cryptotl
```

Then proceed with the CryptoTL commands stated above. To use Intel HEXL, modify the Dockerfile accordingly.
Please use the following BibTeX entry to cite our work in academic papers.

```bibtex
@article{DBLP:journals/corr/abs-2205-11935,
  author    = {Roman Walch and
               Samuel Sousa and
               Lukas Helminger and
               Stefanie N. Lindstaedt and
               Christian Rechberger and
               Andreas Tr{\"{u}}gler},
  title     = {CryptoTL: Private, efficient and secure transfer learning},
  journal   = {CoRR},
  volume    = {abs/2205.11935},
  year      = {2022}
}
```