Final project of the course Data Protection & Privacy: an implementation of the KB-anonymization technique, a framework useful for anonymizing data for testing purposes.
To run the project, it is sufficient to clone or download this repository with the command:
git clone https://github.com/A-725-K/kb-Anonimity-Data-Protection-and-Privacy.git
Our project relies on the Z3 solver. If you don't have it installed, please refer to its main page.
You only have to run this simple command from your terminal:
python3 main.py [-h] -i INPUT_FILE -o OUTPUT_FILE -a ALGORITHM -k K -c CONFIG_FILE
where:
- -i: the input dataset, in JSON format
- -o: the output file, which will be written in JSON format
- -a: the technique to apply to enhance the anonymization of the data
  - P-F: same Path, no Field repeat
  - P-T: same Path, no Tuple repeat
- -k: the degree of anonymization to apply to the data
- -c: a configuration file that contains the range constraints to apply to the fields of the tuples in the dataset
Alternatively, you can simply launch the test_runner utility:
cd utilities
./test_runner
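For readers who want to adapt main.py, the command-line options listed above could be parsed roughly as follows. This is only a minimal sketch built from the usage line: the destination names, help strings, and the file paths in the example call are assumptions, not the project's actual code.

```python
import argparse


def build_parser():
    # Mirrors the usage line of main.py; dest names and help texts are illustrative.
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-i", dest="input_file", metavar="INPUT_FILE",
                        required=True, help="input dataset in JSON format")
    parser.add_argument("-o", dest="output_file", metavar="OUTPUT_FILE",
                        required=True, help="output file (JSON)")
    parser.add_argument("-a", dest="algorithm", metavar="ALGORITHM",
                        required=True, choices=["P-F", "P-T"],
                        help="P-F: no Field repeat, P-T: no Tuple repeat")
    parser.add_argument("-k", dest="k", metavar="K", type=int,
                        required=True, help="degree of anonymization")
    parser.add_argument("-c", dest="config_file", metavar="CONFIG_FILE",
                        required=True, help="file with the range constraints")
    return parser


# Example invocation; the paths below are hypothetical.
args = build_parser().parse_args(
    ["-i", "datasets/input.json", "-o", "out.json",
     "-a", "P-T", "-k", "3", "-c", "utilities/configs.txt"]
)
print(args.algorithm, args.k)
```

Restricting -a with `choices` makes the parser reject anything other than the two techniques documented above.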
- datasets: contains all the data used in our experiments, plus a bash script that gathers them through an open API
- kb_anonymity: the core of the program; it contains the library we propose
- mappings: each file contains a map that encodes some values as integers
- main.py: the entry point of the program; users may want to modify it depending on their needs
- p_test.py: the SUT (system under test); users have to encode their own program in the same way
- stat: contains plots of the results produced by the test runner
- utilities
  - configs.txt: an example configuration file; it must follow a specific syntax
  - json_reader.py: a utility to parse the dataset; users should modify it depending on their data
  - draw_graphics.py: a script that plots the results of the algorithms executed in batch
  - test_runner.sh: a simple script that runs some experiments with different parameters to understand the behavior of the algorithm
1. p_test format
p_test must contain a function called P_Test, which simulates the behaviour of the system we want to test. It takes as input a raw tuple and a list of constraints (initially empty). A constraint is a triple (field, operation symbol, value).
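As an illustration of this format, a P_Test function might look like the sketch below. The field names (age, income), the thresholds, the representation of the raw tuple as a dictionary, and the choice to return the constraint list are all assumptions made for the example; the actual p_test.py in the repository may differ.

```python
def P_Test(raw_tuple, constraints):
    """Hypothetical SUT: records the branch conditions taken for one tuple."""
    # Every branch that is executed appends a (field, operation symbol, value)
    # triple, so the list ends up describing the path followed by the tuple.
    if raw_tuple["age"] >= 18:
        constraints.append(("age", ">=", 18))
        if raw_tuple["income"] > 30000:
            constraints.append(("income", ">", 30000))
        else:
            constraints.append(("income", "<=", 30000))
    else:
        constraints.append(("age", "<", 18))
    return constraints


# The constraint list is initially empty, as described above.
path = P_Test({"age": 20, "income": 1000}, [])
print(path)
```

Two tuples that produce the same list of triples follow the same path through the program.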
2. configs format
In this file the user specifies the range constraints for each field of a tuple. The first row must contain all the fields present in the dataset, as strings. Each following row must use this syntax if the constraints relate to a single field:
field:(([op_symbol value]+),?)+
otherwise, if the constraints involve two related fields:
#field1 op_symbol field2
A comma separates conditions that are combined with OR, while whitespace separates conditions that are combined with AND.
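To make the syntax concrete, the sketch below parses a single constraint row into a small Python structure. It is not the parser used by the project: the function name, the returned tuples, and the sample rows (age, start_year, end_year, and the operator symbols) are assumptions for illustration, and the header row with the field names is not handled.

```python
def parse_constraint_line(line):
    """Parse one constraint row of a configs file (illustrative only)."""
    line = line.strip()
    if line.startswith("#"):
        # Row relating two fields: "#field1 op_symbol field2"
        field1, op, field2 = line[1:].split()
        return ("related", field1, op, field2)
    # Row for a single field: "field:op value op value,op value"
    field, _, body = line.partition(":")
    alternatives = []  # groups separated by commas are alternatives (OR)
    for group in body.split(","):
        tokens = group.split()
        if not tokens:
            continue  # tolerate a trailing comma
        if len(tokens) % 2 != 0:
            raise ValueError("expected 'op_symbol value' pairs in: " + group)
        # Pairs inside one group must all hold together (AND).
        pairs = [(tokens[i], tokens[i + 1]) for i in range(0, len(tokens), 2)]
        alternatives.append(pairs)
    return ("range", field.strip(), alternatives)


# Hypothetical rows: (age >= 18 AND age <= 65) OR (age == 0)
single = parse_constraint_line("age:>= 18 <= 65,== 0")
related = parse_constraint_line("#start_year <= end_year")
print(single)
print(related)
```

Values are kept as strings here; converting them to integers would be a natural next step, given that the mappings directory encodes values as integers.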
- Andrea Canepa - Computer Science, UNIGE - Data Protection and Privacy a.y. 2019/2020
- Alessio Ravera - Computer Science, UNIGE - Data Protection and Privacy a.y. 2019/2020