Update MIA README to include epsilon lower bound.

PiperOrigin-RevId: 689433589
tensorflow · Oct 24, 2024 · 4d5c77c · 4d5c77c
1 parent d965556
commit 4d5c77c
Show file tree

Hide file tree

Showing 2 changed files with 471 additions and 459 deletions.
diff --git a/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/README.md b/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/README.md
@@ -10,7 +10,8 @@ memorization is present and thus the less privacy-preserving the model is.
 
 The privacy vulnerability (or memorization potential) is measured via the area
 under the ROC-curve (`auc`) or via max{|fpr - tpr|} (`advantage`) of the attack
-classifier. These measures are very closely related.
+classifier. These measures are very closely related. We can also obtain a lower
+bound for the differential privacy epsilon.
 
 The tests provided by the library are "black box". That is, only the outputs of
 the model are used (e.g., losses, logits, predictions). Neither model internals
@@ -38,9 +39,8 @@ from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_s
 # loss_test  shape: (n_test, )
 
 attacks_result = mia.run_attacks(
-    AttackInputData(
-        loss_train = loss_train,
-        loss_test = loss_test))
+    AttackInputData(loss_train=loss_train, loss_test=loss_test)
+)
 ```
 
 This example calls `run_attacks` with the default options to run a host of
@@ -57,9 +57,11 @@ Then, we can view the attack results by:
 ```python
 print(attacks_result.summary())
 # Example output:
-# -> Best-performing attacks over all slices
-#      THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved an AUC of 0.59 on slice Entire dataset
-#      THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved an advantage of 0.20 on slice Entire dataset
+# Best-performing attacks over all slices
+#   LOGISTIC_REGRESSION (with 7041 training and 3156 test examples) achieved an AUC of 0.72 on slice CORRECTLY_CLASSIFIED=False
+#   LOGISTIC_REGRESSION (with 7041 training and 3156 test examples) achieved an advantage of 0.34 on slice CORRECTLY_CLASSIFIED=False
+#   LOGISTIC_REGRESSION (with 5000 training and 1000 test examples) achieved a positive predictive value of 1.00 on slice CLASS=0
+#   THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved top-5 epsilon lower bounds of 4.6254, 4.6121, 4.5986, 4.5850, 4.5711 on slice Entire dataset
 ```
 
 ### Other codelabs
@@ -100,16 +102,17 @@ First, similar as before, we specify the input for the attack as an
 # loss_test  shape: (n_test, )
 
 attack_input = AttackInputData(
-    logits_train = logits_train,
-    logits_test = logits_test,
-    loss_train = loss_train,
-    loss_test = loss_test,
-    labels_train = labels_train,
-    labels_test = labels_test)
+    logits_train=logits_train,
+    logits_test=logits_test,
+    loss_train=loss_train,
+    loss_test=loss_test,
+    labels_train=labels_train,
+    labels_test=labels_test,
+)
 ```
 
 Instead of `logits`, you can also specify `probs_train` and `probs_test` as the
-predicted probabilty vectors of each example.
+predicted probability vectors of each example.
 
 Then, we specify some details of the attack. The first part includes the
 specifications of the slicing of the data. For example, we may want to evaluate
@@ -118,91 +121,107 @@ the model's classification. These can be specified by a `SlicingSpec` object.
 
 ```python
 slicing_spec = SlicingSpec(
-    entire_dataset = True,
-    by_class = True,
-    by_percentiles = False,
-    by_classification_correctness = True)
+    entire_dataset=True,
+    by_class=True,
+    by_percentiles=False,
+    by_classification_correctness=True,
+)
 ```
 
 The second part specifies the classifiers for the attacker to use. Currently,
 our API supports five classifiers, including `AttackType.THRESHOLD_ATTACK` for
 simple threshold attack, `AttackType.LOGISTIC_REGRESSION`,
 `AttackType.MULTI_LAYERED_PERCEPTRON`, `AttackType.RANDOM_FOREST`, and
 `AttackType.K_NEAREST_NEIGHBORS` which use the corresponding machine learning
-models. For some model, different classifiers can yield pertty different
-results. We can put multiple classifers in a list:
+models. For some model, different classifiers can yield pretty different
+results. We can put multiple classifiers in a list:
 
 ```python
 attack_types = [
     AttackType.THRESHOLD_ATTACK,
-    AttackType.LOGISTIC_REGRESSION
+    AttackType.LOGISTIC_REGRESSION,
 ]
 ```
 
 Now, we can call the `run_attacks` methods with all specifications:
 
 ```python
-attacks_result = mia.run_attacks(attack_input=attack_input,
-                                 slicing_spec=slicing_spec,
-                                 attack_types=attack_types)
+attacks_result = mia.run_attacks(
+    attack_input=attack_input,
+    slicing_spec=slicing_spec,
+    attack_types=attack_types,
+)
 ```
 
 This returns an object of type `AttackResults`. We can, for example, use the
-following code to see the attack results specificed per-slice, as we have
-request attacks by class and by model's classification correctness.
+following code to see the attack results specified per-slice, as we have request
+attacks by class and by model's classification correctness.
 
 ```python
 print(attacks_result.summary(by_slices = True))
 # Example output:
-# ->  Best-performing attacks over all slices
-#       THRESHOLD_ATTACK achieved an AUC of 0.75 on slice CORRECTLY_CLASSIFIED=False
-#       THRESHOLD_ATTACK achieved an advantage of 0.38 on slice CORRECTLY_CLASSIFIED=False
-#
-#     Best-performing attacks over slice: "Entire dataset"
-#       LOGISTIC_REGRESSION achieved an AUC of 0.61
-#       THRESHOLD_ATTACK achieved an advantage of 0.22
-#
-#     Best-performing attacks over slice: "CLASS=0"
-#       LOGISTIC_REGRESSION achieved an AUC of 0.62
-#       LOGISTIC_REGRESSION achieved an advantage of 0.24
-#
-#     Best-performing attacks over slice: "CLASS=1"
-#       LOGISTIC_REGRESSION achieved an AUC of 0.61
-#       LOGISTIC_REGRESSION achieved an advantage of 0.19
-#
-#     ...
-#
-#     Best-performing attacks over slice: "CORRECTLY_CLASSIFIED=True"
-#       LOGISTIC_REGRESSION achieved an AUC of 0.53
-#       THRESHOLD_ATTACK achieved an advantage of 0.05
-#
-#     Best-performing attacks over slice: "CORRECTLY_CLASSIFIED=False"
-#       THRESHOLD_ATTACK achieved an AUC of 0.75
-#       THRESHOLD_ATTACK achieved an advantage of 0.38
+# Best-performing attacks over all slices
+#   LOGISTIC_REGRESSION (with 7041 training and 3156 test examples) achieved an AUC of 0.72 on slice CORRECTLY_CLASSIFIED=False
+#   LOGISTIC_REGRESSION (with 7041 training and 3156 test examples) achieved an advantage of 0.34 on slice CORRECTLY_CLASSIFIED=False
+#   LOGISTIC_REGRESSION (with 5000 training and 1000 test examples) achieved a positive predictive value of 1.00 on slice CLASS=0
+#   THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved top-5 epsilon lower bounds of 4.6254, 4.6121, 4.5986, 4.5850, 4.5711 on slice Entire dataset
+
+# Best-performing attacks over slice: "Entire dataset"
+#   LOGISTIC_REGRESSION (with 50000 training and 10000 test examples) achieved an AUC of 0.58
+#   LOGISTIC_REGRESSION (with 50000 training and 10000 test examples) achieved an advantage of 0.17
+#   THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved a positive predictive value of 0.86
+#   THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved top-5 epsilon lower bounds of 4.6254, 4.6121, 4.5986, 4.5850, 4.5711
+
+# Best-performing attacks over slice: "CLASS=0"
+#   LOGISTIC_REGRESSION (with 5000 training and 1000 test examples) achieved an AUC of 0.63
+#   LOGISTIC_REGRESSION (with 5000 training and 1000 test examples) achieved an advantage of 0.19
+#   LOGISTIC_REGRESSION (with 5000 training and 1000 test examples) achieved a positive predictive value of 1.00
+#   THRESHOLD_ATTACK (with 5000 training and 1000 test examples) achieved top-5 epsilon lower bounds of 4.1920, 4.1645, 4.1364, 4.1074, 4.0775
+
+# ...
+
+# Best-performing attacks over slice: "CORRECTLY_CLASSIFIED=True"
+#   LOGISTIC_REGRESSION (with 42959 training and 6844 test examples) achieved an AUC of 0.51
+#   LOGISTIC_REGRESSION (with 42959 training and 6844 test examples) achieved an advantage of 0.05
+#   LOGISTIC_REGRESSION (with 42959 training and 6844 test examples) achieved a positive predictive value of 0.94
+#   THRESHOLD_ATTACK (with 42959 training and 6844 test examples) achieved top-5 epsilon lower bounds of 0.9495, 0.6358, 0.5630, 0.4536, 0.4341
+
+# Best-performing attacks over slice: "CORRECTLY_CLASSIFIED=False"
+#   LOGISTIC_REGRESSION (with 7041 training and 3156 test examples) achieved an AUC of 0.72
+#   LOGISTIC_REGRESSION (with 7041 training and 3156 test examples) achieved an advantage of 0.34
+#   LOGISTIC_REGRESSION (with 7041 training and 3156 test examples) achieved a positive predictive value of 0.97
+#   LOGISTIC_REGRESSION (with 7041 training and 3156 test examples) achieved top-5 epsilon lower bounds of 3.8844, 3.8678, 3.8510, 3.8339, 3.8165
 ```
 
 #### Viewing and plotting the attack results
 
 We have seen an example of using `summary()` to view the attack results as text.
 We also provide some other ways for inspecting the attack results.
 
-To get the attack that achieves the maximum attacker advantage or AUC, we can do
+To get the attack that achieves the maximum attacker advantage, AUC, or epsilon
+lower bound, we can do
 
 ```python
 max_auc_attacker = attacks_result.get_result_with_max_auc()
 max_advantage_attacker = attacks_result.get_result_with_max_attacker_advantage()
+max_epsilon_attacker = attacks_result.get_result_with_max_epsilon()
 ```
 
 Then, for individual attack, such as `max_auc_attacker`, we can check its type,
-attacker advantage and AUC by
+attacker advantage, AUC, and epsilon lower bound by
 
 ```python
-print("Attack type with max AUC: %s, AUC of %.2f, Attacker advantage of %.2f" %
-      (max_auc_attacker.attack_type,
-       max_auc_attacker.roc_curve.get_auc(),
-       max_auc_attacker.roc_curve.get_attacker_advantage()))
+print(
+    "Attack type with max AUC: %s, AUC of %.2f, Attacker advantage of %.2f, Epsilon lower bound of %s"
+    % (
+        max_auc_attacker.attack_type,
+        max_auc_attacker.roc_curve.get_auc(),
+        max_auc_attacker.roc_curve.get_attacker_advantage(),
+        max_auc_attacker.get_epsilon_lower_bound()
+    )
+)
 # Example output:
-# -> Attack type with max AUC: THRESHOLD_ATTACK, AUC of 0.75, Attacker advantage of 0.38
+# Attack type with max AUC: LOGISTIC_REGRESSION, AUC of 0.72, Attacker advantage of 0.34, Epsilon lower bound of [3.88435257 3.86781797 3.85100545 3.83390548 3.81650809]
 ```
 
 We can also plot its ROC curve by
@@ -217,24 +236,24 @@ which would give a figure like the one below
 ![roc_fig](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelab_roc_fig.png?raw=true)
 
 Additionally, we provide functionality to convert the attack results into Pandas
-data frame:
+dataframe:
 
 ```python
 import pandas as pd
 
 pd.set_option("display.max_rows", 8, "display.max_columns", None)
 print(attacks_result.calculate_pd_dataframe())
 # Example output:
-#           slice feature slice value attack type  Attacker advantage       AUC
-# 0        entire_dataset               threshold            0.216440  0.600630
-# 1        entire_dataset                      lr            0.212073  0.612989
-# 2                 class           0   threshold            0.226000  0.611669
-# 3                 class           0          lr            0.239452  0.624076
-# ..                  ...         ...         ...                 ...       ...
-# 22  correctly_classfied        True   threshold            0.054907  0.471290
-# 23  correctly_classfied        True          lr            0.046986  0.525194
-# 24  correctly_classfied       False   threshold            0.379465  0.748138
-# 25  correctly_classfied       False          lr            0.370713  0.737148
+#            slice feature slice value  train size  test size          attack type  Attacker advantage  Positive predictive value       AUC  Epsilon lower bound_1  Epsilon lower bound_2  Epsilon lower bound_3  Epsilon lower bound_4  Epsilon lower bound_5
+# 0         Entire dataset                   50000      10000     THRESHOLD_ATTACK            0.172520                   0.862614  0.581630               4.625393               4.612104               4.598635               4.584982               4.571140
+# 1         Entire dataset                   50000      10000  LOGISTIC_REGRESSION            0.173060                   0.862081  0.583981               4.531399               4.513775               4.511974               4.498905               4.492165
+# 2                  class           0        5000       1000     THRESHOLD_ATTACK            0.162000                   0.877551  0.580728               4.191954               4.164547               4.136368               4.107372               4.077511
+# 3                  class           0        5000       1000  LOGISTIC_REGRESSION            0.193800                   1.000000  0.627758               3.289194               3.220285               3.146292               3.118849               3.066407
+# ...
+# 22  correctly_classified        True       42959       6844     THRESHOLD_ATTACK            0.043953                   0.862643  0.474713               0.949550               0.635773               0.563032               0.453640               0.434125
+# 23  correctly_classified        True       42959       6844  LOGISTIC_REGRESSION            0.048963                   0.943218  0.505334               0.597257               0.596095               0.594016               0.592702               0.590765
+# 24  correctly_classified       False        7041       3156     THRESHOLD_ATTACK            0.326865                   0.941176  0.707597               3.818741               3.805451               3.791982               3.778329               3.764488
+# 25  correctly_classified       False        7041       3156  LOGISTIC_REGRESSION            0.336655                   0.972222  0.717386               3.884353               3.867818               3.851005               3.833905               3.816508
 ```
 
 #### Advanced Membership Inference Attacks