
Commit

update synthetic dataset example
paulbkoch committed Jan 9, 2024
1 parent 74f698e commit 9ad4cb4
Showing 24 changed files with 278 additions and 260 deletions.
50 changes: 29 additions & 21 deletions _sources/python/examples/interpretable-regression-synthetic.ipynb
@@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In this demonstration notebook, we are going to create an Explainable Boosting Machine (EBM) using a specially designed synthetic dataset. Our control over the data generation process allows us to visually assess how well the EBM is able to recover the original functions that were used to create the data.\n",
"In this demonstration notebook, we are going to create an Explainable Boosting Machine (EBM) using a specially designed synthetic dataset. Our control over the data generation process allows us to visually assess how well the EBM is able to recover the original functions that were used to create the data. To understand how the synthetic dataset was generated, you can examine the full code on GitHub. This will provide insights into the underlying functions we are trying to recover. The full dataset generation code can be found in: [**_synthetic generation code_**](https://github.com/interpretml/interpret/blob/develop/python/interpret-core/interpret/utils/_synthetic.py)\n",
"\n",
"This notebook can be found in our [**_examples folder_**](https://github.com/interpretml/interpret/tree/develop/docs/interpret/python/examples) on GitHub."
]
@@ -39,7 +39,7 @@
"\n",
"import numpy as np\n",
"from sklearn.model_selection import train_test_split\n",
"from interpret.utils import synthetic_default\n",
"from interpret.utils import make_synthetic\n",
"from interpret import show\n",
"\n",
"from interpret import set_visualize_provider\n",
@@ -48,7 +48,7 @@
"\n",
"seed = 42\n",
"\n",
"X, y, names, types = synthetic_default(classes=None, n_samples=50000, missing=False, seed=seed)\n",
"X, y, names, types = make_synthetic(classes=None, n_samples=50000, missing=False, seed=seed)\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)"
]
},
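
The call above returns the feature matrix and target together with per-feature names and types, which are passed to the EBM constructor below. A minimal sketch of inspecting them (assuming names and types are parallel per-feature lists, as their use in the constructor suggests; the printed values are illustrative, not actual output):

print(X.shape, y.shape)                 # expected: (50000, n_features) and (50000,)
for name, kind in zip(names, types):    # one entry per synthetic feature
    print(name, kind)                   # e.g. 'feature_0', 'continuous' (illustrative)
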
@@ -74,7 +74,7 @@
"source": [
"from interpret.glassbox import ExplainableBoostingRegressor\n",
"\n",
"ebm = ExplainableBoostingRegressor(names, types, interactions=3, smoothing_rounds=3000, greediness=0.95)\n",
"ebm = ExplainableBoostingRegressor(names, types, interactions=3, smoothing_rounds=2000, greediness=0.95)\n",
"ebm.fit(X_train, y_train)"
]
},
@@ -98,7 +98,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Feature 0 - cosine partial response generated on uniformly distributed data.\n",
"# Feature 0 - Cosine partial response generated on uniformly distributed data.\n",
"\n",
"show(ebm.explain_global(), 0)"
]
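
Beyond the plot rendered by show, the learned shape function can be pulled out numerically and compared against the known cosine. A minimal sketch, assuming the fitted-attribute layout described in the EBM internals docs (ebm.term_scores_ holds one array of per-bin additive scores per term, with edge entries reserved for missing/unseen values):

import numpy as np

scores_f0 = ebm.term_scores_[0]               # additive scores for feature 0, one per bin
print(len(scores_f0), "bins, including the missing/unseen edge bins")
print(np.min(scores_f0), np.max(scores_f0))   # a recovered cosine should be roughly symmetric around zero
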
@@ -109,7 +109,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Feature 1 - sine partial response generated on normally distributed data.\n",
"# Feature 1 - Sine partial response generated on normally distributed data.\n",
"\n",
"show(ebm.explain_global(), 1)"
]
@@ -131,7 +131,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Feature 3 - Linear partial response generated on poisson distributed data.\n",
"# Feature 3 - Linear partial response generated on poisson distributed integers.\n",
"\n",
"show(ebm.explain_global(), 3)"
]
@@ -142,7 +142,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Feature 4 - Linear partial response generated on a feature with correlations \n",
"# Feature 4 - Square wave partial response generated on a feature with correlations\n",
"# to features 0 and 1 with added normally distributed noise.\n",
"\n",
"show(ebm.explain_global(), 4)"
@@ -154,7 +154,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Feature 5 - Cubed partial response generated on a feature with a conditional \n",
"# Feature 5 - Sawtooth wave partial response generated on a feature with a conditional \n",
"# correlation to feature 2 with added normally distributed noise.\n",
"\n",
"show(ebm.explain_global(), 5)"
@@ -178,7 +178,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Feature 7 - Unused in the generation function, so has minimal importance.\n",
"# Feature 7 - Unused in the generation function. Should have minimal importance.\n",
"\n",
"show(ebm.explain_global(), 7)"
]
@@ -236,8 +236,8 @@
"metadata": {},
"outputs": [],
"source": [
"# Interaction 2 - Extra pairwise interaction that has low importance because it \n",
"# isn't explicitly included in the generation function.\n",
"# Interaction 2 - Extra pairwise interaction. Should have minimal importance\n",
"# since it isn't explicitly included in the generation function.\n",
"\n",
"show(ebm.explain_global(), 12)"
]
@@ -246,7 +246,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2>Importances of the features and pairwise terms</h2>"
"<h2>For RMSE regression, the EBM's intercept should be close to the mean</h2>"
]
},
{
@@ -255,14 +255,15 @@
"metadata": {},
"outputs": [],
"source": [
"show(ebm.explain_global())"
"print(np.average(y))\n",
"print(ebm.intercept_)"
]
},
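
Squared error is minimized by a constant equal to the mean, and each EBM term is centered, so the intercept absorbs that mean. A minimal sketch (not part of the notebook) of the corresponding mean-only baseline on the held-out split, useful for comparison with the EBM's performance later:

import numpy as np

baseline_rmse = np.sqrt(np.mean((y_test - np.average(y_train)) ** 2))   # predict the training mean everywhere
print(f"mean-only baseline RMSE: {baseline_rmse:.3f}")
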
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2>Evaluate EBM performance</h2>"
"<h2>Importances of the features and pairwise terms</h2>"
]
},
{
@@ -271,19 +272,26 @@
"metadata": {},
"outputs": [],
"source": [
"from interpret.perf import RegressionPerf\n",
"\n",
"ebm_perf = RegressionPerf(ebm).explain_perf(X_test, y_test, name='EBM')\n",
"show(ebm_perf)"
"show(ebm.explain_global())"
]
},
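
The same importances can also be printed as numbers alongside the summary plot. A minimal sketch, assuming the term_names_ attribute and term_importances() helper exposed by fitted EBMs in recent interpret releases:

for term, importance in sorted(zip(ebm.term_names_, ebm.term_importances()),
                               key=lambda pair: -pair[1]):
    print(f"{term}: {importance:.4f}")   # feature 7 and the extra pairwise term should rank near the bottom
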
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To understand how the synthetic dataset was generated, you can examine the full code on GitHub. This will provide insights into the underlying functions we are trying to recover. The full dataset generation code can be found in:\n",
"<h2>Evaluate EBM performance</h2>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from interpret.perf import RegressionPerf\n",
"\n",
"https://github.com/interpretml/interpret/blob/develop/python/interpret-core/interpret/utils/_synthetic.py"
"ebm_perf = RegressionPerf(ebm).explain_perf(X_test, y_test, name='EBM')\n",
"show(ebm_perf)"
]
}
],
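
RegressionPerf renders an interactive report; for a plain numeric summary, the held-out data can also be scored with scikit-learn, since the EBM follows the standard predict interface. A minimal sketch:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

pred = ebm.predict(X_test)
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, pred)):.3f}")
print(f"R^2:  {r2_score(y_test, pred):.3f}")
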
14 changes: 7 additions & 7 deletions dpebm.html

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions dr.html

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions dt.html

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions ebm-internals-classification.html

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions ebm-internals-multiclass.html

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions ebm-internals-regression.html

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions ebm.html

Large diffs are not rendered by default.

16 changes: 8 additions & 8 deletions framework.html

Large diffs are not rendered by default.

16 changes: 8 additions & 8 deletions index.html

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions lime.html

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions lr.html

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions msa.html

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions pdp.html

Large diffs are not rendered by default.

16 changes: 8 additions & 8 deletions python/examples/differential-privacy.html

Large diffs are not rendered by default.

28 changes: 14 additions & 14 deletions python/examples/explain-blackbox-classifiers.html

Large diffs are not rendered by default.

28 changes: 14 additions & 14 deletions python/examples/explain-blackbox-regressors.html

Large diffs are not rendered by default.

28 changes: 14 additions & 14 deletions python/examples/group-importances.html

Large diffs are not rendered by default.

38 changes: 19 additions & 19 deletions python/examples/interpretable-classification.html

Large diffs are not rendered by default.


0 comments on commit 9ad4cb4
