
Commit

update synthetic dataset example
paulbkoch committed Jan 9, 2024
1 parent 74f698e commit 9ad4cb4
Showing 24 changed files with 278 additions and 260 deletions.
50 changes: 29 additions & 21 deletions _sources/python/examples/interpretable-regression-synthetic.ipynb
@@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In this demonstration notebook, we are going to create an Explainable Boosting Machine (EBM) using a specially designed synthetic dataset. Our control over the data generation process allows us to visually assess how well the EBM is able to recover the original functions that were used to create the data.\n",
"In this demonstration notebook, we are going to create an Explainable Boosting Machine (EBM) using a specially designed synthetic dataset. Our control over the data generation process allows us to visually assess how well the EBM is able to recover the original functions that were used to create the data. To understand how the synthetic dataset was generated, you can examine the full code on GitHub. This will provide insights into the underlying functions we are trying to recover. The full dataset generation code can be found in: [**_synthetic generation code_**](https://github.com/interpretml/interpret/blob/develop/python/interpret-core/interpret/utils/_synthetic.py)\n",
"\n",
"This notebook can be found in our [**_examples folder_**](https://github.com/interpretml/interpret/tree/develop/docs/interpret/python/examples) on GitHub."
]
@@ -39,7 +39,7 @@
"\n",
"import numpy as np\n",
"from sklearn.model_selection import train_test_split\n",
"from interpret.utils import synthetic_default\n",
"from interpret.utils import make_synthetic\n",
"from interpret import show\n",
"\n",
"from interpret import set_visualize_provider\n",
@@ -48,7 +48,7 @@
"\n",
"seed = 42\n",
"\n",
"X, y, names, types = synthetic_default(classes=None, n_samples=50000, missing=False, seed=seed)\n",
"X, y, names, types = make_synthetic(classes=None, n_samples=50000, missing=False, seed=seed)\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)"
]
},
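
The call above returns the feature matrix and target together with per-feature names and types, which are passed to the EBM constructor below. A minimal sketch of inspecting them (assuming names and types are parallel per-feature lists, as their use in the constructor suggests; the printed values are illustrative, not actual output):

print(X.shape, y.shape)                 # expected: (50000, n_features) and (50000,)
for name, kind in zip(names, types):    # one entry per synthetic feature
    print(name, kind)                   # e.g. 'feature_0', 'continuous' (illustrative)
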
@@ -74,7 +74,7 @@
"source": [
"from interpret.glassbox import ExplainableBoostingRegressor\n",
"\n",
"ebm = ExplainableBoostingRegressor(names, types, interactions=3, smoothing_rounds=3000, greediness=0.95)\n",
"ebm = ExplainableBoostingRegressor(names, types, interactions=3, smoothing_rounds=2000, greediness=0.95)\n",
"ebm.fit(X_train, y_train)"
]
},
@@ -98,7 +98,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Feature 0 - cosine partial response generated on uniformly distributed data.\n",
"# Feature 0 - Cosine partial response generated on uniformly distributed data.\n",
"\n",
"show(ebm.explain_global(), 0)"
]
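
Beyond the plot rendered by show, the learned shape function can be pulled out numerically and compared against the known cosine. A minimal sketch, assuming the fitted-attribute layout described in the EBM internals docs (ebm.term_scores_ holds one array of per-bin additive scores per term, with edge entries reserved for missing/unseen values):

import numpy as np

scores_f0 = ebm.term_scores_[0]               # additive scores for feature 0, one per bin
print(len(scores_f0), "bins, including the missing/unseen edge bins")
print(np.min(scores_f0), np.max(scores_f0))   # a recovered cosine should be roughly symmetric around zero
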
@@ -109,7 +109,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Feature 1 - sine partial response generated on normally distributed data.\n",
"# Feature 1 - Sine partial response generated on normally distributed data.\n",
"\n",
"show(ebm.explain_global(), 1)"
]
@@ -131,7 +131,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Feature 3 - Linear partial response generated on poisson distributed data.\n",
"# Feature 3 - Linear partial response generated on poisson distributed integers.\n",
"\n",
"show(ebm.explain_global(), 3)"
]
@@ -142,7 +142,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Feature 4 - Linear partial response generated on a feature with correlations \n",
"# Feature 4 - Square wave partial response generated on a feature with correlations\n",
"# to features 0 and 1 with added normally distributed noise.\n",
"\n",
"show(ebm.explain_global(), 4)"
@@ -154,7 +154,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Feature 5 - Cubed partial response generated on a feature with a conditional \n",
"# Feature 5 - Sawtooth wave partial response generated on a feature with a conditional \n",
"# correlation to feature 2 with added normally distributed noise.\n",
"\n",
"show(ebm.explain_global(), 5)"
@@ -178,7 +178,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Feature 7 - Unused in the generation function, so has minimal importance.\n",
"# Feature 7 - Unused in the generation function. Should have minimal importance.\n",
"\n",
"show(ebm.explain_global(), 7)"
]
@@ -236,8 +236,8 @@
"metadata": {},
"outputs": [],
"source": [
"# Interaction 2 - Extra pairwise interaction that has low importance because it \n",
"# isn't explicitly included in the generation function.\n",
"# Interaction 2 - Extra pairwise interaction. Should have minimal importance\n",
"# since it isn't explicitly included in the generation function.\n",
"\n",
"show(ebm.explain_global(), 12)"
]
@@ -246,7 +246,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2>Importances of the features and pairwise terms</h2>"
"<h2>For RMSE regression, the EBM's intercept should be close to the mean</h2>"
]
},
{
@@ -255,14 +255,15 @@
"metadata": {},
"outputs": [],
"source": [
"show(ebm.explain_global())"
"print(np.average(y))\n",
"print(ebm.intercept_)"
]
},
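
Squared error is minimized by a constant equal to the mean, and each EBM term is centered, so the intercept absorbs that mean. A minimal sketch (not part of the notebook) of the corresponding mean-only baseline on the held-out split, useful for comparison with the EBM's performance later:

import numpy as np

baseline_rmse = np.sqrt(np.mean((y_test - np.average(y_train)) ** 2))   # predict the training mean everywhere
print(f"mean-only baseline RMSE: {baseline_rmse:.3f}")
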
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2>Evaluate EBM performance</h2>"
"<h2>Importances of the features and pairwise terms</h2>"
]
},
{
@@ -271,19 +272,26 @@
"metadata": {},
"outputs": [],
"source": [
"from interpret.perf import RegressionPerf\n",
"\n",
"ebm_perf = RegressionPerf(ebm).explain_perf(X_test, y_test, name='EBM')\n",
"show(ebm_perf)"
"show(ebm.explain_global())"
]
},
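
The same importances can also be printed as numbers alongside the summary plot. A minimal sketch, assuming the term_names_ attribute and term_importances() helper exposed by fitted EBMs in recent interpret releases:

for term, importance in sorted(zip(ebm.term_names_, ebm.term_importances()),
                               key=lambda pair: -pair[1]):
    print(f"{term}: {importance:.4f}")   # feature 7 and the extra pairwise term should rank near the bottom
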
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To understand how the synthetic dataset was generated, you can examine the full code on GitHub. This will provide insights into the underlying functions we are trying to recover. The full dataset generation code can be found in:\n",
"<h2>Evaluate EBM performance</h2>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from interpret.perf import RegressionPerf\n",
"\n",
"https://github.com/interpretml/interpret/blob/develop/python/interpret-core/interpret/utils/_synthetic.py"
"ebm_perf = RegressionPerf(ebm).explain_perf(X_test, y_test, name='EBM')\n",
"show(ebm_perf)"
]
}
],
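
RegressionPerf renders an interactive report; for a plain numeric summary, the held-out data can also be scored with scikit-learn, since the EBM follows the standard predict interface. A minimal sketch:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

pred = ebm.predict(X_test)
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, pred)):.3f}")
print(f"R^2:  {r2_score(y_test, pred):.3f}")
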
14 changes: 7 additions & 7 deletions dpebm.html

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions dr.html

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions dt.html

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions ebm-internals-classification.html

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions ebm-internals-multiclass.html

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions ebm-internals-regression.html

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions ebm.html

Large diffs are not rendered by default.

16 changes: 8 additions & 8 deletions framework.html

Large diffs are not rendered by default.

16 changes: 8 additions & 8 deletions index.html

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions lime.html

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions lr.html

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions msa.html

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions pdp.html

Large diffs are not rendered by default.

16 changes: 8 additions & 8 deletions python/examples/differential-privacy.html

Large diffs are not rendered by default.

28 changes: 14 additions & 14 deletions python/examples/explain-blackbox-classifiers.html

Large diffs are not rendered by default.

28 changes: 14 additions & 14 deletions python/examples/explain-blackbox-regressors.html

Large diffs are not rendered by default.

28 changes: 14 additions & 14 deletions python/examples/group-importances.html

Large diffs are not rendered by default.

38 changes: 19 additions & 19 deletions python/examples/interpretable-classification.html

Large diffs are not rendered by default.


0 comments on commit 9ad4cb4
