Py 3: Evaluating the New Design Matrix and Interpreting via SHAP

Joshua Levy edited this page Dec 4, 2019 · 2 revisions

Now we can check whether the extracted interactions actually help our model. Let's fit three models: a logistic regression without the interactions (lr), a logistic regression with the interactions included (lr2), and a random forest (rf):

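from sklearn.linear_model import LogisticRegression
from imblearn.ensemble import BalancedRandomForestClassifier
lr = LogisticRegression(random_state=42, class_weight='balanced').fit(X_train, y_train)    # main effects only
lr2 = LogisticRegression(random_state=42, class_weight='balanced').fit(X_train2, y_train)  # with interactions
rf = BalancedRandomForestClassifier(random_state=42).fit(X_train, y_train)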
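To see why an added interaction column can change what a logistic regression is able to fit, here is a minimal toy sketch. It does not use the InteractionTransformer or the data from this tutorial; the dataset, the `fit_logistic` helper, and its learning-rate settings are all made up for illustration, and the fit is plain gradient descent rather than scikit-learn's solver.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

# Toy data (hypothetical): the label is 1 only when both features share a sign,
# so the signal lives entirely in the interaction x1 * x2.
X_main = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [1, 0, 0, 1]

# Same rows with one extra column holding the product x1 * x2.
X_inter = [[x1, x2, x1 * x2] for x1, x2 in X_main]

p_main = predict_proba(X_main, *fit_logistic(X_main, y))
p_inter = predict_proba(X_inter, *fit_logistic(X_inter, y))

print("main effects only:", [round(p, 3) for p in p_main])
print("with interaction: ", [round(p, 3) for p in p_inter])
```

In this toy setup the main-effects model cannot move away from a predicted probability of 0.5 for any row, while the model with the product column separates the two classes. Real data is rarely this clean, which is why we compare the fitted models on held-out data next.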

Next, compute the AUROC (also known as the C-statistic) of each model on the test set and compare:

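from sklearn.metrics import roc_auc_score
# The last column of predict_proba holds the probability of the positive class
print(roc_auc_score(y_test, lr.predict_proba(X_test)[:, -1]))    # logistic regression, no interactions
print(roc_auc_score(y_test, lr2.predict_proba(X_test2)[:, -1]))  # logistic regression, with interactions
print(roc_auc_score(y_test, rf.predict_proba(X_test)[:, -1]))    # random forest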
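If the AUROC / C-statistic equivalence is unfamiliar, the sketch below may help. It computes the C-statistic directly from its definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The `c_statistic` helper and the four example scores are hypothetical and only for illustration; in practice `roc_auc_score` does this for you far more efficiently.

```python
from itertools import product

def c_statistic(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities for four test rows.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(c_statistic(y_true, scores))  # 3 of the 4 pairs are ordered correctly -> 0.75
```

A value of 0.5 means the scores rank positives no better than chance, and 1.0 means every positive is scored above every negative.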

Comparing the fitted models, we notice that the interactions boost the logistic regression enough for it to outperform the random forest. The added benefit is that the resulting features are inherently more interpretable than a random forest, which suggests that we have extracted sensible interactions. We have wrapped some SHAP visualization scripts for you to check out:

shap_lr=run_shap(X_train, X_test, lr, model_type='linear', explainer_options={}, get_shap_values_options={}, overall=False, savefile='../test_data/epistasis.lr.shap.png')

shap_rf=run_shap(X_train, X_test, rf, model_type='tree', explainer_options={}, get_shap_values_options={}, overall=False, savefile='../test_data/epistasis.rf.shap.png')

shap_lr2=run_shap(X_train2, X_test2, lr2, model_type='linear', explainer_options={}, get_shap_values_options={}, overall=False, savefile='../test_data/epistasis.lr2.shap.png')
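For intuition about what the linear-model plots show, here is a rough sketch of SHAP values for a linear model. It is not the internals of `run_shap`, which are not shown on this page; it is the textbook formula under the assumption that features are independent, where each feature's contribution is its coefficient times the feature's deviation from the background mean. The `linear_shap` helper, coefficients, and data are hypothetical.

```python
def linear_shap(weights, bias, background, x):
    """Return (base_value, contributions) for one row x of a linear model.

    Assumes independent features: contribution_j = w_j * (x_j - mean_j),
    where mean_j is the average of feature j over the background rows.
    """
    n = len(background)
    means = [sum(row[j] for row in background) / n for j in range(len(weights))]
    base_value = bias + sum(w * m for w, m in zip(weights, means))
    contributions = [w * (xj - m) for w, xj, m in zip(weights, x, means)]
    return base_value, contributions

# Hypothetical coefficients and data, chosen only for illustration.
weights = [2.0, -1.0, 0.5]
bias = 0.25
background = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]  # stands in for training rows
x = [3.0, 1.0, 2.0]                              # the row being explained

base_value, contributions = linear_shap(weights, bias, background, x)
model_output = bias + sum(w * xj for w, xj in zip(weights, x))

print("base value:   ", base_value)
print("contributions:", contributions)
print("model output: ", model_output)
```

The contributions plus the base value add up to the model's output for that row (on the log-odds scale for a logistic regression). Under this view, an interaction column in the design matrix gets its own contribution, so its effect can be read off next to the main effects.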

That concludes our quick demo of how the InteractionTransformer can extend the capacity of a traditional modeling approach. We hope that you find this tool helpful. We are always open to adding more functionality to the workflow, so if you have something in mind, please let us know in the Issues section. Thanks!