Py 2: Fitting the Transformer

Joshua Levy edited this page Dec 4, 2019 · 2 revisions

Now that we have split our data into training and test sets, we fit the transformer using these commands:

# Use up to 1000 samples for the SHAP model; keep int(sqrt(n_features)) top interactions.
transformer = InteractionTransformer(max_train_test_samples=1000, mode_interaction_extract=int(np.sqrt(X_train.shape[1])))
transformer.fit(X_train, y_train)

Here, max_train_test_samples sets how many samples are used to train and evaluate the SHAP model, and mode_interaction_extract sets how many of the top-ranked interactions to select. In this case it is the square root of the number of features, truncated to an integer. Setting this option to 'sqrt' extracts that number of interactions automatically, while setting it to 'knee' chooses the number automatically based on how the interaction scores drop off.
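To make these settings more concrete, here is a minimal sketch of the arithmetic involved. It does not call the transformer. `X_demo` and `scores` are made-up stand-ins, and the drop-off rule shown is only one simple heuristic chosen for illustration; it is an assumption and may differ from what the 'knee' option actually does.

```python
import numpy as np

# Hypothetical stand-in for X_train: 200 samples, 30 features.
X_demo = np.zeros((200, 30))

# Same expression as in the fit command: sqrt of the feature count, truncated.
n_interactions = int(np.sqrt(X_demo.shape[1]))  # sqrt(30) is about 5.48

# Made-up interaction scores, already sorted from largest to smallest.
scores = np.array([0.90, 0.85, 0.80, 0.20, 0.15, 0.10])

# Illustrative drop-off rule (an assumption, NOT the library's "knee" logic):
# keep every score that comes before the largest gap between neighbours.
drops = scores[:-1] - scores[1:]
n_keep = int(np.argmax(drops)) + 1

print(n_interactions, n_keep)
```

With 30 features the integer setting keeps 5 interactions, while the toy drop-off rule keeps the 3 scores that sit above the large gap.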

The fitted transformer object now contains the information needed to generate a new design matrix for our samples. Let's generate new design matrices from our old ones using this transformer:

X_train2=transformer.transform(X_train)
X_test2=transformer.transform(X_test)
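As a rough picture of what matrices like X_train2 and X_test2 might look like, the sketch below appends one extra column per selected feature pair to a toy pandas DataFrame. It assumes an interaction is represented as the element-wise product of two columns, which is a common choice but is not confirmed by this page. The column names, the pairs, and the helper `add_interactions` are all hypothetical.

```python
import pandas as pd

# Toy design matrix; column names are made up for illustration.
X_demo = pd.DataFrame({
    "age": [30, 40, 50],
    "dose": [1.0, 2.0, 0.5],
    "bmi": [22.0, 27.0, 31.0],
})

# Hypothetical selected interactions, given as pairs of column names.
selected_pairs = [("age", "dose"), ("dose", "bmi")]


def add_interactions(X, pairs):
    """Return a copy of X with one product column appended per pair.

    Assumption: an interaction term is the element-wise product of two columns.
    """
    X_new = X.copy()
    for a, b in pairs:
        X_new[f"{a}*{b}"] = X[a] * X[b]
    return X_new


X_aug = add_interactions(X_demo, selected_pairs)
print(X_aug.columns.tolist())
```

The original columns are kept and the interaction columns are added alongside them, so the same set of pairs can be applied to both the training and test matrices.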

These new design matrices contain the interactions selected during the model-building step. We can also inspect the ten highest-scoring interactions as follows:

transformer.all_interaction_shap_scores.sort_values('shap_interaction_score',ascending=False).iloc[:10]
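The same sort-and-slice pattern can be tried on a small stand-in table. In this sketch only the `shap_interaction_score` column name comes from the command above; the `feature_1` and `feature_2` columns and all values are made up, so the real table may be laid out differently.

```python
import pandas as pd

# Made-up scores table. 'feature_1' and 'feature_2' are hypothetical columns.
scores = pd.DataFrame({
    "feature_1": ["a", "a", "b", "c"],
    "feature_2": ["b", "c", "c", "d"],
    "shap_interaction_score": [0.02, 0.31, 0.11, 0.07],
})

# Sort from highest to lowest score, then keep the first two rows.
top = scores.sort_values("shap_interaction_score", ascending=False).iloc[:2]
print(top)
```

Because `iloc` slices by position after sorting, `iloc[:10]` in the command above returns the ten rows with the highest scores, or all rows if there are fewer than ten.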

Now that we have transformed the data, let's see how well a logistic regression model would perform on this new design matrix as compared to the original.