-
Notifications
You must be signed in to change notification settings - Fork 36
/
Copy pathapplication.Rmd
541 lines (375 loc) · 28.1 KB
/
application.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
# (PART\*) Part II: Praxis {-}
# Application Using R
```{r setupApplication, include=FALSE}
# knitr::opts_chunk$set(cache.rebuild=T, cache = T)
```
We're now ready to do some modeling! We've now seen the relationships of GAM to techniques we know, deficiencies with common approaches, and a little bit about how GAMs work conceptually. The next step is to dive in.
## Initial Examination
The data set has been constructed using average Science scores by country from the Programme for International Student Assessment ([PISA](https://www.oecd.org/pisa/)) 2006, along with GNI per capita (Purchasing Power Parity, 2005 dollars), Educational Index, Health Index, and Human Development Index from [UN data](http://hdr.undp.org/en/data). The key variables are as follows (variable abbreviations in bold)[^datainfo]:
[^datainfo]: The education component is measured by mean of years of schooling for adults aged 25 years and expected years of schooling for children of school entering age, the health index by life expectancy at birth, and the wealth component is based on the gross national income per capita. The HDI sets a minimum and a maximum for each dimension, and values indicate where each country stands in relation to these endpoints, expressed as a value between 0 and 1. More information on the HDI measures can be found [here](http://hdr.undp.org/en/statistics/hdi/).
- **Overall** Science Score (average score for 15 year olds)
- **Interest** in science
- Identifying scientific **Issues**
- **Explaining** phenomena scientifically
- **Support** for scientific inquiry
- **Income** Index
- **Health** Index
- **Edu**cation Index
- **H**uman **D**evelopment **I**ndex (composed of the Income index, Health Index, and Education Index)
But even before we get too far, it would be good to know our options in the world of GAMs. The [appendix][appendix] of this document has a list of some packages to be aware of, and there is quite a bit of GAM functionality available within R, even for just plotting[^geom_smooth]. We will use <span class="pack">mgcv</span> for our purposes.
[^geom_smooth]: <span class="pack">ggplot2</span> has basic gam functionality for scatterplot smoothing.
The first thing to do is get the data in and do some initial inspections.
```{r initial_inspection_of_pisa, echo=1:3}
# url for the data is:
# https://raw.githubusercontent.com/m-clark/generalized-additive-models/master/data/pisasci2006.csv
pisa = read_csv('data/pisasci2006.csv')
pisa %>%
tidyext::num_by(vars(-Country)) %>%
gt(decimals = 1)
```
<br>
<br>
```{r initial_inspection_of_pisa_visual, echo=FALSE, fig.asp=.75}
source('funcs/better_smooth.R')
source('funcs/better_corr.R')
p = GGally::ggpairs(
pisa[, -c(1, 3:5)],
lower = list(
continuous = GGally::wrap(
better_smooth,
ptalpha = .25,
ptcol = '#D55E00',
ptsize = 1,
linecol = '#03b3ff',
method = 'loess',
se = F,
lwd = .5
)
),
diag = list(continuous = GGally::wrap(
'densityDiag', color = 'gray50', lwd = .5
)),
# upper=list(continuous=GGally::wrap(better_corr)),
axisLabels = "none"
)
p
```
<br>
The scatterplot matrix has quite a bit of information to spend some time with- univariate density, loess curves, and correlations. For our purposes, we will ignore the issue regarding the haves vs. have-nots on the science scale and save that for another day. Note the surprising negative correlation between interest and overall score for science, which might remind us of [Simpson's paradox](http://en.wikipedia.org/wiki/Simpson_paradox), namely, that what occurs for the individual may not be the same for the group. One will also note that while there is a positive correlation between Income and Overall science scores, it flattens out after an initial rise. Linear fits might be difficult to swallow for some of these, but let's take a closer look.
```{r scatterwrap, echo=F, fig.asp=.6}
# get data into a form to take advantage of ggplot
dmelt =
pisa %>%
select(-c(Evidence, Explain, Issues)) %>%
pivot_longer(-c(Overall, Country), names_to = 'Index', values_to = 'Value')
# leave the smooth off for now
scatterwrap = ggplot(aes(x = Value, y = Overall), data = dmelt) +
geom_point(color = '#D55E00', alpha = .75) +
#geom_smooth(se=F) +
geom_text(
aes(label = Country),
alpha = 0,
size = 1,
angle = 30,
hjust = -.2,
# making transparent so only plotly will show the country
vjust = -.2
) +
facet_wrap( ~ Index, scales = 'free_x') +
labs(x = '')
scatterwrap
```
<br>
We can see again that linear fits aren't going to do so well for some, though it might be a close approximation for interest in science and support for science. Now let's run the smooth. By default, <span class="pack">ggplot2</span> will use a loess smoother for small data sets (i.e. < 1000 observations), but one can use the <span class="pack">mgcv</span> <span class="function">gam</span> function as a smoother by setting `method = 'gam'` in when using <span class="function">geom_smooth</span>.
```{r scatterwrapSmooth, echo=F, fig.asp=.6}
scatterwrapSmooth = ggplot(aes(x = Value, y = Overall), data = dmelt) +
geom_point(color = '#D55E00', alpha = .75) +
geom_smooth(se = FALSE, lwd = .5, color = '#56B4E9') +
geom_text(
aes(label = Country),
alpha = 0,
size = 1,
angle = 30,
hjust = -.2,
# making transparent so only plotly will show the country
vjust = -.2
) +
facet_wrap( ~ Index, scales = 'free_x') +
labs(x = '') +
theme(plot.background = element_rect(color = 'transparent'))
scatterwrapSmooth
```
<br>
Often in these situations people will perform some standard transformation, such as a log, but as we noted earlier, it doesn't help as nearly as often as it is used. For example, in this case one can log the overall score, Income, or both and a linear relation will still not be seen.
## Single Feature
### Linear Fit
We will start with the simple situation of using a single feature for prediction. Let's begin by using a typical linear regression to predict science scores by the Income index. We could use the standard R <span class="func">lm</span> function, but I'll leave that as an exercise for comparison. We can still do straightforward linear models with the <span class="func">gam</span> function, and again it is important to note that the standard linear model can be seen as a special case of a GAM.
```{r mod_lm}
library(mgcv)
mod_lm = gam(Overall ~ Income, data = pisa)
summary(mod_lm)
```
What are we getting here? The same thing you get from a regular linear model, because you just ran one. However, there are a couple things to look at. The coefficient is statistically significant, but serves as a reminder that it usually a good idea to scale feature variables so that the effect is more meaningful. Here, moving one unit on Income is akin from going broke to being the richest country. But in a more meaningful sense, if we moved from say, .7 to .8, we'd expect an increase of about 35 points on the science score. We also see the deviance explained[^devianceexplained], which serves as a generalization of R-squared, and in this case, it actually is equivalent to the unadjusted R-squared. Likewise, there is the familiar adjusted version of it to account for small sample size and model complexity. The scale estimate is the scaled deviance, which here is equivalent to the residual sums of squares. The GCV score we will save for when we run a GAM.
### GAM
Let's now try some nonlinear approaches[^woodexample], keeping in mind that $\mu=f(x)$. As a point of comparison, we can start by trying a standard polynomial regression, and it might do well enough, but let's go further. To begin we must consider a *basis* to use, a space that $f$ is an element of. Doing so leads to choosing a set of *basis functions* $F_j$, with parameters $b_j$ that will be combined to produce $f(x)$:
[^woodexample]: This example is pretty much straight from @wood_generalized_2006 with little modification.
$$f(x)=\displaystyle\sum\limits_{j=1}^q F_{j}(x)b_{j}$$
To better explain by example, if we use a cubic polynomial, the basis is: $F_1(x) = 1$, $F_2(x)=x$, $F_3(x)=x^2$, $F_4(x)=x^3$, which leads to the following:
$$f(x) = b_1 + b_2\cdot x + b_3\cdot x^2 + b_4\cdot x^3$$
The following visualization allows us to see the effects in action. It is based on the results extracted from running such a model[^polyreg] and obtaining the coefficients. The first plot represents the intercept of 470.44, the second plot, our $b_2$ coefficient of 289.5 multiplied by Income and so forth. The bottom plot shows the final fit $f(x)$, i.e. the linear combination of the basis functions.
```{r polynomialBasis, echo=F}
dpoly = bind_rows(pisa, pisa, pisa, pisa, pisa)
dpoly$basis = factor(rep(1:5, e = nrow(pisa)))
levels(dpoly$basis) = c('Intercept', 'x', "x^2", 'x^3', 'f(x)')
### polynomial with coefficients
polynomialBasis = ggplot(aes(x = Income, y = Income), data = dpoly) +
geom_line(aes(x = Income, y = 470.44), data = dpoly[1:65, ], color = "#56B4E9") + #indexing gets around basis == expression
geom_line(aes(x = Income, y = Income * 289.5),
data = dpoly[66:130, ],
color = "#56B4E9") +
geom_line(aes(x = Income, y = Income ^ 2 * (-84.6)),
data = dpoly[131:195, ],
color = "#56B4E9") +
geom_line(aes(x = Income, y = Income ^ 3 * (-113.6)),
data = dpoly[196:260, ],
color = "#56B4E9") +
ylab("") +
geom_smooth(
aes(x = Income, y = Overall),
se = FALSE,
method = 'lm',
formula = y ~ poly(x, 3),
data = dpoly[261:325, ],
color = "#56B4E9",
lwd = .5
) +
facet_grid(basis ~ ., scales = 'free_y') # , labeller=label_parsed if using expressions
polynomialBasis
```
<br>
At this point we have done nothing we couldn't do in our regular regression approach, as polynomial regression has a long history of use in modeling. However, the take home message is that as we move to GAMs we are going about things in much the same fashion; we are simply changing the nature of the basis, and have a great deal more flexibility in choosing the form.
In the next visualization I show the fit using a 'by-hand' cubic spline basis (see the [Appendix][a detailed example] and p.126-7 in @wood_generalized_2006).
```{r csbyhand, echo=FALSE}
rk = function(x, z) {
((z - 0.5) ^ 2 - 1 / 12) * ((x - 0.5) ^ 2 - 1 / 12) / 4 -
((abs(x - z) - 0.5) ^ 4 - (abs(x - z) - 0.5) ^ 2 / 2 + 7 / 240) / 24
}
spl.X = function(x, xk) {
q = length(xk) + 2
n = length(x)
X = matrix(1, n, q)
X[, 2] = x
X[, 3:q] = outer(x, xk, FUN = rk)
X
}
# xk = quantile(pisa$Income, na.rm=T)[2:4]
# xk = c(.66,.74,.83)
xk = seq(.658, .833, len = 8)
#xk = quantile(pisa$Income, na.rm=T)[1:4]
X = spl.X(pisa$Income, xk)
mod.1 = lm(Overall ~ X - 1, data = pisa)
xp = 40:100 / 100
Xp = spl.X(xp, xk)
mod_init = gam(Overall ~ s(Income, bs = 'cs', k = 8, xt = list(knots = xk)), data = pisa)
csbyhand = ggplot(aes(x = Income, y = Overall), data = pisa) +
geom_point(color = "#D55E00", alpha = .75) +
geom_line(
aes(x = xp, y = Xp %*% coef(mod.1)),
data = data.frame(xp, Xp),
color = "#56B4E9",
lwd = 1
) +
geom_line(
aes(x = Income, y = pred),
data = pisa %>% add_predictions(mod_init),
color = palettes$stan_red$stan_red,
lwd = 1
) +
xlim(.4, 1)
csbyhand
```
<br>
A cubic spline is essentially a connection of multiple cubic polynomial regressions, similar to what we demonstrated previously. We choose points of the feature variable at which to create sections, and these points are referred to as *knots*. Separate cubic polynomials are fit at each section, and then joined at the knots to create a continuous curve. The above graph represents a cubic spline with 8 knots between the first and third quartiles[^tenknots]. The red line uses the GAM results from
<span class="pack">mgcv</span>.
#### Fitting the model
Let's now fit an actual generalized additive model using the same cubic spline as our basis. We again use the <span class="func">gam</span> function as before for basic model fitting, but now we are using a function <span class="func">s</span> within the formula to denote the smooth terms. Within that function we also specify the type of smooth, though a default is available. I chose `bs = cr`, denoting cubic regression splines, to keep consistent with our previous example.
```{r mod_gam1}
mod_gam1 = gam(Overall ~ s(Income, bs = "cr"), data = pisa)
summary(mod_gam1)
```
The first thing to note is that, aside from the smooth part, our model code is similar to what we're used to with core R functions such as <span class="func">lm</span> and <span class="func">glm</span>. In the summary, we first see the distribution assumed, as well as the link function used, in this case normal and identity, respectively, which to iterate, had we had no smoothing, would result in a SLiM. After that we see that the output is separated into *parametric* and *smooth*, or nonparametric parts[^nonpara]. In this case, the only parametric component is the intercept, but it's good to remember that you are not bound to use a smooth for every feature of interest, and indeed, as we will discuss in more detail later, part of the process may involve refitting the model with terms that were found to be linear for the most part anyway. The smooth component of our model regarding a country's income and its relationship with overall science score suggests it is statistically significant, but there are a couple of things in the model summary that would be unfamiliar.
[^nonpara]: As an aside, the term *nonparametric* has at least two general uses in the statistical world. [Wikipedia](https://en.wikipedia.org/wiki/Nonparametric_statistics) has a nice delineation.
We'll start with the *effective degrees of freedom*[^lme4], or `edf`. In typical OLS regression the model degrees of freedom is equivalent to the number of features/terms in the model. This is not so straightforward with a GAM due to the smoothing process and the penalized regression estimation procedure, something that will be discussed more later[^edfinit]. In this situation, we are still trying to minimize the residual sums of squares, but we also have a built-in penalty for 'wiggliness' of the fit, where in general we try to strike a balance between an undersmoothed fit and an oversmoothed fit. The default p-value for the test is based on the effective degrees of freedom and the rank $r$ of the covariance matrix for the coefficients for a particular smooth, so here, *conceptually*, it is the p-value associated with the $F(r, n-edf)$. However, there are still other issues to be concerned about, and `?summary.gam` will provide your first step down that particular rabbit hole. For hypothesis testing an alternate edf is actually used, which is the other one provided there in the summary result[^refdf]. At this point you might be thinking these p-values are a bit fuzzy, and you'd be right. The gist is, they aren't to be used for harsh cutoffs, say, at an arbitrary .05 level[^p05], but if they are pretty low you can feel comfortable claiming statistical significance, which of course is the end all, be all, of the scientific endeavor- right?
[^lme4]: This part regarding effective degrees of freedom should ring a bell for those who use the <span class="pack">lme4</span> package for mixed models. All of this applies there too, and may provide more insight as to why they don't even provide p-values as a default. See the [Other Approaches][Categorical variables] section.
The GCV, or *generalized cross validation* score can be taken as an estimate of the *mean square prediction error* based on a leave-one-out cross validation estimation process. We estimate the model for all observations except $i$, then note the squared residual predicting observation $i$ from the model. Then we do this for all observations. However, the GCV score is an efficient measure of this concept that doesn't actually require fitting all those models and overcomes other issues[^gcvnote]. It is this score that is minimized by default when determining the specific nature of the smooth. On its own it doesn't tell us much, but we can use it similar to AIC as a comparative measure to choose among different models, with lower being better.
[^gcvnote]: In this initial model the GCV can be found as: <br> $GCV = \frac{n*scaled\, est.}{(n-edf-{[n\,of\, parametric\, terms]})^{2}}$
### Visualization
One can get sense of the form of the fit by plotting the model object as follows[^othervis]:
```{r plot_mod_gam1, echo=1, eval = -(1:2), fig.asp=.75}
plot(mod_gam1)
gratia::draw(mod_gam1, constant = coef(mod_gam1)[1]) # add the intercept to get back to the response scale
plot_gam(mod_gam1, line_color = okabe_ito[2], ribbon_color = 'gray92') +
ylab('s(Income, 6.9)') +
theme(
axis.line.y = element_blank(),
# axis.text.y = element_blank(),
axis.ticks.y = element_blank()
)
```
Note that the plot function will only plot the smooth terms in the model- the others are straight lines since they are linear effects. The intervals are Bayesian credible intervals based on the posterior predictive distribution. In this single feature case, one can also revisit the previous graph.
[^othervis]: More visually appealing alternatives to the default <span class="pack">mgcv</span> plot can be found with <span class="pack">ggeffects</span>, <span class="pack">mgcViz</span>, <span class="pack">gratia</span>, and probably others. In most places, I'm using my own functions from <span class="pack">visibly</span>, which you can get from [GitHub](https://m-clark.github.io/visibly/).
### Model Comparison
Lets compare our regular regression fit to the GAM fit. The following shows how one can extract various measures of performance, and the subsequent table shows them gathered together.
```{r mod_lm_fit}
AIC(mod_lm)
summary(mod_lm)$sp.criterion
summary(mod_lm)$r.sq # adjusted R squared
```
Do the same to extract those same elements from the GAM. The following display makes for easy comparison.
```{r compareLMGAM, echo=F}
modcomp = bind_rows(
tibble(
AIC = AIC(mod_lm),
GCV = summary(mod_lm)$sp.criterion,
`R^2` = summary(mod_lm)$r.sq
),
tibble(
AIC = AIC(mod_gam1),
GCV = summary(mod_gam1)$sp.criterion,
`R^2` = summary(mod_gam1)$r.sq
)
)
modcomp$model = c('LM', 'GAM')
modcomp %>%
relocate(model) %>%
gt()
```
<br>
Comparing these various measures, it's safe to conclude that the GAM fits better. We can also perform the familiar statistical test via the <span class="func">anova</span> function we apply to other R model objects. As with the previous p-value issue, we can't get too carried away, and technically one could have a model with even more terms but lower *edf*, but would be difficult to interpret[^anovagam]. As it would be best to be conservative, we'll proceed cautiously.
[^anovagam]:`?anova.gam` for more information.
```{r mod_lm_mod_gam1_anova}
anova(mod_lm, mod_gam1, test = "Chisq")
```
It would appear the ANOVA results tell us what we have probably come to believe already, that incorporating nonlinear effects has improved the model considerably.
## Multiple Features
Let's now see what we can do with a more realistic case where we have added model complexity.
### Linear Fit
We'll start with the linear model approach again, this time adding the Health and Education indices.
```{r mod_lm2}
mod_lm2 = gam(Overall ~ Income + Edu + Health, data = pisa)
summary(mod_lm2)
```
It appears we have statistical effects for Income and Education, but not for Health, and the adjusted R-squared suggests a notable amount of the variance is accounted for[^samplediff]. Let's see about nonlinear effects.
[^samplediff]: Note that a difference in sample sizes do not make this directly comparable to the first model.
### GAM
As far as the generalized additive model goes, we can approach things in a similar manner as before, and now and look for nonlinear effects for each feature[^defaultsmooth].
[^defaultsmooth]: The default smoother for `s()` is the argument `bs='tp'`, a thin plate regression spline.
```{r mod_gam2}
mod_gam2 = gam(Overall ~ s(Income) + s(Edu) + s(Health), data = pisa)
summary(mod_gam2)
```
There are again a couple things to take note of. First, statistically speaking, we come to the same conclusion as the linear model regarding the individual effects. One should take particular note of the effect of Health index. The effective degrees of freedom with value 1 suggests that it has essentially been reduced to a simple linear effect. The following will update the model to explicitly model the effect as linear, but as one can see based on GCV and other values, the results are identical, because the penalty had effectively rendered it linear.
```{r mod_gam2_update}
mod_gam2B = update(mod_gam2, . ~ . - s(Health) + Health)
summary(mod_gam2B)
```
We can also note that this model accounts for much of the variance in Overall science scores, with an adjusted R-squared of .86. In short, it looks like the living standards and educational resources of a country are associated with overall science scores, even if we don't really need the Health index in the model.
### Visualization
Now we examine the effects of interest visually via component plots[^componentplot]. The following uses my own function from <span class="pack">visibly</span> but you can also use packages like <span class="pack">ggeffects</span> or <span class="pack">gratia</span>. The following is close to what you'd get with the former.
```{r plot_demos, eval=FALSE}
plot(ggeffects::ggpredict(mod_gam2), facets = TRUE)
gratia::draw(mod_gam2)
```
```{r mod_gam2_plot, echo=F, fig.align='center', fig.asp=.75}
plot_gam(mod_gam2, ncol = 2, line_color = okabe_ito[2], ribbon_color = 'gray92') +
ylim(c(225, 575))
```
Here we can see the effects of interest, and again one might again note the penalized-to-linear effect of Health[^mgcvyaxis]. As before, we see the tapering off of Income's effect at its highest levels, and in addition, a kind of sweet spot for a positive effect of Education in the mid-range values, with a slight positive effect overall. Health, as noted, has been reduced to a linear, surprisingly negative effect, but again this is not statistically significant.
[^mgcvyaxis]: One will notice that for the default gam plot in <span class="pack">mgcv</span>, the y-axis scale is on that of the linear predictor, but due to identifiability constraints, the smooths must sum to zero, and so they are presented in a mean-centered fashion.
The following code demonstrates how to create data necessary for the plot, specifically for incomes, with the other features held at their mean values. See the [technical section][visual depiction] for more details on these visualizations.
```{r plot_mod_gam2_responseA}
# Note that mod_gam2$model is the data that was used in the modeling process,
# so it will have NAs removed.
testdata = data.frame(
Income = seq(.4, 1, length = 100),
Edu = mean(mod_gam2$model$Edu),
Health = mean(mod_gam2$model$Health)
)
predictions = predict(
mod_gam2,
newdata = testdata,
type = 'response',
se = TRUE
)
df_preds = data.frame(testdata, predictions) %>%
mutate(lower = fit - 1.96 * se.fit,
upper = fit + 1.96 * se.fit)
ggplot(aes(x = Income, y = fit), data = df_preds) +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = 'gray92') +
geom_line(color = '#56B4E9')
```
#### 2d Smooths
The GAM gives us a sense for one feature, but let's now take a gander at Income and Education at the same time. Previously, we saw how to use the <span class="func">plot</span> method for a GAM class object. There is another plotting function, <span class="func">vis.gam</span>, that will give us a bit more to play with, and specifically to display 2d smooths. The actual plot shown provided instead depicts a heatmap with the values on the response scale.
```{r contour, eval=FALSE}
vis.gam(mod_gam2, type = 'response', plot.type = 'contour')
```
```{r contourheat, echo=FALSE, cache=FALSE, fig.asp=.66}
plot_gam_2d(
model = mod_gam2,
main_var = Income,
second_var = Edu,
n_plot = 250
) +
theme(legend.text = element_text(size = 6))
```
<br>
First and foremost, the figure reflects the individual plots, and we can see high on Income generally produces the highest scores, while Education has less of an effect. Conversely, being low on both the Education and Income indices are associated with poor Overall science scores. While interesting, these respective smooths were created separately of one another, and there is another way we might examine how these two effects work together in predicting the response.
So let's take a look at another approach, continuing the focus on visualization. It may not be obvious at all, but one can utilize smooths of more than one feature, in effect, a smooth of the smooths of the variables that go into it. This is akin to an interaction in typical model settings[^ti]. Let's create a new model to play around with this. After fitting the model, I provide both a visualization for comparison to the previous, as well as a 3D view one can interactively rotate to their liking.
```{r mod_gam3, eval=1:3}
mod_gam3 = gam(Overall ~ te(Income, Edu), data = pisa)
summary(mod_gam3)
vis.gam(
mod_gam3,
type = 'response',
plot.type = 'persp',
phi = 30,
theta = 30,
n.grid = 500,
border = NA
)
```
```{r mod_gam3_plot, fig.asp='75%', echo=F, cache=FALSE}
plot_gam_2d(
model = mod_gam3,
main_var = Income,
second_var = Edu,
palette = 'bilbao',
direction = 1
)
plot_gam_3d(
model = mod_gam3,
main_var = Income,
second_var = Edu,
palette = 'bilbao',
direction = 1
)
```
<br>
In the above we are using a type of smooth called a tensor product smooth, and by smoothing the marginal smooths of Income and Education, we see a bit clearer story. As we might suspect, wealthy countries with more of an apparent educational infrastructure are going to score higher on the Overall science score. However, wealth alone does not necessarily guarantee higher science scores (note the dark bottom right corner on the contour plot)[^qatar], though without at least moderate wealth hopes are fairly dim for a decent score.
One can also, for example, examine interactions between a smooth and a linear term $f(x)z$, and in a similar vein of thought look at smooths at different levels of a [grouping factor][Categorical variables].
### Model Comparison
As before, we can examine indices such as GCV or perhaps adjusted R-squared, which both suggest our GAM performs considerably better. Statistically we can compare the two models with <span class="func">anova</span> as before.
```{r mod_lm2_mod_gam2_anova}
anova(mod_lm2, mod_gam2, test = "Chisq")
```
Not that we couldn't have assumed as such already, but now we have additional statistical evidence to suggest that incorporating nonlinear relationships of the features improves the model.
[^devianceexplained]: For those more familiar with generalized linear models, this is calculated as ($\mathrm{Dev}_{Null} - \mathrm{Dev}_{Residual}$)/$\mathrm{Dev}_{Null}$ <br><br>
One can verify this by running the same model via the <span class="func">glm</span> function, and using the corresponding values from the summary of the model object.
[^tenknots]: So ten total knots including the endpoints.
[^polyreg]: See <span class="func">poly</span> for how to fit a polynomial in R.
[^edfinit]: In this example, there are actually 9 terms associated with this smooth, but they are each 'penalized' to some extent and so the edf does not equal 9.
[^refdf]: Here it is noted `Ref.df`. In the past, there were four or five p-values to choose from, but now the option has settled to Bayesian-esque vs. frequentist approaches. The full story of edf, p-values and related is scattered throughout Wood's text. See also `?anova.gam`.
[^p05]: But then, standard p-values shouldn't be used that way either.
[^componentplot]: See e.g. John Fox's texts and chapters regarding regression diagnostics, and his <span class="func">crplots</span> function in the <span class="pack">car</span> package.
[^ti]: See the smooth `ti` for an ANOVA decomposition of such smooths into main effects and interaction.
[^qatar]: Probably this is due to Qatar. Refer again to the previous [scatterplot][Initial Examination].