---
output: html_document
editor_options:
chunk_output_type: console
---
# Model-performance Measures
**Learning objectives:**
- Model performance measures and evaluation
- Goodness-of-fit (GoF)
- Goodness-of-prediction (GoP)
## Introduction
- Model evaluation (how reliable are the model's predictions?)
- Model comparison (compare two or more models and decide which is the best model)
- Out-of-sample and out-of-time comparisons (how the model performs on new, unseen data); a minimal hold-out sketch follows this list
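
A minimal sketch of the out-of-sample idea, using made-up data and a plain `lm()` fit (the data, model, and split below are all hypothetical, not from the book): fit on a training subset, then score on the held-out rows.

```{r 15-holdout-sketch}
# Hypothetical data and model, just to illustrate hold-out evaluation.
set.seed(123)
df   <- data.frame(x = rnorm(200))
df$y <- 2 * df$x + rnorm(200)
idx  <- sample(nrow(df), 150)               # 150 training rows
fit  <- lm(y ~ x, data = df[idx, ])         # fit in-sample
pred <- predict(fit, newdata = df[-idx, ])  # predict on unseen rows
sqrt(mean((df$y[-idx] - pred)^2))           # out-of-sample RMSE
```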
## Method
![Source: Table 15.1](img/15-model-performance-measures/ch_15_table_1.jpg)
## Example: Apartment prices
![Source: Figure 15.1](img/15-model-performance-measures/figure-15.1.png)
## Example: Titanic data
![Source: Table 15.2](img/15-model-performance-measures/table_15.2.png)
![Source: Figure 15.2](img/15-model-performance-measures/figure-15.2.png)
![Source: Figure 15.3](img/15-model-performance-measures/figure-15.3.png)
![Source: Figure 15.4](img/15-model-performance-measures/figure-15.4.png)
## Pros and cons
Pros
- The most commonly used metrics for continuous dependent variables (RMSE, MAD, $R^2$) provide a fairly simple way to compare predicted and actual values.
- For binary/categorical dependent variables, ROC curves, AUC, and lift charts provide a comprehensive way to compare model performance.
Cons
- Some metrics for continuous dependent variables (e.g., RMSE) can be sensitive to outliers.
- Metrics for binary dependent variables can vary depending on the cut-off value used to turn predicted probabilities into classes; see the sketch after this list.
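
To make both points concrete, here is a small base-R sketch (the vectors are made up, not from the book): first the continuous metrics computed by hand, then how the accuracy of a binary classifier moves with the chosen cut-off.

```{r 15-metrics-by-hand}
# Continuous metrics, computed by hand on hypothetical observed/predicted values.
y     <- c(3.1, 4.0, 5.2, 6.8, 20.0)          # last value acts as an outlier
y_hat <- c(2.9, 4.3, 5.0, 7.1,  9.0)
rmse <- sqrt(mean((y - y_hat)^2))              # squared errors: outlier dominates
mad  <- median(abs(y - y_hat))                 # median: robust to the outlier
r2   <- 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
c(RMSE = rmse, MAD = mad, R2 = r2)

# Binary metrics depend on the cut-off used to turn probabilities into classes.
set.seed(42)
prob  <- runif(100)                            # hypothetical predicted probabilities
truth <- rbinom(100, 1, prob)                  # hypothetical 0/1 outcomes
sapply(c(0.3, 0.5, 0.7), function(cutoff)
  mean(as.numeric(prob > cutoff) == truth))    # accuracy at each cut-off
```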
## R code snippets
Let's retrieve the `titanic_imputed` dataset, and the `titanic_lmr` and `titanic_rf` models.
```{r 15-load-objects}
titanic_imputed <- archivist::aread("pbiecek/models/27e5c")
titanic_lmr <- archivist::aread("pbiecek/models/58b24")
titanic_rf <- archivist::aread("pbiecek/models/4e0fc")
```
Construct the explainers. The `explain()` calls are kept as comments for reference; pre-built explainers are loaded from disk instead.
```{r 15-construct-explainers}
library("rms")
library("randomForest")
library("DALEX")
# explain_lmr <- explain(model = titanic_lmr,
# data = titanic_imputed[, -9],
# y = titanic_imputed$survived == "yes",
# type = "classification",
# label = "Logistic Regression")
explain_lmr <- readRDS("./explainers/explain_lmr.rds")
# explain_rf <- explain(model = titanic_rf,
# data = titanic_imputed[, -9],
# y = titanic_imputed$survived == "yes",
# label = "Random Forest")
explain_rf <- readRDS("./explainers/explain_rf.rds")
```
By default, `model_performance()` calculates a set of model-performance measures suited to the explainer's task type.
```{r 15-model-performance-rf}
(eva_rf <- DALEX::model_performance(explain_rf))
```
```{r 15-model-performance-lmr}
(eva_lr <- DALEX::model_performance(explain_lmr))
```
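
Assuming the current DALEX API, where the `model_performance` object is a list with `measures` and `residuals` components (field names may differ in older releases), individual values can be pulled out directly:

```{r 15-performance-fields}
# Assumption: recent DALEX versions expose $measures and $residuals.
eva_rf$measures$auc      # AUC for the random forest
head(eva_rf$residuals)   # per-observation residuals behind the plots
```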
Plot the residual histograms and precision-recall curves for both models.
```{r 15-plots}
library("patchwork")
p1 <- plot(eva_rf, eva_lr, geom = "histogram")
p2 <- plot(eva_rf, eva_lr, geom = "prc")
p1 + p2
```
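
The same `plot()` method accepts other geometries; for example, `geom = "roc"` (used elsewhere in the book) overlays the ROC curves of both models.

```{r 15-roc-plot}
# ROC curves for both models on one panel.
plot(eva_rf, eva_lr, geom = "roc")
```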
### Cohort 1 {-}
`r knitr::include_url("https://www.youtube.com/embed/URL")`
<details>
<summary> Meeting chat log </summary>
```
LOG
```
</details>