---
title: "Introduction to Dimensional Reduction in R - UseR!22 (Section I)"
author: "Isabella Bicalho Frazeto"
date: '2022-06-19'
output: html_document
---
# Section 1
In this section, we explore the dataset we will use for PCA and ICA and take a closer look at how to use tidymodels.
## Importing libraries
```{r}
# Load tidymodels and the helper packages used in Section 1
library(tidymodels)
tidymodels_prefer()
library(bestNormalize)
library(cowplot)
library(ggforce)
library(forcats)
library(tidytext)
```
## Dataset
Let's inspect the dataset:
```{r}
library(beans)
skimr::skim(beans)
colnames(beans)
```
Taking a closer look at area:
```{r}
```
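One possible way to fill in the chunk above, as a minimal sketch: it assumes a histogram is the intended "closer look", which mirrors the plot recalled later in this section. The object name `area_hist` and the choice of `bins = 30` are illustrative assumptions, not part of the original material.

```r
library(ggplot2)
library(beans)

# Histogram of the raw `area` variable (bins = 30 is an arbitrary choice)
area_hist <- ggplot(beans, aes(x = area)) +
  geom_histogram(bins = 30) +
  theme_minimal()

area_hist
```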
Would log-scaling make a difference?
```{r}
```
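A hedged sketch of how the question above could be explored: it draws the same histogram with a log10-scaled x-axis via `scale_x_log10()`. The name `area_log_hist` is hypothetical, and other approaches (for example, plotting `log(area)` directly) would work equally well.

```r
library(ggplot2)
library(beans)

# Same histogram as before, but with the x-axis on a log10 scale.
# This is only valid because every value of `area` is strictly positive.
area_log_hist <- ggplot(beans, aes(x = area)) +
  geom_histogram(bins = 30) +
  scale_x_log10() +
  theme_minimal()

area_log_hist
```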
## Tidymodels
```{r}
set.seed(126)
beans_split <- initial_split(beans, prop = 0.80)
beans_train <- training(beans_split)
beans_test <- testing(beans_split)
```
### Recipes
A recipe is a step-by-step description of how you process your data prior to the analysis.
```{r}
recipe_rf <- recipe(class ~ ., data = beans_train)
```
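To see what a recipe with no steps already records, `summary()` can be called on it. The sketch below rebuilds the split so that it runs on its own; `rec_roles` is a hypothetical name introduced for illustration.

```r
library(tidymodels)
library(beans)

# Recreate the training set used above
set.seed(126)
beans_split <- initial_split(beans, prop = 0.80)
beans_train <- training(beans_split)

recipe_rf <- recipe(class ~ ., data = beans_train)

# One row per variable, with the role the formula assigned to it
rec_roles <- summary(recipe_rf)
rec_roles

# Count of variables per role
table(rec_roles$role)
```

The formula `class ~ .` marks `class` as the outcome and every other column as a predictor; no preprocessing happens until steps are added.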
### What is our model?
```{r}
rf_model <- rand_forest(
  mode = "classification",
  engine = "ranger"
)
```
### Workflow
```{r}
rf_wk <- workflow() %>%
  add_recipe(recipe_rf) %>%
  add_model(rf_model)
```
### Fitting the workflow
```{r}
rf_fit <- fit(rf_wk, beans_train)
```
### Where do I get my results?
```{r}
rf_predict <- predict(rf_fit, beans_test)
```
```{r}
rf_aug <- augment(rf_fit, beans_test)
```
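The two calls above return differently shaped results. The following self-contained sketch refits the same workflow and compares them; it assumes the `ranger` package is installed, and the fit may take a few seconds.

```r
library(tidymodels)
library(beans)

set.seed(126)
beans_split <- initial_split(beans, prop = 0.80)
beans_train <- training(beans_split)
beans_test  <- testing(beans_split)

# Same recipe, model, and workflow as in the chunks above
rf_fit <- workflow() %>%
  add_recipe(recipe(class ~ ., data = beans_train)) %>%
  add_model(rand_forest(mode = "classification", engine = "ranger")) %>%
  fit(beans_train)

# predict() returns only the predictions, one row per test observation
rf_predict <- predict(rf_fit, beans_test)
colnames(rf_predict)

# augment() returns the test data with the prediction columns attached
rf_aug <- augment(rf_fit, beans_test)
colnames(rf_aug)
```

Because `augment()` keeps the true `class` next to `.pred_class`, its output can be passed straight to the yardstick metric functions.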
## Metrics
```{r}
rf_aug %>%
  recall(truth = class, estimate = .pred_class, estimator = "macro_weighted")
```
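To illustrate what the `"macro_weighted"` estimator does, here is a small sketch on made-up data (the `toy` data frame is invented for illustration and is not part of the workshop material). Per-class recall is 3/4 for `a`, 1/2 for `b`, and 4/4 for `c`.

```r
library(yardstick)

# Toy predictions: 4 observations of "a", 2 of "b", 4 of "c"
toy <- data.frame(
  truth    = factor(c("a", "a", "a", "a", "b", "b", "c", "c", "c", "c"),
                    levels = c("a", "b", "c")),
  estimate = factor(c("a", "a", "a", "b", "b", "a", "c", "c", "c", "c"),
                    levels = c("a", "b", "c"))
)

# "macro": plain average of the per-class recalls
# (0.75 + 0.5 + 1) / 3 = 0.75
macro_recall <- recall(toy, truth = truth, estimate = estimate,
                       estimator = "macro")

# "macro_weighted": average weighted by class frequency in `truth`
# (4 * 0.75 + 2 * 0.5 + 4 * 1) / 10 = 0.8
weighted_recall <- recall(toy, truth = truth, estimate = estimate,
                          estimator = "macro_weighted")

macro_recall
weighted_recall
```

The weighted version gives larger classes more influence, which matters when the classes are imbalanced.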
### Our preprocessing
We have seen in the previous section that we have to do some preprocessing. Here, we will:
1) remove zero-variance variables from the data
2) apply ordered quantile normalization
3) center and scale the data (normalize)
```{r}
beans_preproc_rec <-
```
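One possible way to complete the recipe, as a sketch rather than the official solution: it assumes the three listed steps map to `step_zv()`, `step_orderNorm()` (from bestNormalize), and `step_normalize()`, each applied to all numeric predictors.

```r
library(tidymodels)
library(bestNormalize)
library(beans)

set.seed(126)
beans_split <- initial_split(beans, prop = 0.80)
beans_train <- training(beans_split)

beans_preproc_rec <- recipe(class ~ ., data = beans_train) %>%
  # 1) drop predictors that take a single value
  step_zv(all_numeric_predictors()) %>%
  # 2) ordered quantile normalization
  step_orderNorm(all_numeric_predictors()) %>%
  # 3) center to mean 0 and scale to standard deviation 1
  step_normalize(all_numeric_predictors())

beans_preproc_rec
```

At this point the recipe is only a specification; nothing has been estimated from the data yet.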
Now that we have a recipe, we can prepare it on our data.
```{r}
beans_rec_prepped <-
beans_rec_prepped
```
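A hedged sketch of the preparation step: it assumes the recipe is defined with the three steps listed earlier (`step_zv()`, `step_orderNorm()`, `step_normalize()`) and that `prep()` is called with its default training data, which is the data the recipe was created with.

```r
library(tidymodels)
library(bestNormalize)
library(beans)

set.seed(126)
beans_split <- initial_split(beans, prop = 0.80)
beans_train <- training(beans_split)

# Assumed recipe: zero-variance filter, ordered quantile
# normalization, then centering and scaling
beans_preproc_rec <- recipe(class ~ ., data = beans_train) %>%
  step_zv(all_numeric_predictors()) %>%
  step_orderNorm(all_numeric_predictors()) %>%
  step_normalize(all_numeric_predictors())

# prep() estimates every step's parameters from the training data
beans_rec_prepped <- prep(beans_preproc_rec)

# tidy() lists the steps and whether each has been trained
tidy(beans_rec_prepped)
```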
We can bake our recipe to get our results:
```{r}
beans_processed <-
```
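A possible way to bake the prepared recipe, again as a sketch under assumptions: the recipe steps are the same assumed ones as before, and `new_data = NULL` is used to return the processed training set. Baking another data set (for example, the test set) would be done by passing it as `new_data` instead.

```r
library(tidymodels)
library(bestNormalize)
library(beans)

set.seed(126)
beans_split <- initial_split(beans, prop = 0.80)
beans_train <- training(beans_split)

# Assumed recipe, prepared on the training data
beans_rec_prepped <- recipe(class ~ ., data = beans_train) %>%
  step_zv(all_numeric_predictors()) %>%
  step_orderNorm(all_numeric_predictors()) %>%
  step_normalize(all_numeric_predictors()) %>%
  prep()

# new_data = NULL returns the preprocessed training set
beans_processed <- bake(beans_rec_prepped, new_data = NULL)

# After step_normalize(), `area` is centered and scaled
mean(beans_processed$area)
sd(beans_processed$area)
```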
Recall the previous plot:
```{r}
beans %>%
  ggplot(aes(x = area)) +
  geom_histogram() +
  theme_minimal() +
  panel_border(color = "black")
```
Let's now take a look at `area` after preprocessing:
```{r}
beans_processed %>%
  ggplot(aes(x = area)) +
  geom_histogram() +
  theme_minimal() +
  panel_border(color = "black")
```