---
title: "Introduction to Dimensional Reduction in R - UseR!22 (Section I)"
author: "Isabella Bicalho Frazeto"
date: '2022-06-19'
output: html_document
---

# Section 1
In this section we will explore the dataset that we will use for PCA and ICA, and look a little more closely at how to use tidymodels.

## Importing libraries

```{r}
# Load tidymodels and the supporting packages for Section 1
library(tidymodels)
tidymodels_prefer()
library(bestNormalize)
library(cowplot)
library(ggforce)
library(forcats)
library(tidytext)
```

## Dataset

Let's inspect the dataset:

```{r}

library(beans)
skimr::skim(beans)
colnames(beans)
```

Taking a closer look at area:

```{r}


```
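One way to take that closer look, mirroring the histogram that is revisited later in this section, is a plain histogram of `area`. This is only a sketch: the object name `area_hist` and the `bins = 30` setting are illustrative assumptions, not something the workshop prescribes.

```{r}
library(ggplot2)
library(beans)

# Histogram of bean area; bins = 30 is an arbitrary illustrative choice
area_hist <- ggplot(beans, aes(x = area)) +
  geom_histogram(bins = 30) +
  theme_minimal()

area_hist
```

A long right tail in this plot is the kind of feature that motivates the transformations discussed below.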

Would log-scaling make a difference?

```{r}

```
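A possible way to explore this question is to draw the same histogram with `area` on a log10 axis and compare the shapes. The sketch below assumes that `scale_x_log10()` is an acceptable way to apply the log scale here; the name `area_log_hist` and the bin count are hypothetical choices.

```{r}
library(ggplot2)
library(beans)

# Same histogram as before, but with area shown on a log10 axis
area_log_hist <- ggplot(beans, aes(x = area)) +
  geom_histogram(bins = 30) +
  scale_x_log10() +
  theme_minimal()

area_log_hist
```

Transforming the axis rather than the column keeps the original values in the data, so the two plots can be compared directly.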


## Tidy models

```{r}
# Fix the seed so the random split is reproducible
set.seed(126)
# Hold out 20% of the rows for testing
beans_split <- initial_split(beans, prop = 0.80)
beans_train <- training(beans_split)
beans_test  <- testing(beans_split)

```
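As a quick sanity check of what `initial_split()` produces, the sketch below repeats the split under a separate name and compares row counts. The `strata = class` variant at the end is an assumption shown only for comparison; the split used in this section is not stratified.

```{r}
library(rsample)
library(beans)

set.seed(126)
split_check <- initial_split(beans, prop = 0.80)

# Rows in each partition; together they cover the full dataset
n_train <- nrow(training(split_check))
n_test  <- nrow(testing(split_check))
c(train = n_train, test = n_test, total = nrow(beans))

# Optional variant (an assumption, not used in this section):
# keep the class proportions similar in both partitions
split_strat <- initial_split(beans, prop = 0.80, strata = class)
```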


### Recipes
A recipe is a step-by-step description of how your data is processed before the analysis.


```{r}
recipe_rf <- recipe(class ~ ., data = beans_train) 

```
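To see what a recipe without any steps already knows about the data, one option is to inspect the roles it assigns to each column. The sketch below uses the full `beans` data and the hypothetical names `rec_sketch` and `rec_roles` so that it stands on its own.

```{r}
library(recipes)
library(beans)

# A recipe with no steps yet: it only records which column is the outcome
# and which columns are predictors
rec_sketch <- recipe(class ~ ., data = beans)

# One row per column, with its assigned role
rec_roles <- summary(rec_sketch)
table(rec_roles$role)
```

Preprocessing steps added later select columns through these roles, for example with `all_numeric_predictors()`.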

### What is our model? 

```{r}
rf_model <- rand_forest(
            mode = "classification",
            engine = "ranger"
            )
```

### Workflow

```{r}

rf_wk <- workflow() %>%
          add_recipe(recipe_rf) %>%
          add_model(rf_model)

```

### Fitting the workflow 

```{r}
rf_fit <- fit(rf_wk, beans_train)

```

### Where do I get my results?

```{r}
# predict() returns only the predicted class (.pred_class) for each test row
rf_predict <- predict(rf_fit, beans_test)

```


```{r}
# augment() returns the test data with the prediction columns added to it
rf_aug <- augment(rf_fit, beans_test)

```

## Metrics

```{r}
rf_aug %>% 
  recall(truth = class, estimate = .pred_class,  estimator = "macro_weighted")
```
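Several class metrics can also be computed in one call with `metric_set()`. So that the sketch below runs on its own, it uses the `hpc_cv` example data that ships with yardstick instead of `rf_aug`; the names `class_metrics` and `metrics_tbl` are hypothetical.

```{r}
library(yardstick)

# hpc_cv ships with yardstick: `obs` is the true class, `pred` the predicted class
data(hpc_cv)

# Bundle several class metrics so they are computed together
class_metrics <- metric_set(recall, precision, f_meas)

metrics_tbl <- class_metrics(
  hpc_cv,
  truth = obs,
  estimate = pred,
  estimator = "macro_weighted"
)
metrics_tbl
```

The same call pattern should carry over to `rf_aug` by swapping in `truth = class` and `estimate = .pred_class`.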


## Our preprocessing

We have seen in the previous section that we have to do some preprocessing. Here, we will:

1) remove zero-variance predictors from the data
2) apply an order quantile normalization to the predictors
3) center and scale the data (normalize)
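To build some intuition for steps 2 and 3 outside of a recipe, the sketch below applies them by hand to the `area` column. It assumes that calling `orderNorm()` from bestNormalize directly is a fair stand-in for the recipe step; the names `orq`, `area_orq` and `area_std` are hypothetical. `orderNorm()` may warn about ties in the data.

```{r}
library(bestNormalize)
library(beans)

# Order quantile normalization maps the ranks of the values onto normal quantiles
orq <- orderNorm(beans$area)
area_orq <- predict(orq)  # transformed values for the original data

# Centering and scaling afterwards gives mean 0 and standard deviation 1
area_std <- as.numeric(scale(area_orq))
c(mean = mean(area_std), sd = sd(area_std))
```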

```{r}
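# Preprocessing recipe: the steps are applied in the order they are listed
beans_preproc_rec <-
  recipe(class ~ ., data = beans_train) %>%
  step_zv(all_numeric_predictors()) %>%         # 1) remove zero-variance predictors
  step_orderNorm(all_numeric_predictors()) %>%  # 2) order quantile normalization
  step_normalize(all_numeric_predictors())      # 3) center and scale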
```

Now that we have a recipe, we can prepare it: `prep()` estimates the quantities each step needs from the training data.

```{r}
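# Estimate the preprocessing parameters from the training data
beans_rec_prepped <- prep(beans_preproc_rec, training = beans_train)

beans_rec_prepped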
```

We can then bake our recipe to get the processed data.

```{r}
# new_data = NULL returns the processed training data
beans_processed <- bake(beans_rec_prepped, new_data = NULL)
```

Recall the previous plot: 

```{r}

beans %>%
  ggplot(aes(x = area)) +
  geom_histogram() +
  theme_minimal() +
  panel_border(color = "black")

```

Let's now take a look at the area after preprocessing:

```{r}
beans_processed %>%
  ggplot(aes(x = area)) +
  geom_histogram() +
  theme_minimal() +
  panel_border(color = "black")
```