Cassandra_C_Tidyverse_Extend.Rmd

---
title: "Tidyverse Extend"
author: "Cassandra Coste"
date: "4/28/2021"
output: html_document
---

### Tage N Singh TIDYVERSE Create


This vignette will provide a brief exploration of the "tidyverse" package using its built-in libraries

The tidyverse is a powerful collection of R packages that are actually data tools  All packages of the tidyverse share an underlying  common APIs as listed below. 

Lists

- ggplot2, which implements the grammar of graphics. You can use it to visualize your data.
- dplyr is a grammar of data manipulation. You can use it to solve the most common data manipulation challenges.
- tidyr helps you to create tidy data or data where each variable is in a column, each observation is a row end each value is a cell.
- readr is a fast and friendly way to read rectangular data.
- purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.
- tibble is a modern re-imaginging of the data frame.
- stringr provides a cohesive set of functions designed to make working with strings as easy as posssible
-  forcats provide a suite of useful tools that solve common problems with factors.

Now for the package
-------------------

```{r setup, message=FALSE, warning=FALSE}
#Loading just the "tidyverse" library !
library(tidyverse)
```

```{r tidyverse usage-1}
# For this exercise we are using a dataset from kaggle.com which contain 
#information about avacado sales in various cities in the USA
# ---------- Using the "rear" function "read_csv" -------
avacado <- data.frame(read_csv(file = "https://raw.githubusercontent.com/tagensingh/SPS-DATA607-TIDYVERSE/main/avocado.csv"))

# ---------- Using the "tibble" function to look at a snapshot of the dataframe -------
tibble(avacado)
```

```{r tidyverse usage-2}
# ---------- Using the "arrange" function  from "dplyr"to sort the dataframe by date-------
avacado_date <- arrange(avacado,Date)
tibble(avacado_date)
# ---------- Using the "ggplot" function  from "ggplot2"to chart the pricing density of avacados-------
# Histogram overlaid with kernel density curve
ggplot(avacado, aes(x=AveragePrice)) + 
    geom_histogram(aes(y=..density..),      # Histogram with density instead of count on y-axis
                   binwidth=.1,
                   colour="black", fill="white") +
    geom_density(alpha=.1, fill="#FF6666")+# Overlay with transparent density plot
    ggtitle("Avacados Pricing Density")
```

As is shown above, the tidyverse packages is one of the most versatile packages in the R sphere....

### Cassandra Coste TIDYVERSE Extend

I am going to extend on the work of Tage by exploring the group by and summarize function within dplyr with the data and visualize the results. 

```{r}

avocado_region <- avacado %>% group_by(region)

```

You might notice that the group_by function won't make the data set look any different, but when we run calculations on the data, the group_by function will change the way the data is processed.  

Now, when we run the summarise function to calculate the mean it will do the calculation by region. This is a helpful function since you often want to compare groups to infer meaning from datasets.

```{r}

avocado_region <- avocado_region %>% summarise(regional_avg = mean(AveragePrice))
```

Finally, using ggplot we can compare regionally, where avocado prices are on average. 

```{r, fig.width=8,fig.height=12, message=FALSE, warning=FALSE}

ggplot(avocado_region, aes(x = reorder(region,-regional_avg), y = regional_avg, fill = "red")) + geom_bar(stat="identity") +
coord_flip() + ggtitle("Regional Average of Avocado Prices") + ylab("Average Price (USD)") + xlab("Region") + scale_y_continuous(labels = scales::dollar_format(), limits=c(0,2)) + theme(legend.position = "none") 
```


----------------------------------------------------------------------------------------------------
© 2021 GitHub, Inc.