GitHub - zachlovescoffee/GCD-Project: Getting and Cleaning Data Course Project Repository

Getting and Cleaning Data Course Project README

Prepared by Zach (wfio@github)

Overview

The purpose of this script is to produce a tidy data output for later review and analysis. The script uses functional programming as well as a split-apply-combine approach for ingesting untidy data, manipulating individual elements within the dataset and finally preparing the dataset via merging for later review and analysis.
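As a rough illustration of the split-apply-combine idea, here is a minimal base R sketch. The data frame `toy`, its values and the column names are made up for this example and are not taken from run_analysis.R.

```r
# Toy data standing in for the real measurements (values are made up).
toy <- data.frame(
  subject  = c(1, 1, 2, 2),
  activity = c("WALKING", "WALKING", "SITTING", "SITTING"),
  signal   = c(0.2, 0.4, 1.0, 3.0)
)

# Split: one data frame per subject/activity pair that actually occurs.
pieces <- split(toy, list(toy$subject, toy$activity), drop = TRUE)

# Apply: reduce each piece to a single summary row.
summaries <- lapply(pieces, function(piece) {
  data.frame(
    subject     = piece$subject[1],
    activity    = piece$activity[1],
    mean_signal = mean(piece$signal)
  )
})

# Combine: stack the summary rows back into one data frame.
tidy_toy <- do.call(rbind, summaries)
rownames(tidy_toy) <- NULL
print(tidy_toy)
```

The result has one row per subject and activity, which is the shape a tidy summary of this kind usually takes.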

The files you need to download to run this script are located at: External Link

Script Generation Environment & Techniques

The script was prepared in a Windows 7 32-bit environment and crafted using RStudio v0.98.1103 running on R-base version 3.2.0. The only non-base package used to complete this script is Hadley Wickham's 'dplyr' package. The package version used for this assignment was 0.4.1. The Hmisc::describe function was used to create the CODEBOOK.md file to describe technical features of the variables used.

The parent function (run_analysis) calls other functions within the script to process each phase of the data manipulation exercise. Initially, the function works on two different datasets (test and training), preparing them for grouping, merging and final summary statistic generation in the final_dataset() function.
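The overall flow can be sketched in base R as follows. This is only an assumption about the general shape of such a pipeline, not the actual code in run_analysis.R: the helper names `prepare_set` and `combine_and_summarise`, the column `m1` and all values are hypothetical.

```r
# Hypothetical helper: attach subject and activity columns to one raw set.
prepare_set <- function(measurements, subjects, activities) {
  data.frame(subject = subjects, activity = activities, measurements)
}

# Hypothetical stand-in for the final step: stack both sets, then average
# every measurement column per subject and activity.
combine_and_summarise <- function(test_set, train_set) {
  merged <- rbind(test_set, train_set)
  aggregate(. ~ subject + activity, data = merged, FUN = mean)
}

# Made-up inputs with a single measurement column.
test_set  <- prepare_set(data.frame(m1 = c(1, 3)), c(2, 2), c("WALKING", "WALKING"))
train_set <- prepare_set(data.frame(m1 = c(5, 7)), c(1, 1), c("SITTING", "SITTING"))

result <- combine_and_summarise(test_set, train_set)
print(result)
```

The same grouping and averaging could be written with dplyr's group_by() and summarise_each(); base R is used here only so the sketch runs without extra packages.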

The script was fully commented following Google's R Style Guide, which is why I will spare you the rambling of walking you through my code in the README. The script is fully commented and it should be transparent. Here's Google's stuff:

[Google R Style Guide](https://google-styleguide.googlecode.com/svn/trunk/Rguide.xml)

The function will print the final results to the console while generating a .txt file called "run_analysis_tidy_data.txt". To view the output text file, please consider the following commands in your R console:

*your_var <- read.table(file_path, header = TRUE)*
  *View(your_var)*

View() opens the data viewer if you're using RStudio; otherwise, use print(your_var) to display the data in your console.
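For a quick check of the round trip, here is a small self-contained sketch. It assumes the output is whitespace-separated text with a header row; the data frame `tidy` and the temporary file are made up for the example.

```r
# Made-up stand-in for the tidy output.
tidy <- data.frame(
  subject  = c(1, 2),
  activity = c("WALKING", "SITTING"),
  avg      = c(0.25, 0.5)
)

# Assumption: the file is whitespace-separated text with a header row.
file_path <- tempfile(fileext = ".txt")
write.table(tidy, file_path, row.names = FALSE)

# Read it back the same way the README suggests.
your_var <- read.table(file_path, header = TRUE)
print(your_var)  # works in any console
```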

Preparing the dataset in your environment for running the script

1. Download the dataset .zip file to your PC and extract it to a directory on your hard drive.
- 1.1 There will be a parent directory /UCI HAR Dataset containing:
- 1.1.1 /test directory: three text files and an Inertial Signals sub-directory.
- 1.1.2 /train directory: three text files and an Inertial Signals sub-directory.
- 1.1.3 Four text files that were used to prepare variable and measurement names/details and a README.
2. Save the run_analysis.R script into the /UCI HAR Dataset directory.
3. Set your working directory: setwd("file_path_of_UCI_HAR_Dataset").
- 3.1 Verify your working directory: getwd()
- 3.2 Verify the files from above are present: list.files()
4. The script requires the 'dplyr' package. If you do not have it installed, please run install.packages("dplyr") from your console. The dependencies and other requirements of the package should be reviewed before proceeding by reading the package description at: dplyr
5. Once all of the above are complete, please load the script with the source function: source("file_path").
6. This will load the script's functions into your R session.
7. Type run_analysis() and R will execute the code, print the results to the console and then write a text file into the working directory.
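The setup checks above can be sketched as two small helpers. Both function names are hypothetical and not part of run_analysis.R, and the default entry names ("test", "train", "run_analysis.R") are an assumption about how the extracted archive is laid out; adjust them to match your copy.

```r
# Hypothetical helper: report whether the expected entries exist in `dir`.
# Assumption: the default names below match the extracted archive.
check_dataset_dir <- function(dir = getwd(),
                              expected = c("test", "train", "run_analysis.R")) {
  missing <- expected[!file.exists(file.path(dir, expected))]
  if (length(missing) > 0) {
    message("Missing from ", dir, ": ", paste(missing, collapse = ", "))
  }
  length(missing) == 0
}

# Hypothetical helper: install dplyr only when it is not already available.
ensure_dplyr <- function() {
  if (!requireNamespace("dplyr", quietly = TRUE)) {
    install.packages("dplyr")
  }
}
```

Calling check_dataset_dir() after setwd() returns TRUE when everything is in place, and ensure_dplyr() can then be run before source("file_path").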

If you run into any trouble, please e-mail [email protected] before reducing the score of any particular set. I'd be happy to walk you through the steps.

Generation of the Codebook

The codebook was generated by combining pre-existing information about the dataset from the 'features_info.txt' file in the collection of files with a description of the transformations that occurred through the exercise. Please reference that file for a full background on the features.

Every variable name begins with a single-character prefix, 't' or 'f', denoting time domain or frequency domain signals, respectively. The signals originated from two sensors:

Accelerometer
Gyroscope

The measures created were:

-tBodyAcc-XYZ
-tGravityAcc-XYZ
-tBodyAccJerk-XYZ
-tBodyGyro-XYZ
-tBodyGyroJerk-XYZ
-tBodyAccMag
-tGravityAccMag
-tBodyAccJerkMag
-tBodyGyroMag
-tBodyGyroJerkMag
-fBodyAcc-XYZ
-fBodyAccJerk-XYZ
-fBodyGyro-XYZ
-fBodyAccMag
-fBodyAccJerkMag
-fBodyGyroMag
-fBodyGyroJerkMag

Each measure was collected for the following tasks:

-WALKING
-WALKING_UPSTAIRS
-WALKING_DOWNSTAIRS
-SITTING
-STANDING
-LAYING

And the following summary statistics were generated:

-mean(): Mean value
-std(): Standard deviation
-mad(): Median absolute deviation 
-max(): Largest value in array
-min(): Smallest value in array
-sma(): Signal magnitude area
-energy(): Energy measure. Sum of the squares divided by the number of values. 
-iqr(): Interquartile range 
-entropy(): Signal entropy
-arCoeff(): Autoregression coefficients with Burg order equal to 4
-correlation(): correlation coefficient between two signals
-maxInds(): index of the frequency component with largest magnitude
-meanFreq(): Weighted average of the frequency components to obtain a mean frequency
-skewness(): skewness of the frequency domain signal 
-kurtosis(): kurtosis of the frequency domain signal 
-bandsEnergy(): Energy of a frequency interval within the 64 bins of the FFT of each window.
-angle(): Angle between two vectors.
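A few of the statistics listed above can be computed in base R as shown in this sketch. The sample values in `window` are made up, and the exact conventions used by the original dataset (for example, sample versus population standard deviation, or how quartiles are interpolated) are not stated here, so treat the choices below as assumptions.

```r
# Made-up sample values standing in for one window of a signal.
window <- c(1, 2, 3, 4, 10)

stats <- c(
  mean   = mean(window),
  std    = sd(window),                # sample standard deviation (assumption)
  # constant = 1 gives the raw median absolute deviation;
  # R's default rescales it by 1.4826.
  mad    = mad(window, constant = 1),
  max    = max(window),
  min    = min(window),
  # Energy as described above: sum of the squares divided by the number of values.
  energy = sum(window^2) / length(window),
  iqr    = IQR(window)                # R's default quantile method (assumption)
)
print(stats)
```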

The summary statistics are appended to each variable name. For example,
*tBodyGyroJerk-std()-Y* would be translated as:

- time domain signal
- body jerk
- gyroscope sensor (X, Y, Z axial measurements available)
- standard deviation of Y-axial signal from the gyroscope sensor
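The naming scheme can also be decoded programmatically. The function below is a hypothetical helper written for this README, not part of run_analysis.R, and it assumes names follow the `signal-statistic()-axis` pattern shown in the example above.

```r
# Hypothetical helper: break a feature name into its components.
decode_feature <- function(name) {
  parts  <- strsplit(name, "-", fixed = TRUE)[[1]]
  signal <- parts[1]
  list(
    # 't' prefix means time domain, 'f' means frequency domain.
    domain    = if (substr(signal, 1, 1) == "t") "time" else "frequency",
    sensor    = if (grepl("Gyro", signal, fixed = TRUE)) "gyroscope" else "accelerometer",
    signal    = substring(signal, 2),
    # Drop the trailing "()" from the statistic, e.g. "std()" becomes "std".
    statistic = if (length(parts) >= 2) sub("()", "", parts[2], fixed = TRUE) else NA_character_,
    # Magnitude variables have no axis component.
    axis      = if (length(parts) >= 3) parts[3] else NA_character_
  )
}

str(decode_feature("tBodyGyroJerk-std()-Y"))
```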

### Preparing the Code Book ###

To prepare the overview of the code book I used the 'Hmisc' package function
'describe' to create a list of information about each of the variables
in the CODEBOOK.
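A minimal sketch of that step might look like the following. The data frame `tidy`, the temporary output file and the fallback to base R's summary() are assumptions made so the example runs anywhere; how the description was actually written into CODEBOOK.md is not shown in this README.

```r
# Made-up stand-in for the tidy dataset.
tidy <- data.frame(subject = c(1, 1, 2), avg_signal = c(0.1, 0.3, 0.2))

# Use Hmisc::describe when the package is installed; otherwise fall back to
# base R's summary() so the sketch still runs.
description <- if (requireNamespace("Hmisc", quietly = TRUE)) {
  capture.output(print(Hmisc::describe(tidy)))
} else {
  capture.output(print(summary(tidy)))
}

# Assumption: the captured text is saved to a file and then edited by hand.
out_file <- tempfile(fileext = ".md")
writeLines(description, out_file)
```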

Units of measurement:
- t <- time in seconds
- f <- frequency in Hertz (Hz)
