The purpose of this script is to produce a tidy data set for later review and analysis. The script uses functional programming and a split-apply-combine approach to ingest untidy data, manipulate individual elements within the data set, and finally merge the results into a single data set for later review and analysis.
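As a rough illustration of the split-apply-combine pattern mentioned above, here is a minimal base R sketch on a made-up data frame. The data frame `toy` and its columns are assumptions for illustration only; the script itself works on the UCI HAR files and uses dplyr.

```r
# Toy data: hypothetical readings for two activities (not from the real dataset).
toy <- data.frame(
  activity = c("WALKING", "SITTING", "WALKING", "SITTING"),
  reading  = c(1.0, 0.2, 3.0, 0.4),
  stringsAsFactors = FALSE
)

# Split: one data frame per activity.
pieces <- split(toy, toy$activity)

# Apply: compute the mean reading within each piece.
means <- lapply(pieces, function(piece) {
  data.frame(activity = piece$activity[1],
             mean_reading = mean(piece$reading),
             stringsAsFactors = FALSE)
})

# Combine: stack the per-activity results back into one data frame.
result <- do.call(rbind, means)
print(result)
```

Each of the three steps is a plain function call, which is what makes the pattern fit comfortably with a functional style.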
The files you need to download to run this script are located at: External Link
The script was prepared in a Windows 7 32-bit environment and written in RStudio v0.98.1103 running on R version 3.2.0. The only non-base package used by this script is Hadley Wickham's 'dplyr' package; version 0.4.1 was used for this assignment. The Hmisc::describe function was used to create the CODEBOOK.md file, which describes technical features of the variables used.
The parent function (run_analysis) calls other functions within the script to process each phase of the data manipulation exercise. Initially, the function works on two different datasets (test and training), preparing them for grouping, merging and final summary statistic generation in the final_dataset() function.
The script is commented following Google's R Style Guide, which is why I will spare you a walk-through of my code in the README. The comments should make the script transparent. Here is Google's guide:
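The flow described above can be sketched roughly as follows. Apart from the names `run_analysis` and `final_dataset`, which the text mentions, everything here is hypothetical: the helper `prepare_dataset`, the function arguments, the column names and the toy data are assumptions, and base R's `aggregate()` stands in for the dplyr grouping used in the real script.

```r
# Hypothetical helper: tag a raw data frame with the set it came from.
prepare_dataset <- function(raw, set_name) {
  raw$set <- set_name
  raw
}

# Stand-in for the final summary step: average each measurement
# per subject and activity. aggregate() replaces dplyr here.
final_dataset_sketch <- function(merged) {
  aggregate(cbind(x, y) ~ subject + activity, data = merged, FUN = mean)
}

# Stand-in for the parent function: prepare both sets, merge, summarise.
run_analysis_sketch <- function(test_raw, train_raw) {
  test   <- prepare_dataset(test_raw, "test")
  train  <- prepare_dataset(train_raw, "train")
  merged <- rbind(test, train)   # merge the two sets row-wise
  final_dataset_sketch(merged)
}

# Made-up inputs, only to show the shape of the data.
test_raw  <- data.frame(subject = c(1, 1), activity = c("WALKING", "WALKING"),
                        x = c(0.1, 0.3), y = c(1, 3))
train_raw <- data.frame(subject = c(2, 2), activity = c("SITTING", "SITTING"),
                        x = c(0.5, 0.7), y = c(2, 4))

tidy <- run_analysis_sketch(test_raw, train_raw)
print(tidy)
```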
[Google R Style Guide](https://google-styleguide.googlecode.com/svn/trunk/Rguide.xml)
The function prints the final results to the console and generates a .txt file called "run_analysis_tidy_data.txt". To view the output text file, run the following commands in your R console:
*your_var <- read.table(file_path, header = TRUE)*
*View(your_var)*
View() works if you're using RStudio; otherwise, use print() to show the data in your console.
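To make the read-back step concrete, here is a small self-contained sketch. It writes a made-up table to a temporary file, which stands in for run_analysis_tidy_data.txt, and reads it back with the command shown above. Writing with `row.names = FALSE` is an assumption about how the output file was produced.

```r
# Stand-in for the script's output; the real file is run_analysis_tidy_data.txt.
file_path <- tempfile(fileext = ".txt")
example <- data.frame(subject = 1:2, activity = c("WALKING", "SITTING"))
write.table(example, file_path, row.names = FALSE)

# Read it back as described above.
your_var <- read.table(file_path, header = TRUE)

print(your_var)    # works in any R console
# View(your_var)   # opens the data viewer in RStudio
```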
1. Download the dataset .zip file to your PC and extract it to a directory on your hard drive.
   - 1.1 There will be a parent directory, /UCI HAR Dataset, containing:
   - 1.1.2 A /test directory with three text files and an Inertial Signals sub-directory.
   - 1.1.3 A /train directory (the training data) with three text files and an Inertial Signals sub-directory.
   - 1.1.4 Four text files that were used to prepare variable and measurement names/details, including a README.
2. Save the run_analysis.R script into the /UCI HAR Dataset directory.
3. Set your working directory: setwd("file_path_of_UCI_HAR_Dataset").
   - 3.1 Verify your working directory: getwd()
   - 3.2 Verify that the files from above are present: list.files()
4. The script requires the 'dplyr' package. If you do not have it installed, please run install.packages("dplyr") from your console. The dependencies and other requirements of the package should be reviewed before proceeding by reading the package description at:
5. Once all of the above is complete, load the script with the source function: source("file_path").
6. This loads the R script's code into your session.
7. Type run_analysis() and R will execute the code, print the results to the console and then write a text file into the working directory.
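
The setup steps above could be collected into a short script along these lines. This is only a sketch: the `dataset_dir` path is a placeholder you must change, the list of expected entries assumes the standard layout of the UCI HAR Dataset archive, and the helper `missing_entries` is hypothetical and not part of run_analysis.R.

```r
# Hypothetical helper: report which expected entries are absent from a directory.
missing_entries <- function(dir,
                            expected = c("test", "train", "features.txt",
                                         "activity_labels.txt", "run_analysis.R")) {
  setdiff(expected, list.files(dir))
}

dataset_dir <- "file_path_of_UCI_HAR_Dataset"  # placeholder: change to your path

if (dir.exists(dataset_dir) && length(missing_entries(dataset_dir)) == 0) {
  setwd(dataset_dir)
  # Install dplyr only if it is not already available.
  if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
  source("run_analysis.R")
  run_analysis()
} else {
  message("Dataset directory not found or incomplete; check the steps above.")
}
```

Checking for the files first means a wrong working directory produces a readable message instead of a failure halfway through the script.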
If you run into any trouble, please, e-mail [email protected] before reducing the score of any particular set. I'd be happy to walk you through the steps.
The codebook was generated by combining pre-existing information about the dataset from the 'features_info.txt' file in the collection of files with a description of the transformations that occurred during the exercise. Please refer to that file for full background on the features.
Each variable name starts with a single-character prefix, 't' or 'f', denoting time domain or frequency domain signals, respectively. The signals originated from two sensors:
- Accelerometer
- Gyroscope
The measures created were:
- tBodyAcc-XYZ
- tGravityAcc-XYZ
- tBodyAccJerk-XYZ
- tBodyGyro-XYZ
- tBodyGyroJerk-XYZ
- tBodyAccMag
- tGravityAccMag
- tBodyAccJerkMag
- tBodyGyroMag
- tBodyGyroJerkMag
- fBodyAcc-XYZ
- fBodyAccJerk-XYZ
- fBodyGyro-XYZ
- fBodyAccMag
- fBodyAccJerkMag
- fBodyGyroMag
- fBodyGyroJerkMag
Each measure was collected for the following tasks:
- WALKING
- WALKING_UPSTAIRS
- WALKING_DOWNSTAIRS
- SITTING
- STANDING
- LAYING
And the following summary statistics were generated:
- mean(): Mean value
- std(): Standard deviation
- mad(): Median absolute deviation
- max(): Largest value in array
- min(): Smallest value in array
- sma(): Signal magnitude area
- energy(): Energy measure. Sum of the squares divided by the number of values.
- iqr(): Interquartile range
- entropy(): Signal entropy
- arCoeff(): Autoregression coefficients with Burg order equal to 4
- correlation(): Correlation coefficient between two signals
- maxInds(): Index of the frequency component with largest magnitude
- meanFreq(): Weighted average of the frequency components to obtain a mean frequency
- skewness(): Skewness of the frequency domain signal
- kurtosis(): Kurtosis of the frequency domain signal
- bandsEnergy(): Energy of a frequency interval within the 64 bins of the FFT of each window.
- angle(): Angle between two vectors.
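
A few of these definitions can be illustrated on a made-up numeric vector with base R. This sketch only shows what the statistics mean; it is not how the dataset authors computed them. The exact conventions they used (for example, the scaling of the median absolute deviation, or the denominator of the standard deviation) are not stated here, so treat those choices as assumptions.

```r
signal <- c(1, 2, 3, 4, 10)  # made-up window of sensor values

stats <- c(
  mean   = mean(signal),
  std    = sd(signal),                      # sample standard deviation (n - 1), an assumption
  mad    = mad(signal, constant = 1),       # unscaled median absolute deviation, an assumption
  max    = max(signal),
  min    = min(signal),
  energy = sum(signal^2) / length(signal),  # sum of squares divided by number of values
  iqr    = IQR(signal)
)
print(stats)
```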
The summary statistics are appended to each variable name. For example,
*tBodyGyroJerk-std()-Y* would be translated as:
- time domain signal (prefix 't')
- body jerk
- gyroscope sensor (X, Y, Z axial measurements available)
- standard deviation of the Y-axial signal from the gyroscope sensor
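
The naming convention can also be unpacked programmatically. The sketch below splits a variable name into its parts with a regular expression. The function `parse_feature` and the pattern are my own illustration, not part of run_analysis.R, and they assume names of the form prefix, signal, '-', statistic, '()' and an optional '-' plus axis.

```r
# Hypothetical helper: split a feature name such as "tBodyGyroJerk-std()-Y".
parse_feature <- function(name) {
  pattern <- "^([tf])([A-Za-z]+)-([A-Za-z]+)\\(\\)-?([XYZ]?)$"
  m <- regmatches(name, regexec(pattern, name))[[1]]
  if (length(m) == 0) stop("name does not match the assumed pattern: ", name)
  list(
    domain    = if (m[2] == "t") "time" else "frequency",
    signal    = m[3],
    statistic = m[4],
    axis      = if (nzchar(m[5])) m[5] else NA_character_  # NA for magnitude signals
  )
}

str(parse_feature("tBodyGyroJerk-std()-Y"))
```

Names that carry extra arguments, such as some arCoeff() or bandsEnergy() variables, fall outside this assumed pattern and would need a broader expression.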
### Preparing the Code Book ###
To prepare the overview of the code book I used the 'Hmisc' package function
'describe' to create a list of information about each of the variables
in the CODEBOOK.
Units of measurement:
- t <- time in seconds
- f <- frequency in Hertz (Hz)