This code book describes the data used in this project, as well as the processing steps required to create the resulting tidy data set.
30 volunteers performed 6 different activities while wearing a smartphone. The smartphone captured various data about their movements.
- `features.txt`: Names of the 561 features.
- `activity_labels.txt`: Names and IDs for each of the 6 activities.
- `X_train.txt`: 7352 observations of the 561 features, for 21 of the 30 volunteers.
- `subject_train.txt`: A vector of 7352 integers, denoting the ID of the volunteer related to each of the observations in `X_train.txt`.
- `y_train.txt`: A vector of 7352 integers, denoting the ID of the activity related to each of the observations in `X_train.txt`.
- `X_test.txt`: 2947 observations of the 561 features, for 9 of the 30 volunteers.
- `subject_test.txt`: A vector of 2947 integers, denoting the ID of the volunteer related to each of the observations in `X_test.txt`.
- `y_test.txt`: A vector of 2947 integers, denoting the ID of the activity related to each of the observations in `X_test.txt`.
More information about the files is available in `README.md`. More information about the features is available in `features.txt`.
This analysis was performed using only the files above, and did not use the raw signal data. Therefore, the data files in the "Inertial Signals" folders were ignored.
- All of the relevant data files were read into data frames, appropriate column headers were added, and the training and test sets were combined into a single data set.
- All feature columns that did not contain the exact string "mean()" or "std()" were removed. This left 66 feature columns, plus the subjectID and activity columns.
- The activity column was converted from an integer to a factor, using labels describing the activities.
- A tidy data set was created containing the mean of each feature for each subject and each activity. Thus, subject #1 has 6 rows in the tidy data set (one row for each activity), and each row contains the mean value for each of the 66 features for that subject/activity combination. Since there are 30 subjects, there are a total of 180 rows.
- The tidy data set was written to both a CSV file and a TXT file (tidy.csv and tidy.txt).
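
The feature selection and averaging steps above can be illustrated with a small sketch. This is not the project's own script, which is not shown here; it is a minimal Python example using only the standard library. The helper names `select_features` and `tidy_means` are hypothetical, and the numeric values are made-up toy data rather than real observations. The feature names are in the style of `features.txt` and show why an exact match on "mean()" excludes names such as `meanFreq()`.

```python
from collections import defaultdict
from statistics import fmean


def select_features(names):
    """Return indices of names containing the exact string 'mean()' or 'std()'."""
    return [i for i, name in enumerate(names) if "mean()" in name or "std()" in name]


def tidy_means(subjects, activities, rows, keep):
    """Average each kept feature column for every (subject, activity) pair."""
    groups = defaultdict(list)
    for subject, activity, row in zip(subjects, activities, rows):
        groups[(subject, activity)].append([row[i] for i in keep])
    # zip(*group) transposes the group's rows into columns before averaging.
    return {
        key: [fmean(column) for column in zip(*group)]
        for key, group in sorted(groups.items())
    }


# Toy input (hypothetical values, not taken from the real data set).
names = [
    "tBodyAcc-mean()-X",
    "tBodyAcc-std()-X",
    "fBodyAcc-meanFreq()-X",   # excluded: contains "meanFreq()", not "mean()"
    "angle(X,gravityMean)",    # excluded: no "mean()" or "std()"
]
subjects = [1, 1, 1, 2]
activities = ["WALKING", "WALKING", "SITTING", "WALKING"]
rows = [
    [0.2, -0.9, 0.1, 0.5],
    [0.4, -0.7, 0.3, 0.7],
    [0.1, -0.5, 0.2, 0.6],
    [0.3, -0.8, 0.0, 0.4],
]

keep = select_features(names)
tidy = tidy_means(subjects, activities, rows, keep)
for (subject, activity), means in tidy.items():
    print(subject, activity, means)
```

Each key of `tidy` corresponds to one row of the tidy data set. On the full data, with 30 subjects and 6 activities, the same grouping would yield the 180 rows described above.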