0 parents
commit b9407c2
Showing 2 changed files with 859 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
--- | ||
title: "WEIGHT LIFTING CLASSIFICATION" | ||
author: "Henry Muhumuza" | ||
date: "2024-07-19" | ||
output: html_document | ||
--- | ||
<style> | ||
body { | ||
font-family: 'Times New Roman', Times, serif; | ||
} | ||
h1 { | ||
font-family: 'Courier New', Courier, monospace; | ||
} | ||
</style> | ||
|
||
```{r setup, include=FALSE} | ||
knitr::opts_chunk$set(echo = TRUE) | ||
``` | ||
|
||
## Introduction | ||
|
||
Devices such as Jawbone Up, Nike FuelBand, and Fitbit make it possible to collect a large amount of data about personal activity, and they are part of the quantified self movement. In this project, our goal is to predict the manner in which participants performed weight lifting exercises, using data from accelerometers. The outcome is the `classe` variable in the training set: class A corresponds to the specified execution of the exercise, while the other four classes (B to E) correspond to common mistakes. | ||
|
||
## Data Loading and Exploration | ||
First, we load the training and test datasets and explore their structure. | ||
```{r} | ||
# Load necessary libraries | ||
library(caret) | ||
library(randomForest) | ||
library(dplyr) | ||
# Load the data | ||
training_data <- read.csv("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv") | ||
testing_data <- read.csv("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv") | ||
# Display structure of the training data | ||
str(training_data) | ||
``` | ||
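Raw sensor exports often mark missing measurements in more than one way. As a hedged sketch, assuming blank cells and the spreadsheet error string `#DIV/0!` both stand for missing values in these files (an assumption, not something the code above checks), `read.csv` can convert them to `NA` at load time through its `na.strings` argument. The inline text below is invented sample data, not the real dataset.

```r
# Toy CSV standing in for the real file (values are invented for illustration).
csv_text <- "roll_belt,kurtosis_roll_belt,classe
1.41,,A
1.42,#DIV/0!,A
1.48,0.5,B"

# na.strings lists the strings that read.csv should treat as missing.
# Assumption: "" and "#DIV/0!" mean "no value" in this dataset.
toy <- read.csv(text = csv_text, na.strings = c("NA", "", "#DIV/0!"))

# Count missing values per column.
na_counts <- colSums(is.na(toy))
print(na_counts)
```

With missing markers unified this way, a single `colSums(is.na(...))` check is enough to find incomplete columns.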
## Data Preprocessing | ||
Next, we preprocess the data by removing columns with missing or empty values and converting the `classe` variable to a factor. | ||
```{r} | ||
# Keep only columns that contain no NA values | ||
training_data <- training_data[, colSums(is.na(training_data)) == 0] | ||
# Remove columns with empty cells | ||
empty_cells <- apply(training_data, 2, function(col) any(col == "")) | ||
training_data <- training_data[, !empty_cells] | ||
# Remove the last column (column 160) of the testing data | ||
testing_data <- testing_data[,-160] | ||
# Keep only columns that contain no NA values | ||
testing_data <- testing_data[, colSums(is.na(testing_data)) == 0] | ||
# Remove columns with empty cells | ||
empty_cells <- apply(testing_data, 2, function(col) any(col == "")) | ||
testing_data <- testing_data[, !empty_cells] | ||
# Ensure the classe variable is a factor | ||
training_data$classe <- as.factor(training_data$classe) | ||
# Display cleaned data structure | ||
str(training_data) | ||
``` | ||
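The cleaning above is applied twice, once per dataset, and it leaves any bookkeeping columns in place. Fields such as a row index or a timestamp can correlate with the label for reasons unrelated to movement, so it may be worth dropping them before modeling. A minimal sketch, assuming the identifier columns are named `X`, `user_name`, `raw_timestamp_part_1`, `raw_timestamp_part_2`, `cvtd_timestamp`, `new_window`, and `num_window` (verify with `names(training_data)`), shows one helper that could serve both datasets. The helper name `clean_columns` and the toy data are hypothetical.

```r
# Hypothetical helper: drop columns containing NAs or empty strings, then
# drop identifier columns that carry no sensor information.
clean_columns <- function(df, id_cols) {
  has_na    <- colSums(is.na(df)) > 0
  has_empty <- vapply(df, function(col) any(col == "", na.rm = TRUE), logical(1))
  keep <- !(has_na | has_empty) & !(names(df) %in% id_cols)
  df[, keep, drop = FALSE]
}

# Assumed identifier column names (not confirmed by the report itself).
id_cols <- c("X", "user_name", "raw_timestamp_part_1", "raw_timestamp_part_2",
             "cvtd_timestamp", "new_window", "num_window")

# Toy data frame with invented values.
toy <- data.frame(
  X = 1:3,
  user_name = c("a", "b", "c"),
  roll_belt = c(1.4, 1.5, 1.6),
  max_roll_belt = c(NA, 2, NA),
  kurtosis = c("", "0.1", ""),
  classe = c("A", "A", "B")
)

cleaned <- clean_columns(toy, id_cols)
print(names(cleaned))
```

Calling the same helper on both the training and the testing data keeps the two cleaning steps from drifting apart.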
## Model Building | ||
We split the training data into a training set (70%) and a validation set (30%), then train a Random Forest model on the training set. | ||
|
||
```{r} | ||
# Split the training data | ||
set.seed(123) | ||
trainIndex <- createDataPartition(training_data$classe, p = .7, | ||
list = FALSE, | ||
times = 1) | ||
trainSet <- training_data[trainIndex,] | ||
validSet <- training_data[-trainIndex,] | ||
# Train a Random Forest model | ||
model_rf <- randomForest(classe ~ ., data = trainSet, importance = TRUE) | ||
# Display the model | ||
print(model_rf) | ||
``` | ||
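`createDataPartition` samples within each level of the outcome so that class proportions are roughly preserved in both subsets. The base R sketch below illustrates that idea on invented labels. It is an approximation for intuition, not caret's actual implementation, and `stratified_index` is a hypothetical name.

```r
# Hypothetical helper: pick a fraction p of the row indices within each class.
stratified_index <- function(y, p) {
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(123)
# Invented, unbalanced labels standing in for the classe column.
y <- factor(rep(c("A", "B", "C", "D", "E"), times = c(50, 30, 30, 20, 20)))

train_idx <- stratified_index(y, p = 0.7)

# Class counts in the training and validation parts.
print(table(y[train_idx]))
print(table(y[-train_idx]))
```

Each class contributes about 70% of its rows to the training part, so a rare class cannot end up entirely in one subset by chance.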
## Model Evaluation | ||
We evaluate the model using the validation set. | ||
```{r} | ||
# Make predictions on the validation set | ||
predictions <- predict(model_rf, validSet) | ||
# Display the confusion matrix | ||
confusionMatrix(predictions, validSet$classe) | ||
``` | ||
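The headline accuracy reported by `confusionMatrix` is the share of predictions that fall on the diagonal of the confusion table, and the estimated out-of-sample error is one minus that value. A small base R sketch with invented labels (not results from this model) shows the arithmetic:

```r
# Invented reference labels and predictions, for illustration only.
truth <- factor(c("A", "A", "B", "B", "C", "C", "C", "A"), levels = c("A", "B", "C"))
pred  <- factor(c("A", "A", "B", "C", "C", "C", "B", "A"), levels = c("A", "B", "C"))

# Rows are predictions, columns are reference classes.
cm <- table(Prediction = pred, Reference = truth)

# Correct predictions sit on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)
out_of_sample_error <- 1 - accuracy

print(cm)
print(accuracy)
print(out_of_sample_error)
```

Off-diagonal cells show which classes are confused with each other, which the single accuracy figure hides.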
|
||
## Predictions on Test Data | ||
|
||
We use the trained model to predict the classe variable for the test data. | ||
|
||
```{r} | ||
# Predict on test data | ||
test_predictions <- predict(model_rf, testing_data) | ||
# Display test predictions | ||
test_predictions | ||
``` | ||
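Because the preprocessing step drops column 160 of the test data, the printed predictions are identified only by row position. A sketch, assuming that column is a case identifier named `problem_id` and using stand-in values for both the identifiers and the predictions, shows one way to keep the two together:

```r
# Stand-ins for the real objects (values invented for illustration).
# In the report, the identifiers would need to be saved before column 160
# is removed, e.g. ids <- testing_data$problem_id (assumed column name).
problem_id <- 1:5
toy_predictions <- factor(c("B", "A", "B", "A", "A"),
                          levels = c("A", "B", "C", "D", "E"))

# One row per test case: identifier alongside the predicted class.
results <- data.frame(problem_id = problem_id,
                      predicted_classe = toy_predictions)

print(results)
print(table(results$predicted_classe))
```

A table like this is easier to check against an answer key than a bare vector of factor levels.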
|
||
## Conclusion | ||
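The model performed reasonably well on the validation set, classifying most observations into the correct class (see the confusion matrix above). Random forests are widely regarded as one of the most accurate general-purpose prediction algorithms. | ||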
|