Add files via upload

Henryfermi · Jul 19, 2024 · b9407c2 · b9407c2
commit b9407c2
Show file tree

Hide file tree

Showing 2 changed files with 859 additions and 0 deletions.
diff --git a/Weightliftingclassification.Rmd b/Weightliftingclassification.Rmd
@@ -0,0 +1,111 @@
+---
+title: "WEIGHT LIFTING CLASSIFICATION"
+author: "Henry Muhumuza"
+date: "2024-07-19"
+output: html_document
+---
+<style>
+body {
+  font-family: 'Times New Roman', Times, serif;
+}
+h1 {
+  font-family: 'Courier New', Courier, monospace;
+}
+</style>
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+## Introduction
+
+Using devices such as Jawbone Up, Nike FuelBand, and Fitbit, it is now possible to collect a large amount of data about personal activity. These devices are part of the quantified self movement. In this project, our goal is to predict the manner in which participants performed weight lifting exercises using data from accelerometers.The goal is to predict the manner in which they did the exercise. This is the "classe" variable in the training set.Class A corresponds to the specified execution of the exercise, while the other 4 classes (B to E) correspond to common mistakes.
+
+## Data Loading and Exploration
+First, we load the training and test datasets and explore their structure.
+```{r}
+# Load necessary libraries
+library(caret)
+library(randomForest)
+library(dplyr)
+
+# Load the data
+training_data <- read.csv("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv")
+testing_data <- read.csv("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv")
+
+# Display structure of the training data
+str(training_data)
+
+```
+## Data Preprocessing
+Next, we preprocess the data by handling missing values and ensuring classe variable is a factor variable.
+```{r}
+# Remove columns with too many NAs or irrelevant columns
+training_data <- training_data[, colSums(is.na(training_data)) == 0]
+
+# Remove columns with empty cells
+empty_cells <- apply(training_data, 2, function(col) any(col == ""))
+training_data <- training_data[, !empty_cells]
+
+# Remove last row of testing data
+testing_data <- testing_data[,-160]
+
+# Remove columns with too many NAs or irrelevant columns
+testing_data <- testing_data[, colSums(is.na(testing_data)) == 0]
+
+# Remove columns with empty cells
+empty_cells <- apply(testing_data, 2, function(col) any(col == ""))
+testing_data <- testing_data[, !empty_cells]
+
+# Ensure the classe variable is a factor
+training_data$classe <- as.factor(training_data$classe)
+
+# Display cleaned data structure
+str(training_data)
+
+```
+## Model Building
+Sub split the training data into a training set and a validation set, then train a Random Forest model.
+
+```{r}
+# Split the training data
+set.seed(123)
+trainIndex <- createDataPartition(training_data$classe, p = .7, 
+                                  list = FALSE, 
+                                  times = 1)
+trainSet <- training_data[trainIndex,]
+validSet <- training_data[-trainIndex,]
+
+# Train a Random Forest model
+model_rf <- randomForest(classe ~ ., data = trainSet, importance = TRUE)
+
+# Display the model
+print(model_rf)
+
+```
+## Model Evaluation
+We evaluate the model using the validation set.
+```{r}
+# Make predictions on the validation set
+predictions <- predict(model_rf, validSet)
+
+# Display the confusion matrix
+confusionMatrix(predictions, validSet$classe)
+
+```
+
+## Predictions on Test Data
+
+We use the trained model to predict the classe variable for the test data.
+
+```{r}
+# Predict on test data
+test_predictions <- predict(model_rf, testing_data)
+
+# Display test predictions
+test_predictions
+```
+
+## Conclusion
+The model performed reasonably well, predicting most of the classes accurately. Random forests method is one of the best prediction algorithms. 
+
diff --git a/Weightliftingclassification.html b/Weightliftingclassification.html