This project demonstrates how to perform multiple linear regression using Python and Scikit-Learn on the 50 Startups dataset. The dataset covers 50 startups and records each one's R&D Spend, Administration, Marketing Spend, and State, along with the corresponding profit.
In this project, we:
- Preprocess the data: Use OneHotEncoding to handle categorical variables (State).
- Avoid the dummy variable trap: Remove one of the one-hot encoded columns.
- Split the data: Divide the dataset into training and test sets.
- Fit the multiple linear regression model: Train the model on the training set.
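- Evaluate the model: Compare the model's scores on the training and test sets. For a scikit-learn regressor, `score` returns the coefficient of determination (R²), not a classification accuracy.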
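
The steps above can be sketched end to end as follows. This is a minimal illustration, not the repository's actual script: it builds a synthetic stand-in for the dataset (the numeric values, the coefficients used to generate `Profit`, the 80/20 split, and `random_state=0` are all assumptions made here), and it uses `OneHotEncoder(drop="first")` as one way to remove a dummy column.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for 50_Startups.csv. All values are made up for
# illustration; only the column names follow the dataset description.
rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({
    "R&D Spend": rng.uniform(0, 160_000, n),
    "Administration": rng.uniform(50_000, 180_000, n),
    "Marketing Spend": rng.uniform(0, 470_000, n),
    "State": rng.choice(["New York", "California", "Florida"], n),
})
df["Profit"] = (
    50_000
    + 0.8 * df["R&D Spend"]
    + 0.03 * df["Marketing Spend"]
    + rng.normal(0, 5_000, n)
)

X = df.drop(columns="Profit")
y = df["Profit"]

# One-hot encode State. drop="first" removes one dummy column, which
# avoids the dummy variable trap; the numeric columns pass through.
encoder = ColumnTransformer(
    transformers=[("state", OneHotEncoder(drop="first"), ["State"])],
    remainder="passthrough",
)
X_encoded = encoder.fit_transform(X)

# Hold out 20% of the rows as a test set (assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y, test_size=0.2, random_state=0
)

# Fit the multiple linear regression model on the training set.
model = LinearRegression()
model.fit(X_train, y_train)

# score() returns R^2 for regressors.
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(f"Training R^2: {train_r2:.3f}")
print(f"Test R^2: {test_r2:.3f}")
```

To run the same steps on the real data, replace the synthetic `df` with `pd.read_csv("50_Startups.csv")`, assuming the file's target column is named `Profit`. A training score far above the test score would suggest overfitting.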

Required libraries:
- numpy: For array operations
- pandas: For data handling
- matplotlib: For plotting (not used in this case, but imported)
- scikit-learn: For machine learning tasks such as encoding, splitting, and regression

Install them with `pip install numpy pandas matplotlib scikit-learn`.

Cloning and Running the Project

Clone this repository to your local machine:
```bash
git clone https://github.com/EbadShabbir/50-startups-regression.git
```
Navigate to the project directory:
```bash
cd 50-startups-regression
```
Ensure that you have the 50_Startups.csv dataset in the same directory as the script, or adjust the dataset path accordingly in the code.
Run the Python script:
```bash
python regression_model.py
```
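
If the dataset is stored somewhere other than the script's directory, one way to make the path adjustable is a small loader like the sketch below. The helper `load_startups` and the constant `DATASET_NAME` are hypothetical names introduced here for illustration and are not part of the repository.

```python
from pathlib import Path

import pandas as pd

# File name taken from the instructions above.
DATASET_NAME = "50_Startups.csv"


def load_startups(data_dir="."):
    """Load the dataset from data_dir (hypothetical helper).

    Raises FileNotFoundError with a descriptive message when the CSV
    is missing, instead of failing later inside pandas.
    """
    csv_path = Path(data_dir) / DATASET_NAME
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"Expected {DATASET_NAME} in {csv_path.parent.resolve()}"
        )
    return pd.read_csv(csv_path)
```

Calling `load_startups()` reads from the current working directory, while `load_startups("data")` would read from a `data` subdirectory (an assumed location).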