MLfromScratch is a library designed to help you learn and understand machine learning algorithms by building them from scratch using only NumPy! No black-box libraries, no hidden magic, just pure Python and math. It's perfect for beginners who want to see what's happening behind the scenes of popular machine learning models.
Our package structure is designed to look like scikit-learn, so if you're familiar with that, you'll feel right at home!
- LinearRegression: Imagine drawing a straight line through a set of points to predict future values. Linear Regression helps in predicting something like house prices based on size.
- SGDRegressor: A fast way to do Linear Regression using Stochastic Gradient Descent. Perfect for large datasets.
- SGDClassifier: A classification algorithm predicting categories like "spam" or "not spam."
- DecisionTreeClassifier: Think of this as playing 20 questions to guess something. A decision tree asks yes/no questions to classify data.
- DecisionTreeRegressor: Predicts a continuous number (like temperature tomorrow) based on input features.
- KNeighborsClassifier: Classifies data by looking at the 'k' nearest neighbors to the new point.
- KNeighborsRegressor: Instead of classifying, it predicts a number based on nearby data points.
- GaussianNB: Works great for data that follows a normal distribution (bell-shaped curve).
- MultinomialNB: Ideal for text classification tasks like spam detection.
- AgglomerativeClustering: Clusters by merging similar points until a single large cluster is formed.
- DBSCAN: Groups points close to each other and filters out noise. No need to specify the number of clusters!
- MeanShift: Shifts data points toward areas of high density to find clusters.
- RandomForestClassifier: Combines multiple decision trees to make stronger decisions.
- RandomForestRegressor: Predicts continuous values using an ensemble of decision trees.
- GradientBoostingClassifier: Builds trees sequentially, each correcting errors made by the last.
- VotingClassifier: Combines the results of multiple models to make a final prediction.
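To give a feel for what "from scratch" means for the linear models above, here is a minimal sketch of linear regression fitted by gradient descent using only NumPy. It does not use MLfromScratch's own classes; the function name `fit_linear_regression` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.05, n_iters=2000):
    """Fit y ~ X @ w + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; not the library's API.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        error = X @ w + b - y                         # residuals, shape (n_samples,)
        w -= lr * (2.0 / n_samples) * (X.T @ error)   # gradient of MSE w.r.t. w
        b -= lr * (2.0 / n_samples) * error.sum()     # gradient of MSE w.r.t. b
    return w, b

# Toy data: "price" = 3 * "size" + 2, with no noise
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 3.0 * X[:, 0] + 2.0

w, b = fit_linear_regression(X, y)
print(w, b)  # should be close to slope 3 and intercept 2
```

A stochastic variant (the idea behind SGDRegressor) would compute the same gradient on one sample or a small batch per step instead of the full dataset.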
Measure your model’s performance:
- accuracy_score: Measures the fraction of predictions your model got right.
- f1_score: Balances precision and recall into a single score.
- roc_curve: Shows the trade-off between the true positive rate and the false positive rate across classification thresholds.
- train_test_split: Splits your data into training and test sets.
- KFold: Splits your data into 'k' folds so the model is trained and validated 'k' times, each time holding out a different fold.
- StandardScaler: Standardizes your data so each feature has a mean of 0 and a standard deviation of 1.
- LabelEncoder: Converts text labels into numerical labels (e.g., "cat" and "dog" become 0 and 1).
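The utilities above are small enough to sketch directly in NumPy. The example below is an illustration under assumptions, not the library's implementation: the helper names `split_train_test`, `standardize`, and `accuracy` are hypothetical and deliberately differ from the library's names.

```python
import numpy as np

def split_train_test(X, y, test_size=0.25, seed=0):
    """Shuffle the row indices, then hold out a fraction of rows for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

def standardize(X_train, X_other):
    """Scale both arrays using the mean and std of the training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    return (X_train - mean) / std, (X_other - mean) / std

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(y_true == y_pred))

# Toy data: 20 samples, 2 features
X = np.arange(40, dtype=float).reshape(20, 2)
y = (X[:, 0] >= 20).astype(int)

X_train, X_test, y_train, y_test = split_train_test(X, y)
X_train_s, X_test_s = standardize(X_train, X_test)

print(X_train.shape, X_test.shape)
print(accuracy(np.array([1, 0, 1, 1]), np.array([1, 0, 0, 1])))
```

Computing the scaling statistics on the training split only keeps information from the test set out of the model, which is why `standardize` takes both arrays but measures just the first.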
Dimensionality Reduction helps in simplifying data while retaining most of its valuable information. By reducing the number of features (dimensions) in a dataset, it makes data easier to visualize and speeds up machine learning algorithms.
- PCA (Principal Component Analysis): PCA reduces the number of dimensions by finding new uncorrelated variables called principal components. It projects your data onto a lower-dimensional space while retaining as much variance as possible.
  - How It Works: PCA finds the axes (principal components) that maximize the variance in your data. The first principal component captures the most variance, and each subsequent component captures progressively less.
  - Use Case: Use PCA when you have many features, and you want to simplify your dataset for better visualization or faster computation. It is particularly useful when features are highly correlated.
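The PCA description above can be sketched in a few lines of NumPy. This is one possible approach, using the singular value decomposition of the centered data; the library's own PCA may be written differently (for example, via an eigendecomposition of the covariance matrix, which yields the same components). The function name `pca` and its return values are assumptions made for this example.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components (illustrative sketch)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured by each axis, as a share of the total variance
    explained_variance = (S ** 2) / (len(X) - 1)
    ratio = explained_variance / explained_variance.sum()
    return X_centered @ components.T, components, ratio[:n_components]

# Toy data: the second feature is almost a multiple of the first (highly correlated)
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2.0 * t + 0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])

Z, components, ratio = pca(X, n_components=2)
print(Z.shape)  # (200, 2)
print(ratio)    # the first component should capture most of the variance
```

Because two of the three features are strongly correlated, a single component explains almost all of the variance, which is the situation where PCA helps most.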
- Learning-First Approach: If you're a beginner and want to understand machine learning, this is the library for you. No hidden complexity, just code.
- No Hidden Magic: Everything is written from scratch, so you can see exactly how each algorithm works.
- Lightweight: Uses only NumPy, making it fast and easy to run.
```bash
# Clone the repository
git clone https://github.com/adityajn105/MLfromScratch.git

# Navigate to the project directory
cd MLfromScratch

# Install the required dependencies
pip install -r requirements.txt
```
This project is maintained by Aditya Jain.
Contributor: Subrahmanya Gaonkar
We welcome contributions from everyone, especially beginners! If you're new to open source, don't worry: feel free to ask questions, open issues, or submit a pull request.
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Make your changes and commit (git commit -m "Added new feature").
- Push the changes (git push origin feature-branch).
- Submit a pull request and explain your changes.
This project is licensed under the MIT License - see the LICENSE file for details.