Definition: Machine learning is the scientific study of algorithms and statistical models that computer systems use to perform a specific task effectively without explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence.
Applying machine learning to real-world data involves several steps, starting with collecting the data and ending with generating predictions:
- Step 1: Gather the data. In industry, there are important considerations to take into account when building a dataset, such as the target variable.
- Step 2: Prepare the data. Deal with missing values and categorical data (feature engineering, feature selection, feature transformation).
- Step 3: Select a model. There are many different types of models. Which one should you select for your business problem?
- Step 4: Train the model. Fit regression and classification models to patterns in the training data.
- Step 5: Evaluate the model. Use a validation set to assess how well a trained model performs on unseen data.
- Step 6: Tune parameters. Tune parameters to get better performance from XGBoost models.
- Step 7: Get predictions. Generate predictions with a trained model.
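The seven steps above can be sketched end to end with scikit-learn. This is only a minimal illustration under stated assumptions: the data is synthetic, the three features (age, income, region code) are hypothetical, and a ``RandomForestClassifier`` stands in for the XGBoost models mentioned in Step 6 so that the sketch needs nothing beyond NumPy and scikit-learn.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)

# Step 1: gather the data (synthetic stand-in for a real dataset).
n_samples = 300
age = rng.normal(40, 10, n_samples)
income = rng.normal(50, 15, n_samples)
region = rng.integers(0, 3, n_samples).astype(float)  # categorical code 0/1/2
noise = rng.normal(0, 5, n_samples)
y = (income + 5 * (region == 2) + noise > 52).astype(int)
age[rng.random(n_samples) < 0.1] = np.nan  # simulate missing values
X = np.column_stack([age, income, region])

# Step 2: prepare the data (impute numeric columns, one-hot encode the code).
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), [0, 1]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), [2]),
])

# Step 3: select a model (assumption: a random forest, not XGBoost).
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Step 4: train the model on the training split.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model.fit(X_train, y_train)

# Step 5: evaluate on the held-out validation set.
baseline_acc = accuracy_score(y_valid, model.predict(X_valid))

# Step 6: tune parameters with a small cross-validated grid search.
param_grid = {"clf__n_estimators": [50, 100], "clf__max_depth": [3, None]}
search = GridSearchCV(model, param_grid, cv=3)
search.fit(X_train, y_train)

# Step 7: get predictions from the tuned model.
predictions = search.predict(X_valid)
tuned_acc = accuracy_score(y_valid, predictions)

print("baseline accuracy:", round(baseline_acc, 3))
print("tuned accuracy:", round(tuned_acc, 3))
print("best parameters:", search.best_params_)
```

Wrapping preprocessing and the estimator in one ``Pipeline`` means the imputer and encoder are re-fit inside every cross-validation fold, so nothing learned from a validation fold leaks into training.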
scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license.
The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the `About us <https://scikit-learn.org/dev/about.html#authors>`__ page for a list of core contributors.
It is currently maintained by a team of volunteers.
Website: https://scikit-learn.org
scikit-learn requires:

- Python (>= 3.6)
- NumPy (>= 1.13.3)
- SciPy (>= 0.19.1)
- joblib (>= 0.11)

Scikit-learn 0.20 was the last version to support Python 2.7 and Python 3.4. scikit-learn 0.23 and later require Python 3.6 or newer.
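The minimum versions listed above can be checked against the current environment before installing. The following is a rough sketch, not part of scikit-learn: the helper names ``parse_version`` and ``check_requirements`` are hypothetical, the version parsing is deliberately simplistic (leading integers only), and ``importlib.metadata`` assumes Python 3.8 or newer.

```python
from importlib import metadata

# Minimum versions copied from the requirements list above.
MINIMUM_VERSIONS = {
    "numpy": (1, 13, 3),
    "scipy": (0, 19, 1),
    "joblib": (0, 11),
}


def parse_version(text):
    """Turn '1.13.3' or '0.24.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, parse_version(installed) >= minimum)
    return report


for package, (installed, ok) in check_requirements().items():
    status = "ok" if ok else "missing or too old"
    print(package, installed, status)
```

For anything beyond a quick check, a dedicated version parser is a safer choice than this tuple comparison, which ignores pre-release and epoch markers.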
Scikit-learn plotting capabilities (i.e., functions starting with ``plot_`` and classes ending with "Display") require Matplotlib (>= 2.1.1). Running the examples also requires Matplotlib >= 2.1.1. A few examples require scikit-image >= 0.13 or pandas >= 0.18.0, and some require seaborn >= 0.9.0.
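As an illustration of a class ending with "Display", the sketch below draws a confusion matrix with ``ConfusionMatrixDisplay``. It assumes Matplotlib is installed and a scikit-learn release that ships this class; the synthetic dataset and the logistic regression model are arbitrary choices made only for the example.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.model_selection import train_test_split

# Small synthetic binary classification problem.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

# Compute the matrix first, then hand it to the Display class for plotting.
cm = confusion_matrix(y_test, clf.predict(X_test))
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=clf.classes_)
disp.plot()

# Render to an in-memory PNG; pass a file name instead to write to disk.
buffer = io.BytesIO()
disp.figure_.savefig(buffer, format="png")
```

Without Matplotlib installed, the ``import matplotlib`` line fails, which is the dependency the paragraph above describes.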
If you already have a working installation of NumPy and SciPy, the easiest way to install scikit-learn is using ``pip``::

    pip install -U scikit-learn

or ``conda``::

    conda install scikit-learn
The documentation includes more detailed `installation instructions <https://scikit-learn.org/stable/install.html>`_.
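After installing, a quick way to confirm that the package imports correctly is to print its version from Python. This is a small sanity check rather than an official verification procedure; ``sklearn.show_versions()`` additionally reports the versions of the dependencies it finds.

```python
import sklearn

# The import succeeding is the main check; the version confirms which
# release pip or conda actually installed.
installed_version = sklearn.__version__
print("scikit-learn", installed_version)

# Print Python, platform, and dependency versions for troubleshooting.
sklearn.show_versions()
```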