Subject headings of the course. Use them to follow along with each lecture and to find the references, sources, and materials used in this course.
First lecture of the course. Mostly contains Statistics, Data Science, and Data Mining basics.
1. The Place of Machine Learning, Data Science, and Artificial Intelligence
- Different Parts in Data Science
- The Fields of Data Science
- Source: https://dimensionless.in/understanding-different-components-roles-in-data-science/
2. Data Science Fundamental Steps
- Data steps and a few details
- Source: https://dimensionless.in/understanding-different-components-roles-in-data-science/
3. The Terms: "Learning" and "Machine Learning"
- What is "Learning" and why we call it "Learning"
- What is Machine Learning
- Source: https://www.cmpe.boun.edu.tr/~ethem/i2ml2e/2e_v1-0/i2ml2e-chap1-v1-0.pdf
4. Supervised vs Unsupervised Learning
- What is "Supervised Learning"
- What is "Unsupervised Learning"
- Main differences between Supervised Learning and Unsupervised Learning
5. Data Types
- Data vs Meta-Data
- Categorical (Qualitative) Data
- Numerical (Quantitative) Data
- Source: https://databasetown.com/statistics-for-data-science-descriptive-inferential-statistics/
6. Data Representation in Computer World
- Main Data Formats: CSV, ARFF, office formats
- Other Data Formats: Database table, JSON, XML, image, sound, video, ...
- Unstructured vs. Structured Data
- Source: https://www.geeksforgeeks.org/difference-between-structured-semi-structured-and-unstructured-data/
7. Feature Terms & Data Table Example
- Obese Dataset
- Definition of input & output variables
8. Descriptive Statistics
- Population vs Sample
- Descriptive Statistics vs Inferential Statistics
- Definition of input & output variables
- Frequency Distribution Table
- Measures of Central Tendency: Mean, mode, median
- Measures of Variability: Variance (VAR) & Standard Deviation (STD) formulas
- Covariance & Correlation Coefficient
- Source #1: https://databasetown.com/statistics-for-data-science-descriptive-inferential-statistics/
- Source #2: https://github.com/ml874/Data-Science-Cheatsheet/blob/master/data-science-cheatsheet.pdf
9. Probability Basics
- Bayesian Statistics
- Probability Distribution Function
- Normal (Gauss) Distribution
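
As a small illustration of the descriptive statistics topics listed in item 8 (central tendency, variability, covariance, and correlation), here is a minimal sketch using only the Python standard library. The data values and the helper names `sample_covariance` and `pearson_correlation` are hypothetical and are not taken from the course materials. Sample formulas (dividing by n - 1) are assumed.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
y = [1.0, 3.0, 2.0, 5.0, 6.0, 7.0]

# Measures of central tendency.
mean_x = statistics.mean(x)
median_x = statistics.median(x)
mode_x = statistics.mode(x)

# Measures of variability (sample versions, denominator n - 1).
var_x = statistics.variance(x)
std_x = statistics.stdev(x)


def sample_covariance(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a = statistics.mean(a)
    mean_b = statistics.mean(b)
    total = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return total / (len(a) - 1)


def pearson_correlation(a, b):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return sample_covariance(a, b) / (statistics.stdev(a) * statistics.stdev(b))


cov_xy = sample_covariance(x, y)
corr_xy = pearson_correlation(x, y)

print("mean:", mean_x, "median:", median_x, "mode:", mode_x)
print("variance:", var_x, "std:", std_x)
print("covariance:", cov_xy, "correlation:", corr_xy)
```

The correlation coefficient is the covariance rescaled to the range [-1, 1], which makes it comparable across variables measured in different units.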
Mostly contains Data Science and Classification basics.
1. Data Resampling
- Train, test, validation sets
- How to split data?
2. Cross Validation Methods
- Leave-One-Out Cross Validation (LOOCV)
- K-Fold Cross Validation
- Stratified K-Fold Cross Validation
- Material: Cross-Validation.pdf
3. Feature Scaling
- Normalization
- Standardization
- Material: Feature-Scaling.pdf
4. Overfitting
- What is overfitting
- Overfitting examples
5. Outlier Analysis
- Outlier examples
- Univariate Outlier Detection
- Multivariate Outlier Detection
- Material: Outlier-Analysis.pdf
6. Missing Data Handling
- Missing at Random (MAR)
- Simple Imputation Methods
- Tree-based Imputation Methods
- Model-based Imputation Methods
- Material: Missing-Data-Handling.pdf
7. Classification Algorithms Basics
- K-NN Algorithm
- Coding K-NN in Python
8. Classification Evaluation
- Confusion Matrix
- Precision, recall
- Accuracy vs F1-score
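
The classification evaluation topics in item 8 can be sketched in a few lines of plain Python. The labels below are hypothetical and deliberately imbalanced (two positives out of ten) to show how accuracy and F1-score can disagree; the function name `binary_confusion_counts` is an illustrative choice, not something from the course materials.

```python
def binary_confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


# Hypothetical, imbalanced labels: only two of ten samples are positive.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

tp, fp, fn, tn = binary_confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print("confusion counts (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", accuracy, "precision:", precision, "recall:", recall, "f1:", f1)
```

Here the many true negatives lift accuracy well above the F1-score, because F1 ignores true negatives and depends only on how the positive class is handled.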
Mostly contains Clustering, Tree-Based, and Regression basics.
1. Clustering Algorithms Basics
- K-Means Algorithm
- Pros and Cons of K-Means
- A Basic K-Means Example
- Material: KMeans.pdf
2. Tree-Based Algorithm Basics: Decision tree, advantages, disadvantages
- Decision Tree Definition
- Information Gain with Example
- Gini Impurity Calculation
- Material: Decision-Trees.pdf
3. Regression Basics
- Simple Linear Regression
- Multiple Linear Regression
- Gradient Descent Algorithm
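
To connect the regression topics in item 3, the following is a minimal sketch of batch gradient descent fitting a simple linear regression `y = w * x + b` by minimizing mean squared error. The data, the learning rate, and the number of epochs are assumptions chosen for illustration; the function name `fit_simple_linear_regression` is hypothetical.

```python
def fit_simple_linear_regression(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b with batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current model on every sample.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2).
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_simple_linear_regression(xs, ys)
print("slope:", w, "intercept:", b)
```

With a learning rate that is too large the updates diverge, and with one that is too small convergence becomes slow, so the value used here should be treated as specific to this toy data.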
Mostly contains Advanced Tree-Based Algorithms, Logistic Regression, and Neural Networks.
1. Advanced Regression
- Logistic Regression
- Sigmoid Function
2. Advanced Trees
- Regression Trees
- Boosting, Bootstrapping and Aggregation
- Ensemble Learning Example: Random Forest
3. Introduction to Deep Learning
- Perceptron Definition
- Neural Networks, layers, weights, and bias
- Forward & Back Propagation
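
The sigmoid function, the perceptron-style unit, and forward and back propagation listed above can be tied together with a single neuron. This is a minimal sketch under stated assumptions: one neuron with a sigmoid activation, a squared-error loss `0.5 * (output - target) ** 2`, and hypothetical inputs, weights, and learning rate. It is not a full neural network implementation.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, weights, bias):
    """Forward pass of one neuron: weighted sum plus bias, then sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def backward_step(inputs, weights, bias, target, learning_rate):
    """One gradient step for the loss 0.5 * (output - target) ** 2."""
    output = forward(inputs, weights, bias)
    # Chain rule: dL/dz = (output - target) * sigmoid'(z),
    # where sigmoid'(z) = output * (1 - output).
    delta = (output - target) * output * (1.0 - output)
    new_weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - learning_rate * delta
    return new_weights, new_bias


# Hypothetical starting point: the weighted sum is exactly zero.
inputs = [1.0, 1.0]
weights = [0.5, -0.5]
bias = 0.0
target = 1.0

output_before = forward(inputs, weights, bias)
weights, bias = backward_step(inputs, weights, bias, target, learning_rate=1.0)
output_after = forward(inputs, weights, bias)

print("output before update:", output_before)
print("output after update:", output_after)
```

After one update the output moves toward the target, which is the behavior back propagation repeats across many neurons and layers in a larger network.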