Every year, millions of people fall victim to fraud that costs the global economy billions of dollars. If you're a victim, it can wreak havoc on your personal finances. Luckily, thanks to modern fraud detection techniques, many financial institutions have measures in place to help protect you from credit fraud.
The dataset is available at the URL below:
https://www.kaggle.com/mlg-ulb/creditcardfraud
Fraud detection is a technique used to identify unusual patterns that differ from the rest of the population and do not behave as expected. These unusual patterns are also called outliers.
Fraud detection involves in-depth data analysis and data mining to recognize these unusual patterns. In this dataset, most of the data analysis is already done and most of the features are scaled. The names of the features are not shown for privacy reasons.
Hence our main focus will be to balance the data and perform predictive analysis.
The Credit Card Fraud Detection dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
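To make the imbalance concrete, here is a minimal sketch of the arithmetic, using only the two counts quoted above. The variable names are illustrative and nothing is loaded from the dataset itself.

```python
# Counts quoted in the dataset description above.
n_transactions = 284_807
n_frauds = 492

# Share of transactions that are fraudulent (roughly 0.17%).
fraud_rate = n_frauds / n_transactions
print(f"Fraud rate: {fraud_rate:.4%}")

# A model that labels every transaction as "not fraud" is correct on
# every legitimate row and wrong on every fraud.
baseline_accuracy = (n_transactions - n_frauds) / n_transactions
print(f"Accuracy of always predicting 'not fraud': {baseline_accuracy:.4%}")
```

The second number shows why a high accuracy score says very little here: it can be reached without detecting a single fraud.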
The goal here is to identify as many fraudulent credit card transactions as possible. As recommended in the dataset's inspiration section, I will measure performance using the Area Under the Precision-Recall Curve (AUPRC). Accuracy computed from a confusion matrix is not meaningful for unbalanced classification, because a model that predicts "not fraud" for every transaction would already be more than 99.8% accurate.
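As a rough illustration of the metric, the sketch below computes average precision, a common step-wise summary of the precision-recall curve, in plain Python. The labels and scores are made-up toy values, not taken from the dataset, and the helper `average_precision` is a hypothetical name introduced only for this example. In practice, scikit-learn's `average_precision_score` provides this kind of summary.

```python
def average_precision(y_true, scores):
    """Mean of the precision values at each rank where a true positive appears.

    Assumes binary labels (1 = fraud, 0 = legitimate) and distinct scores.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive label")

    # Rank transactions from most to least suspicious.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy data (made up): 10 transactions, 2 of them fraudulent.
y_true = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
scores = [0.10, 0.05, 0.60, 0.90, 0.20, 0.70, 0.15, 0.30, 0.55, 0.02]

ap = average_precision(y_true, scores)

# Accuracy of a model that never flags anything: high, yet it finds no fraud.
always_negative_accuracy = y_true.count(0) / len(y_true)

print(f"Average precision: {ap:.2f}")
print(f"Accuracy of always predicting 0: {always_negative_accuracy:.2f}")
```

Unlike accuracy, this score only rises when the fraudulent transactions are ranked ahead of the legitimate ones, which is the behaviour we care about on unbalanced data.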
- Import Libraries
- Read Data
- Understand the data
- Exploratory Data Analysis
- Label Data
- Cluster data using dimensionality reduction
- Split into train and test sets
- Scaling
- Predictive Analysis on unbalanced data
- Validate Unbalanced Data
- Balance data using an oversampling method
- Predictive Analysis on Balanced Data
- Validate Balanced Data
- Feature Importance
- Conclusion
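
The outline above splits the data before balancing it. The sketch below illustrates why that order matters, using simple random oversampling in plain Python. This is an assumption for illustration only: the outline does not say which oversampling method is used, and `random_oversample` is a hypothetical helper, not a library function.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows (label 1) until both classes are equal in size.

    Assumes binary labels where 1 is the minority (fraud) class.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    if not minority or len(minority) >= len(majority):
        return list(rows), list(labels)  # nothing to balance

    # Draw minority rows with replacement to fill the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    indices = majority + minority + extra
    rng.shuffle(indices)
    return [rows[i] for i in indices], [labels[i] for i in indices]


# Toy training split (made up): 8 legitimate rows, 2 fraudulent rows.
train_rows = [[float(i)] for i in range(10)]
train_labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(train_rows, train_labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

Only the training split is resampled here. The test split keeps its original class ratio, so the validation scores reflect the imbalance the model would face in practice.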