Data Science Analytics Virtual Intern at British Airways Co.

_________________________________________________________

First Task : Web scraping to gain company insights

Scrape and analyze customer review data to uncover findings for British Airways

- Scrape data from the web

I use the Skytrax website (https://www.airlinequality.com/airline-reviews/british-airways).
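A minimal sketch of the scraping step, using only the standard library. The paginated URL pattern and the `text_content` class name are assumptions about how the Skytrax page is laid out, not details taken from the notebook, and `page_url`, `ReviewParser` and `parse_reviews` are hypothetical helper names.

```python
from html.parser import HTMLParser

BASE_URL = "https://www.airlinequality.com/airline-reviews/british-airways"


def page_url(page: int, page_size: int = 100) -> str:
    """Build the URL of one results page (query parameters are an assumption)."""
    return f"{BASE_URL}/page/{page}/?sortby=post_date%3ADesc&pagesize={page_size}"


class ReviewParser(HTMLParser):
    """Collect the text of every <div class="text_content"> element (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._depth = 0      # > 0 while we are inside a review <div>
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # nested <div> inside a review
        elif "text_content" in (dict(attrs).get("class") or "").split():
            self._depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse whitespace and store the finished review.
                self.reviews.append(" ".join("".join(self._chunks).split()))
                self._chunks = []

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def parse_reviews(html: str) -> list[str]:
    parser = ReviewParser()
    parser.feed(html)
    return parser.reviews


# Made-up HTML that mimics the assumed page structure.
sample_html = """
<article><div class="text_content" itemprop="reviewBody">
  Great crew, <strong>late</strong> departure.</div></article>
<article><div class="text_content">Lost my luggage.</div></article>
"""
print(parse_reviews(sample_html))
```

In a real run, the HTML of each `page_url(i)` would be downloaded (for example with `urllib.request.urlopen`) and passed to `parse_reviews`.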

- Analyse data to uncover some insights

After reading the data, I perform some univariate analysis, such as viewing the distribution of Customer Rating.

Create a stopword list and visualize a word cloud.

Classify the reviews into 3 categories (Positive, Negative and Neutral).

Clean the data with remove_punctuation.
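One possible shape for `remove_punctuation`, sketched with the standard library. The body is an assumption; only the function name comes from this README.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(text: str) -> str:
    """Return text with ASCII punctuation removed."""
    return text.translate(_PUNCT_TABLE)


print(remove_punctuation("Great crew, but the flight was late!"))
```

`string.punctuation` covers ASCII characters only, so typographic quotes or symbols in scraped reviews would need to be handled separately.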

Create a new DataFrame from the 2 important columns.

Randomly split the data into train and test sets with numpy.

Create a bag of words with CountVectorizer from sklearn.feature_extraction.text
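The split and bag-of-words steps could look roughly like this. The reviews, labels and the 75/25 ratio are made up for illustration; the notebook's actual split method and ratio are not stated here.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Toy data (hypothetical), standing in for the cleaned review text and its category.
reviews = np.array([
    "great flight and friendly crew",
    "lost luggage and rude staff",
    "comfortable seat good food",
    "delayed flight no information",
    "average service nothing special",
    "excellent lounge fast boarding",
    "terrible food dirty cabin",
    "ok flight ok price",
])
labels = np.array([
    "Positive", "Negative", "Positive", "Negative",
    "Neutral", "Positive", "Negative", "Neutral",
])

# Random split with numpy: shuffle the row indices, then cut at 75%.
rng = np.random.default_rng(42)
order = rng.permutation(len(reviews))
n_train = int(0.75 * len(reviews))
train_idx, test_idx = order[:n_train], order[n_train:]

# Learn the vocabulary on the training rows only, then reuse it for the test rows.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(reviews[train_idx])
X_test = vectorizer.transform(reviews[test_idx])
y_train, y_test = labels[train_idx], labels[test_idx]

print(X_train.shape, X_test.shape)
```

Fitting the vectorizer on the training rows alone keeps test-set words from leaking into the vocabulary.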

- Modeling

Import RandomForestClassifier and split the target from the independent variables.

Fit the model on the training data and make predictions.

Compute accuracy, precision and recall, and produce a classification_report.
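A compact sketch of the modeling steps on toy data. The texts, labels and hyperparameters are illustrative assumptions, and macro averaging is one reasonable choice for a 3-class target rather than the notebook's confirmed setting.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    precision_score,
    recall_score,
)

# Hypothetical training and test reviews with their categories.
train_texts = [
    "great flight friendly crew", "excellent lounge fast boarding",
    "lost luggage rude staff", "terrible food dirty cabin",
    "average service nothing special", "ok flight ok price",
]
train_labels = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Neutral"]
test_texts = ["friendly crew great food", "rude staff dirty cabin"]
test_labels = ["Positive", "Negative"]

# Independent variables: bag-of-words counts. Target: the review category.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, train_labels)
predictions = model.predict(X_test)

accuracy = accuracy_score(test_labels, predictions)
precision = precision_score(test_labels, predictions, average="macro", zero_division=0)
recall = recall_score(test_labels, predictions, average="macro", zero_division=0)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
print(classification_report(test_labels, predictions, zero_division=0))
```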

_______________________________________________________________

Second Task : Predicting customer buying behaviour

Build a predictive model to understand factors that influence buying behaviour

- Explore and prepare the dataset

First, spend some time exploring the dataset. Then, label-encode the object (categorical) variables.
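Label encoding of the non-numeric columns can be sketched with pandas as follows. The column names and values are hypothetical placeholders, and the notebook may use a different encoder (for example scikit-learn's `LabelEncoder`).

```python
import pandas as pd

# Hypothetical slice of a booking dataset.
df = pd.DataFrame({
    "sales_channel": ["Internet", "Mobile", "Internet"],
    "trip_type": ["RoundTrip", "OneWay", "RoundTrip"],
    "num_passengers": [2, 1, 3],
})

# Every column that is not numeric is treated as categorical.
categorical_columns = [
    col for col in df.columns if not pd.api.types.is_numeric_dtype(df[col])
]

# Replace each category with an integer code (categories are sorted alphabetically).
for col in categorical_columns:
    df[col] = df[col].astype("category").cat.codes

print(df)
```

Integer codes imply an ordering between categories, which tree-based models tolerate well but linear models may misread.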

- Modeling

* After preparing the data, I do some feature engineering, then start with a Dummy Model (random predictions, np.random) as the Baseline Model, with accuracy 0.49 and F1-score 0.22.

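* I try a Simple Model (linear):

Linear model, then calculate the score (accuracy 0.85 and F1-score 0.0). High accuracy with an F1-score of 0.0 suggests the model predicts almost only the majority class, which points to an imbalanced target.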
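The gap between accuracy and F1-score in the two results above can be illustrated with a small simulation. It assumes a binary target with about 15% positives, a guess based on the 0.85 accuracy rather than a figure from the dataset, and `accuracy_and_f1` is a hypothetical helper.

```python
import numpy as np


def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1 of the positive class) for binary 0/1 labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    accuracy = float(np.mean(y_true == y_pred))
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


rng = np.random.default_rng(0)
# Assumption: roughly 15% of the rows belong to the positive class.
y_true = (rng.random(10_000) < 0.15).astype(int)

# Dummy model: a coin flip for every row.
random_pred = rng.integers(0, 2, size=y_true.size)
# Degenerate model: always predict the majority (negative) class.
majority_pred = np.zeros_like(y_true)

random_acc, random_f1 = accuracy_and_f1(y_true, random_pred)
majority_acc, majority_f1 = accuracy_and_f1(y_true, majority_pred)
print(f"random:   accuracy={random_acc:.2f} f1={random_f1:.2f}")
print(f"majority: accuracy={majority_acc:.2f} f1={majority_f1:.2f}")
```

Under this assumption the coin flip lands near 0.5 accuracy with a low F1-score, while the always-negative model reaches about 0.85 accuracy with an F1-score of exactly 0, the same pattern as the reported numbers.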

* I also try the Simple Model on a balanced dataset (Upsampling [F1-score 0.24], Downsampling [F1-score 0.24]).
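Upsampling and downsampling can both be expressed as index resampling. This is a sketch with numpy that assumes class 1 is the minority; `resample_indices` is a hypothetical helper, not a function from the notebook.

```python
import numpy as np


def resample_indices(y, mode, rng):
    """Return shuffled row indices that balance a binary 0/1 target.

    mode="up" repeats minority rows (sampling with replacement);
    mode="down" drops majority rows (sampling without replacement).
    """
    y = np.asarray(y)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    if mode == "up":
        minority = rng.choice(minority, size=majority.size, replace=True)
    elif mode == "down":
        majority = rng.choice(majority, size=minority.size, replace=False)
    else:
        raise ValueError("mode must be 'up' or 'down'")
    return rng.permutation(np.concatenate([majority, minority]))


rng = np.random.default_rng(0)
y = np.array([0] * 85 + [1] * 15)  # toy imbalanced target

up_idx = resample_indices(y, "up", rng)
down_idx = resample_indices(y, "down", rng)
print(len(up_idx), y[up_idx].mean())      # 170 rows, half positive
print(len(down_idx), y[down_idx].mean())  # 30 rows, half positive
```

Resampling is normally applied to the training split only, so that the test set keeps the original class balance.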

* Then I try complex and explainable models (tree-based):

Decision Tree (Original data [F1-score 0.26], Upsampling [F1-score 0.30], Downsampling [F1-score 0.32]).

Random Forest (Original data [F1-score 0.13], Upsampling [F1-score 0.33], Downsampling [F1-score 0.38]).

* Finally, I try gradient boosting models:

XGBoost (Original data [F1-score 0.17], Upsampling [F1-score 0.34], Downsampling [F1-score 0.39]).

CatBoost (Original data [F1-score 0.11], Upsampling [F1-score 0.35], Downsampling [F1-score 0.40]).

* The best model is CatBoostClassifier with downsampled data, because it has the highest F1-score (0.40).
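For reference, the comparison can be replayed from the F1-scores reported above. The dictionary simply restates those numbers; the layout is an illustrative choice.

```python
# F1-scores as reported in this README, keyed by (model, training data).
f1_scores = {
    ("Dummy", "Original"): 0.22,
    ("Linear", "Original"): 0.0,
    ("Linear", "Upsampling"): 0.24,
    ("Linear", "Downsampling"): 0.24,
    ("Decision Tree", "Original"): 0.26,
    ("Decision Tree", "Upsampling"): 0.30,
    ("Decision Tree", "Downsampling"): 0.32,
    ("Random Forest", "Original"): 0.13,
    ("Random Forest", "Upsampling"): 0.33,
    ("Random Forest", "Downsampling"): 0.38,
    ("XGBoost", "Original"): 0.17,
    ("XGBoost", "Upsampling"): 0.34,
    ("XGBoost", "Downsampling"): 0.39,
    ("CatBoost", "Original"): 0.11,
    ("CatBoost", "Upsampling"): 0.35,
    ("CatBoost", "Downsampling"): 0.40,
}

# Pick the configuration with the highest F1-score.
best = max(f1_scores, key=f1_scores.get)
print(best, f1_scores[best])
```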

______________________________________________________


Ahmed-G-ElTaher/Data-Science-Analytics-Virtual-Intern-at-British-Airways
