DataAnalysisECommerce

This project aims to explore an e-commerce dataset containing information about visitors' activities during a seven-day period when an A/B test was conducted. Each row of data represents a session, and includes the following information:

Outcomes of interest:

Clicks: binary
ATC (add-to-cart): binary
CVR (conversion): binary
SessionRevenue: integer

Visitors' information:

SessionStartDate: date
platform: categorical (mobile and desktop)
visitorType: categorical (new, activated, returning, acquired)
testGroup: categorical (control and variation)
CategoryID (product types): categorical (TV Stands, sofas, area rugs, etc.)

Part 1: Analyze whether the variation should be rolled out

Please refer to analysis.py

Among mobile users, the variation shows a statistically significant positive lift in session revenue (1.61%), CVR (1.81%), and ATC (0.61%).
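A lift and its significance for one segment can be computed along the following lines. This is a minimal sketch, not the code in analysis.py: the helper name `lift_and_ttest` is hypothetical, the data is synthetic, and the choice of a Welch two-sample t-test is an assumption, since the test used by the project is not stated here.

```python
import numpy as np
import pandas as pd
from scipy import stats


def lift_and_ttest(df, metric, group_col="testGroup"):
    """Relative lift of variation over control, plus a Welch t-test (assumed test)."""
    control = df.loc[df[group_col] == "control", metric].astype(float)
    variation = df.loc[df[group_col] == "variation", metric].astype(float)
    lift = variation.mean() / control.mean() - 1.0
    t_stat, p_value = stats.ttest_ind(variation, control, equal_var=False)
    return {"lift": lift, "t_stat": float(t_stat), "p_value": float(p_value)}


# Synthetic sessions for illustration only; these are not the project's data.
rng = np.random.default_rng(0)
n = 20_000
sessions = pd.DataFrame({
    "testGroup": rng.choice(["control", "variation"], size=n),
    "platform": rng.choice(["mobile", "desktop"], size=n),
})
# Made-up conversion rates: slightly higher in the variation group.
base_rate = np.where(sessions["testGroup"] == "variation", 0.055, 0.050)
sessions["CVR"] = rng.binomial(1, base_rate)

# Restrict to one segment, as in the mobile-only result above.
mobile = sessions[sessions["platform"] == "mobile"]
result = lift_and_ttest(mobile, "CVR")
print(result)
```

The same helper could be applied to ATC or SessionRevenue by changing the `metric` argument, since it only needs a numeric column and the testGroup labels.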

Analyzing results over time cumulatively:

Given the short duration of the test, CVR performs best: it has the largest lift at the end of the test period, the most stable lift over time, and a t-statistic that increases over time
Session revenue’s lift and t-statistic both follow a U shape over time, which suggests that customers may need more time to adjust their spending behavior
ATC’s lift and t-statistic are not stable over time
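The cumulative view can be sketched as follows: for each day, the lift and t-statistic are recomputed over all sessions observed up to and including that day. The function name `cumulative_results`, the dates, and the synthetic data are assumptions made for illustration, and the Welch t-test is again an assumed choice rather than the method confirmed to be in analysis.py.

```python
import numpy as np
import pandas as pd
from scipy import stats


def cumulative_results(df, metric, date_col="SessionStartDate", group_col="testGroup"):
    """For each date, compute lift and Welch t-stat over all sessions up to that date."""
    rows = []
    for day in sorted(df[date_col].unique()):
        seen = df[df[date_col] <= day]  # everything observed so far
        control = seen.loc[seen[group_col] == "control", metric].astype(float)
        variation = seen.loc[seen[group_col] == "variation", metric].astype(float)
        t_stat, _ = stats.ttest_ind(variation, control, equal_var=False)
        rows.append({
            "date": day,
            "lift": variation.mean() / control.mean() - 1.0,
            "t_stat": float(t_stat),
        })
    return pd.DataFrame(rows)


# Synthetic seven-day test for illustration only; the dates are made up.
rng = np.random.default_rng(1)
n = 14_000
dates = pd.date_range("2020-01-01", periods=7)
sessions = pd.DataFrame({
    "SessionStartDate": dates[rng.integers(0, 7, size=n)],
    "testGroup": rng.choice(["control", "variation"], size=n),
})
rate = np.where(sessions["testGroup"] == "variation", 0.055, 0.050)
sessions["CVR"] = rng.binomial(1, rate)

over_time = cumulative_results(sessions, "CVR")
print(over_time)
```

Plotting the `lift` and `t_stat` columns against `date` gives the kind of curves described above; the last row equals the result for the full test period.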

Part 2: Fit a number of machine learning models to understand what factors in the dataset can best predict conversion rate

Please refer to ml.py

From the limited available features (platform, visitor type, category), conversion can be predicted with an AUC of 0.713
Undersampling to balance conversion outcomes in the sample improves the AUC from 0.659 to 0.713
The best predictive model is a random forest; however, it does not outperform other models, such as logistic regression, by a large margin
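The undersampling and model comparison can be sketched with scikit-learn as shown below. This is an illustrative sketch on synthetic data, not the code in ml.py: the helpers `undersample` and `compare_models` are hypothetical, `OneHotEncoder` stands in for whatever categorical_encoder.py does, and undersampling only the training split is a choice made here so that the AUC is measured on data with the original class balance.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

FEATURES = ["platform", "visitorType", "CategoryID"]


def undersample(df, target="CVR", seed=0):
    """Downsample the majority class so both outcomes are equally frequent."""
    n_minority = df[target].value_counts().min()
    parts = [grp.sample(n=n_minority, random_state=seed) for _, grp in df.groupby(target)]
    return pd.concat(parts).sample(frac=1.0, random_state=seed)  # shuffle rows


def compare_models(train, test, target="CVR"):
    """Fit each model on one-hot encoded features and return its test AUC."""
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    scores = {}
    for name, model in models.items():
        encoder = ColumnTransformer(
            [("onehot", OneHotEncoder(handle_unknown="ignore"), FEATURES)]
        )
        pipe = make_pipeline(encoder, model)
        pipe.fit(train[FEATURES], train[target])
        proba = pipe.predict_proba(test[FEATURES])[:, 1]
        scores[name] = roc_auc_score(test[target], proba)
    return scores


# Synthetic sessions for illustration only; the conversion rates are made up.
rng = np.random.default_rng(0)
n = 5000
data = pd.DataFrame({
    "platform": rng.choice(["mobile", "desktop"], size=n),
    "visitorType": rng.choice(["new", "activated", "returning", "acquired"], size=n),
    "CategoryID": rng.choice(["TV Stands", "sofas", "area rugs"], size=n),
})
p = 0.03 + 0.05 * (data["platform"] == "desktop") + 0.04 * (data["visitorType"] == "returning")
data["CVR"] = rng.binomial(1, p.to_numpy())

train, test = train_test_split(data, test_size=0.3, stratify=data["CVR"], random_state=0)
scores = compare_models(undersample(train), test)
print(scores)
```

Calling `compare_models(train, test)` without `undersample` gives the unbalanced baseline, so the two runs can be compared in the same way as the 0.659 and 0.713 figures above.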

janegle/DataAnalysisECommerce
