Dillard’s is a major retail chain with several stores. Its point-of-sale (POS) data over a period of time is available at https://nuwildcat-my.sharepoint.com/:u:/g/personal/dkl524_ads_northwestern_edu/Eae3-Uaey_ZNgKKWhwnZ8dwBngaVoXYR1mqd1iN6AEhAlw. The file is over 1 GB, and the schema contains 5 tables. The data dictionary is provided in a separate file. You are encouraged to augment the data with other public datasets. Suggested process:
- Understand the data
- Perform data exploration (number of SKUs, number of items per basket, number of stores, most frequently purchased items, busiest stores, etc.)
- Find a machine learning related question to address
- Feature selection and engineering
- Modeling
- Dashboards and storytelling
- ROI – make appropriate assumptions
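The exploration step above can be sketched in a few lines of Python. This is a minimal illustration only: the rows and the field layout `(store_id, basket_id, sku)` are hypothetical stand-ins, since the real table and column names come from the data dictionary, and the function name `explore` is not part of any provided code.

```python
from collections import Counter

# Hypothetical transaction rows: (store_id, basket_id, sku).
# The actual column names are defined in the data dictionary and will differ.
rows = [
    ("S1", "B1", "SKU-A"),
    ("S1", "B1", "SKU-B"),
    ("S1", "B2", "SKU-A"),
    ("S2", "B3", "SKU-C"),
    ("S2", "B3", "SKU-A"),
    ("S2", "B3", "SKU-B"),
]


def explore(rows):
    """Compute the basic exploration metrics listed in the suggested process."""
    # Number of line items per SKU and per store.
    sku_counts = Counter(sku for _, _, sku in rows)
    store_counts = Counter(store for store, _, _ in rows)
    # A basket is identified by (store, basket id); the value is its item count.
    basket_sizes = Counter((store, basket) for store, basket, _ in rows)
    return {
        "n_skus": len(sku_counts),
        "n_stores": len(store_counts),
        "avg_items_per_basket": len(rows) / len(basket_sizes),
        "top_skus": sku_counts.most_common(3),
        "busiest_stores": store_counts.most_common(3),
    }


summary = explore(rows)
for metric, value in summary.items():
    print(metric, value)
```

On the full dataset the same counts would more likely be computed in a database or with a dataframe library that can process the file in chunks, since the rows will not fit comfortably in a Python list.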
Topic: Analyzing and predicting sales outcomes using multiple machine learning models.
Client: Dillard's, an American department store chain.
Data: 120 million records, totaling over 10GB in size.
Business Question: Predict product returns based on product information and transaction records.
Objective: Optimize inventory management strategies to maximize return on investment.
Models: Linear Regression, Lasso Regression, Random Forest
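A comparison of the three listed models could look like the sketch below, which uses scikit-learn. It is an illustration under stated assumptions, not the project's actual pipeline: the data is synthetic, the four feature columns are placeholders, and the target is assumed to be a numeric return rate, which the summary above does not specify.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for engineered features (assumption: 4 numeric features)
# and a numeric return-rate target. None of this is the real Dillard's data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 0.3 + 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Hold out 20% of the rows to compare models on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameters here are arbitrary illustrative choices.
models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.01),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Fit each model and record its mean squared error on the held-out set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_squared_error(y_test, model.predict(X_test))

for name, mse in scores.items():
    print(f"{name}: test MSE = {mse:.4f}")
```

If the return outcome is instead modeled as a yes/no flag per transaction, classifiers and a metric such as AUC would be the closer fit; the comparison loop stays the same.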