Labviva 2024 Summer Project Summary

This project focuses on enhancing Labviva's platform by developing predictive models to estimate shipment and delivery times for life science laboratory products. Accurate delivery predictions are critical for customer confidence and decision-making, allowing buyers to compare products based on both availability and delivery timelines.

Objectives

Develop predictive models to estimate shipping times using internal data sources
Establish a robust data pipeline to extract and prepare the necessary data for modeling
Conduct exploratory data analysis (EDA) to identify meaningful patterns and correlations.
Build and evaluate baseline models to guide future advanced modeling efforts

Approach

Data Integration:

Data was sourced from three internal tables describing different data elements
Complex cross-referencing using key identifiers was used to link records across tables
Result: a unified dataset containing essential features and relevant metadata

Data Cleaning and Feature Engineering:

Cleaned invalid entries (e.g., negative shipping times, missing values)
Created a feature to serve as the primary target variable describing an element of delivery time
Extracted additional features to capture potentially important seasonal trends. Considered cyclic transformations to better represent time-based patterns.

Exploratory Data Analysis (EDA):

Investigated potential correlations between shipping time and various features
Visualized relationships to identify trends and outliers, noting limitations due to the small dataset size
Highlighted potential biases in the dataset

Model Development:

Split the data into training and testing sets using a 67/33 ratio
Built baseline models:
- Linear Regression: Explored linear relationships between features and shipping time
- Random Forest Regressor: Captured potential non-linear relationships in the data
Observed high mean squared errors (MSE) due to the limited dataset, with values exceeding acceptable thresholds for practical applications

Results

Despite promising initial correlations, the dataset's small size limited the reliability of the results
Early models provided a baseline for comparison but highlighted the need for more comprehensive data to improve accuracy

Challenges

Dataset Size
Data Integration
Limited Feature Representation

Recommendations for Future Work

Data Expansion:
- Implement data integrity checks at acquisition to prevent issues like invalid dates and missing values.
- Capture additional attributes to enhance feature richness.
Feature Engineering:
- Convert date-based features into cyclic representations for improved seasonal trend modeling
- Introduce engineered features
Advanced Modeling:
- Revisit EDA and model baselines with expanded data
- Explore more sophisticated models, such as gradient boosting or neural networks, to capture complex relationships
- Establish separate test sets for hyperparameter tuning and unbiased evaluation.

Technical Tools Used

Programming Language: Python
Libraries Used:
- pandas and numpy for data cleaning and manipulation
- sklearn for train-test splitting and baseline model development
Visualization libraries for EDA (matplotlib)
Data Management: Dataset preparation and integration were performed in a Jupyter Notebook

Future Impact

Once refined, these models will enable Labviva to provide accurate shipping estimates, improving customer trust and streamlining supplier management.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Labviva 2024 Summer Project Summary

Objectives

Approach

Data Integration:

Data Cleaning and Feature Engineering:

Exploratory Data Analysis (EDA):

Model Development:

Results

Challenges

Recommendations for Future Work

Technical Tools Used

Future Impact

About

rtsej/LViva-summer-internship-project

Folders and files

Latest commit

History

Repository files navigation

Labviva 2024 Summer Project Summary

Objectives

Approach

Data Integration:

Data Cleaning and Feature Engineering:

Exploratory Data Analysis (EDA):

Model Development:

Results

Challenges

Recommendations for Future Work

Technical Tools Used

Future Impact

About

Resources

Stars

Watchers

Forks