  • New York
jwelch1123/README.md

About Me

Data science problem solver bringing an analytical perspective honed in the biotechnology industry. Experience developing personal and company-scale tools and databases to aid the translation of complex multi-source data into actionable results. Expertise in pre- and post-experiment statistics, design of experiments, molecular biology, and process development. Skilled in scientific communication, data presentation, and converting company-wide goals into specific questions and actionable experiments. Inquisitive, analytical, innovative.

Connect with me

LinkedIn

NYC Data Science Academy Project Portfolio

Notable Projects

  1. Fraud Detection in Medicare Claims Data - Article - Presentation - Repository

    • Constructed a model to predict fraudulent Medicare claims with 78% accuracy and fraudulent providers with 87% accuracy through cross-validation and tuning of scikit-learn, XGBoost, and CatBoost models.
    • Reduced data requirements from 100+ claims to 10 while maintaining accuracy and specificity by predicting fraud at the claim level and then aggregating up to the provider level. This cut time-to-prediction from over a year to under 2 months, allowing multiple opportunities to identify fraudulent providers.
  2. Interactive Database Visualization - Article - Repository

    • Built an interactive dashboard in R-Shiny for the management of cell stocks across multiple R&D departments with integrated data visualizations and workflow-specific data capture.
    • Interviewed managers and operators across departments to customize the database visualizations and data capture to their specific needs.
  3. Playlistr - Repository - Web App

    • Uses the Spotify API to create playlists matching a chosen message, and saves them to your Spotify account.
    • The Dash App provides two interfaces: "pick-and-choose" for manual selection and "auto-solver" for automatic playlist creation. The playlistr.py file can also be run locally but requires Spotify App credentials.
  4. Daylist Album Art Generator - Repository - Web App

    • Uses the Spotify and OpenAI APIs to fetch the daylist playlist title and description, then uses that text to generate album art.
    • The Dash App lets users authenticate with their Spotify account, then view and download the album art.
  5. Wit, Wisdom, and Vector Embeddings - Repository

    • Applied NLP to visualize the similarity of sayings from Benjamin Franklin's Poor Richard's Almanack. Used SentenceTransformers and UMAP to vectorize the quotes and reduce them to a graphable form.
    • The embedded quotes are visualized in a Dash app using Plotly to allow interactions with the graph. Users can enter a new phrase to find the closest match, hover over quotes to view them and their neighbors, or highlight a region and view all quotes in the table below.
    • Note: Due to the size of the SentenceTransformers package, the Dash app cannot be hosted on standard platforms (Heroku). See the repo for download and installation instructions.
  6. Housing Price Analysis with Machine Learning - Article - Repository

    • Placed in the top 33% of scores in the Ames Housing dataset Kaggle challenge, with a mean prediction error of 8%.
    • Applied 4 linear and 3 tree-based models, hyperparameter tuning, and feature selection to predict home sale prices.
  7. Web Scraping for Business Analysis - Article - Repository

    • Scraped title and category information for 250,000+ audiobooks from Audible using Scrapy in Python.
    • Suggested opportunities for growth and expansion by analyzing pricing, length, rating, and language data.
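
The claim-to-provider aggregation described under Fraud Detection in Medicare Claims Data can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the mean-score aggregation, the 0.5 threshold, and the example scores are all assumptions made here. Only the idea of scoring individual claims and then rolling them up per provider, with about 10 claims as the minimum, comes from the description above.

```python
from collections import defaultdict
from statistics import mean


def aggregate_provider_scores(claim_scores, min_claims=10):
    """Average claim-level fraud probabilities per provider.

    claim_scores: iterable of (provider_id, probability) pairs. The
    probabilities would come from a claim-level classifier (not shown).
    Providers with fewer than min_claims scored claims are skipped.
    Mean aggregation is an assumption; other roll-ups are possible.
    """
    by_provider = defaultdict(list)
    for provider_id, probability in claim_scores:
        by_provider[provider_id].append(probability)
    return {
        provider_id: mean(probabilities)
        for provider_id, probabilities in by_provider.items()
        if len(probabilities) >= min_claims
    }


def flag_providers(provider_scores, threshold=0.5):
    """Return sorted provider ids whose mean score meets the threshold."""
    return sorted(p for p, s in provider_scores.items() if s >= threshold)


# Made-up scores for illustration only.
example_scores = [("A", 0.9)] * 10 + [("B", 0.1)] * 12 + [("C", 0.95)] * 3
provider_scores = aggregate_provider_scores(example_scores)
flagged = flag_providers(provider_scores)
```

In this sketch, hypothetical provider "C" is left unscored because it has only 3 claims, which mirrors the idea of waiting for a small minimum number of claims rather than a year or more of history.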

Pinned

  1. wit-and-wisdom (Public)

    A visualization of quotes from Poor Richard's Almanack using sentence embedding and dimension reduction. Visualized with a Dash App.

    Jupyter Notebook

  2. healthcare_fraud_detection (Public)

    Detecting provider healthcare fraud given patient records for each provider.

    Jupyter Notebook 1

  3. ames_housing (Public)

    Analysis and price prediction of the Ames Housing Prices data set.

    Jupyter Notebook

  4. freezer_management (Public)

    Database entry and visualization app for cell pastes

    R 3 1