
MEDS-torch: Advanced Machine Learning for Electronic Health Records


🚀 Quick Start

Installation

pip install meds-torch

Set up environment variables

# Define data paths
PATHS_KWARGS="paths.data_dir=/CACHED/NESTED/RAGGED/TENSORS/DIR paths.meds_cohort_dir=/PATH/TO/MEDS/DATA/ paths.output_dir=/OUTPUT/RESULTS/DIRECTORY"

# Define task parameters (for supervised learning)
TASK_KWARGS="data.task_name=NAME_OF_TASK data.task_root_dir=/PATH/TO/TASK/LABELS/"
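For concreteness, a filled-in version might look like the following (every path and the task name below are hypothetical placeholders, not values shipped with the package):

# Hypothetical example values -- substitute your own directories and task name
PATHS_KWARGS="paths.data_dir=/data/mimiciv/tensors paths.meds_cohort_dir=/data/mimiciv/meds/ paths.output_dir=/results/mimiciv/"
TASK_KWARGS="data.task_name=mortality_24h data.task_root_dir=/data/mimiciv/tasks/"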

Basic Usage

  1. Train a supervised model (GPU)
     meds-torch-train trainer=gpu $PATHS_KWARGS $TASK_KWARGS
  2. Pretrain an autoregressive forecasting model (GPU)
     meds-torch-train trainer=gpu $PATHS_KWARGS model=eic_forecasting
  3. Train with a specific experiment configuration
     meds-torch-train experiment=experiment.yaml $PATHS_KWARGS $TASK_KWARGS hydra.searchpath=[pkg://meds_torch.configs,/PATH/TO/CUSTOM/CONFIGS]
  4. Override parameters
     meds-torch-train trainer.max_epochs=20 data.batch_size=64 $PATHS_KWARGS $TASK_KWARGS
  5. Hyperparameter search
     meds-torch-tune trainer=ray callbacks=tune_default hparams_search=ray_tune experiment=triplet_mtr $PATHS_KWARGS $TASK_KWARGS hydra.searchpath=[pkg://meds_torch.configs,/PATH/TO/CUSTOM/CONFIGS/WITH/experiment/triplet_mtr]

Advanced Examples

For detailed examples and tutorials:

  • Check MIMICIV_INDUCTIVE_EXPERIMENTS/README.md for a comprehensive guide to using MEDS-torch with MIMIC-IV data, including data preparation, task extraction, and running experiments with different tokenization and transfer learning methods.
  • See ZERO_SHOT_TUTORIAL/README.md for a work-in-progress walkthrough of zero-shot prediction (feedback on improving it is very welcome! 🙂)

Example Experiment Configuration

Here's a sample experiment.yaml:

# @package _global_

defaults:
  - override /data: pytorch_dataset
  - override /logger: wandb
  - override /model/backbone: triplet_transformer_encoder
  - override /model/input_encoder: triplet_encoder
  - override /model: supervised
  - override /trainer: gpu

tags: [mimiciv, triplet, transformer_encoder]

seed: 0

trainer:
  min_epochs: 1
  max_epochs: 10
  gradient_clip_val: 1.0

data:
  dataloader:
    batch_size: 64
    num_workers: 6
  max_seq_len: 128
  collate_type: triplet
  subsequence_sampling_strategy: to_end

model:
  token_dim: 128
  optimizer:
    lr: 0.001
  backbone:
    n_layers: 2
    nheads: 4
    dropout: 0

logger:
  wandb:
    tags: ${tags}
    group: mimiciv_tokenization

This configuration sets up a supervised learning experiment using a triplet transformer encoder on MIMIC-IV data. Modify this file to suit your specific needs.
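To launch a run with this file, save it as experiment/experiment.yaml inside a configs directory of your choosing and add that directory to Hydra's search path, mirroring the earlier usage example (the directory below is a hypothetical placeholder):

# Assumes the file was saved to /PATH/TO/CUSTOM/CONFIGS/experiment/experiment.yaml
meds-torch-train experiment=experiment.yaml $PATHS_KWARGS $TASK_KWARGS hydra.searchpath=[pkg://meds_torch.configs,/PATH/TO/CUSTOM/CONFIGS]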

🌟 Key Features

  • Flexible ML Pipeline: Utilizes Hydra for dynamic configuration and PyTorch Lightning for scalable training.
  • Advanced Tokenization: Supports multiple strategies for embedding EHR data (Triplet, Text Code, Everything In Code).
  • Supervised Learning: Train models on arbitrary prediction tasks defined over MEDS-format data.
  • Transfer Learning: Pretrain models using contrastive learning, forecasting, and other methods, then finetune for specific tasks.
  • Multiple Pretraining Methods: Supports EBCL, OCP, STraTS Value Forecasting, and Autoregressive Observation Forecasting.

🛠 Installation

PyPI

pip install meds-torch

From Source

git clone git@github.com:Oufattole/meds-torch.git
cd meds-torch
pip install -e .
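Either route installs the console entry points used throughout this README. As a quick sanity check, the training CLI should respond to the standard Hydra help flag (assuming a standard Hydra app setup):

meds-torch-train --help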

📚 Documentation

For detailed usage instructions, API reference, and examples, visit our documentation.

For a comprehensive demo of our pipeline, and for results from a suite of inductive experiments comparing different tokenization methods and learning approaches, see the MIMICIV_INDUCTIVE_EXPERIMENTS/README.md file, which provides detailed scripts and performance metrics.

🧪 Running Experiments

Supervised Learning

bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_supervised.sh $MIMICIV_ROOT_DIR meds-torch

Transfer Learning

# Pretraining
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_multi_window_pretrain.sh $MIMICIV_ROOT_DIR meds-torch [METHOD]
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_ar_pretrain.sh $MIMICIV_ROOT_DIR meds-torch [AR_METHOD]

# Finetuning
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_finetune.sh $MIMICIV_ROOT_DIR meds-torch [METHOD]
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_ar_finetune.sh $MIMICIV_ROOT_DIR meds-torch [AR_METHOD]

Replace [METHOD] with one of the following:

  • ocp (Observation Contrastive Pretraining)
  • ebcl (Event-Based Contrastive Learning)
  • value_forecasting (STraTS Value Forecasting)

Replace [AR_METHOD] with one of the following:

  • eic_forecasting (Everything In Code Forecasting)
  • triplet_forecasting (Triplet Forecasting)

These scripts allow you to run various experiments, including supervised learning, different pretraining methods, and finetuning for both standard and autoregressive models.
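For example, a complete OCP transfer-learning run substitutes ocp for [METHOD] in the pretraining and finetuning scripts above:

# Pretrain with OCP, then finetune the pretrained weights
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_multi_window_pretrain.sh $MIMICIV_ROOT_DIR meds-torch ocp
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_finetune.sh $MIMICIV_ROOT_DIR meds-torch ocp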

📞 Support

For questions, issues, or feature requests, please open an issue on our GitHub repository.


MEDS-torch: Advancing healthcare machine learning through flexible, robust, and scalable sequence modeling tools.