VLSA: Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology
[Preprint] | [VLSA Walkthrough] | [Awesome Papers of Pathology VLMs] | [Zhihu (中文)] | [WSI Preprocessing] | [Acknowledgements] | [Citation]
Abstract: Histopathology Whole-Slide Images (WSIs) provide an important tool for assessing cancer prognosis in computational pathology (CPATH). While existing survival analysis (SA) approaches have made exciting progress, they are generally limited to adopting highly-expressive architectures and only coarse-grained patient-level labels to learn prognostic visual representations from gigapixel WSIs. Such a learning paradigm suffers from critical performance bottlenecks given the scarce training data and the standard multi-instance learning (MIL) framework in CPATH. To overcome this, this paper, for the first time, proposes a new Vision-Language-based SA (VLSA) paradigm. Concretely, (1) VLSA is driven by pathology VL foundation models. It no longer relies on high-capability networks and shows the advantage of data efficiency. (2) At the vision end, VLSA encodes prognostic language priors and then employs them as auxiliary signals to guide the aggregation of prognostic visual features at the instance level, thereby compensating for the weak supervision in MIL. Moreover, given the characteristics of SA, we propose i) ordinal survival prompt learning to transform continuous survival labels into textual prompts; and ii) the ordinal incidence function as the prediction target to make SA compatible with VL-based prediction. Notably, VLSA's predictions can be interpreted intuitively with our Shapley values-based method. Extensive experiments on five datasets confirm the effectiveness of our scheme. VLSA could pave a new way for SA in CPATH by offering weakly-supervised MIL an effective means to learn valuable prognostic clues from gigapixel WSIs.
📚 Recent updates:
- 24/10/07: added the notebook VLSA Walkthrough
- 24/09/24: code & paper are live
- 24/09/10: released VLSA
This repository is still being updated. Stay tuned.
Please refer to our notebook, VLSA Walkthrough. It walks through
- individual incidence function prediction with VLSA models;
- and prediction interpretation using our Shapley values-based method.
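For intuition, below is a minimal sketch (not the repo's actual code) of how a discrete incidence function over time bins can be turned into a survival curve and a simple risk score; the bin edges, probabilities, and variable names are assumptions for illustration only.

```python
import torch

# Hypothetical example: predicted incidence probabilities over 4 discrete
# time bins for one patient (they sum to <= 1; the remainder is the
# probability of surviving beyond the last bin).
incidence = torch.tensor([0.10, 0.25, 0.30, 0.20])
bin_edges = torch.tensor([12.0, 24.0, 36.0, 48.0])  # months (assumed)

# Survival curve: S(t_k) = 1 - sum_{j <= k} incidence_j
survival = 1.0 - torch.cumsum(incidence, dim=0)

# One common risk score for discrete-time SA: the summed cumulative
# incidence over all bins (higher = worse prognosis).
risk = torch.sum(torch.cumsum(incidence, dim=0))

print("S(t) at bin edges:", survival.tolist())
print("risk score:", risk.item())
```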
All experiments are run on a machine with
- one NVIDIA GeForce RTX 3090 GPU
- Python 3.8 and `pytorch==1.11.0+cu113`
Detailed package requirements:
- For `pip` or `conda` users, the full requirements are provided in requirements.txt.
- For Docker users, you could pull our base image via `docker pull yuukilp/deepath:py38-torch1.11.0-cuda11.3-cudnn8-devel` and then install the additional essential Python packages (see requirements.txt) inside the container.
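To quickly confirm that your environment matches the setup above, here is a minimal sanity-check sketch (not part of the repo) using only PyTorch:

```python
import torch

# Quick environment check: this repo is tested with PyTorch 1.11.0 + CUDA 11.3
# on an RTX 3090 (see above); adjust expectations for your own setup.
print("torch:", torch.__version__)            # expected: 1.11.0+cu113
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```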
Use the following command to load an experiment configuration and train the VLSA model (5-fold cross-validation):
python3 main.py --config config/IFMLE/tcga_blca/cfg_vlsa_conch.yaml --handler VLSA --multi_run
All important arguments are explained in config/IFMLE/tcga_blca/cfg_vlsa_conch.yaml.
For traditional SA models that use only visual features, run:
python3 main.py --config config/IFMLE/tcga_blca/cfg_sa_base_conch.yaml --handler SA --multi_run
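If you want to inspect or tweak an experiment configuration programmatically before launching training, a minimal sketch follows (assuming PyYAML is installed; the actual keys depend on the config file):

```python
import yaml

# Load an experiment configuration shipped with the repo and list its
# top-level options before launching training.
cfg_path = "config/IFMLE/tcga_blca/cfg_vlsa_conch.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

for key, value in cfg.items():
    print(f"{key}: {value}")
```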
We advocate open-source research. Our full training logs for VLSA
models can be accessed at Google Drive.
Foundational VLMs for computational pathology:
Model | Architecture | Paper | Code | Data |
---|---|---|---|---|
PLIP (NatMed'23) | CLIP | A visual language foundation model for pathology image analysis using medical Twitter | Github | 208,414 pathology images paired with natural language descriptions from Twitter |
Quilt-Net (NeurIPS'23) | CLIP | Quilt-1M: One million image-text pairs for histopathology | Github | 802,148 image and text pairs from YouTube |
CONCH (NatMed'24) | CoCa | A Vision-Language Foundation Model for Computational Pathology | Github | over 1.17 million image-caption pairs |
CPLIP (CVPR'24) | CLIP | CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment | Github | Many-to-many VL alignment on ARCH dataset |
PathAlign (arXiv'24) | BLIP-2 | PathAlign: A vision-language model for whole slide images in histopathology | - | over 350,000 WSIs and diagnostic text pairs |
VLM-driven computational pathology tasks:
Model | Subfield | Paper | Code | Highlight |
---|---|---|---|---|
TOP (NeurIPS'23) | WSI Classification | The rise of AI language pathologists: Exploring two-level prompt learning for few-shot weakly-supervised whole slide image classification | Github | Few-shot WSI classification |
FiVE (CVPR'24) | WSI Classification | Generalizable whole slide image classification with fine-grained visual-semantic interaction | Github | VLM pretraining for WSI classification |
ViLa-MIL (CVPR'24) | WSI Classification | ViLa-MIL: Dual-scale vision language multiple instance learning for whole slide image classification | Github | Dual-scale features for WSI classification |
VLSA (arXiv'24) | WSI Survival Analysis | Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology | Github | VLM-driven vision-language survival analysis |
NOTE: please open a new PR if you want to add your work into this table.
Following CONCH, we first divide each WSI into patches of 448 × 448 pixels at 20x magnification. Then we use the CONCH image encoder to extract patch features.
Our complete WSI preprocessing procedure follows Pipeline-Processing-TCGA-Slides-for-MIL. Please refer to it for a detailed tutorial.
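For intuition, here is a minimal sketch of the patch-level feature extraction described above. It assumes OpenSlide for tile reading, treats level 0 as 20x, and uses generic `encode_image` / `preprocess` callables standing in for the CONCH image encoder and its transform; it is not the pipeline's actual code and skips tissue segmentation.

```python
import openslide
import torch

def extract_patch_features(slide_path, encode_image, preprocess, patch_size=448):
    """Sketch: tile a WSI into 448 x 448 patches and encode each with a frozen encoder."""
    slide = openslide.OpenSlide(slide_path)
    width, height = slide.dimensions  # level-0 dimensions (assumed to be 20x here)
    features = []
    for y in range(0, height - patch_size + 1, patch_size):
        for x in range(0, width - patch_size + 1, patch_size):
            patch = slide.read_region((x, y), 0, (patch_size, patch_size)).convert("RGB")
            # In practice, background patches are filtered out by tissue segmentation.
            with torch.no_grad():
                feat = encode_image(preprocess(patch).unsqueeze(0))
            features.append(feat.squeeze(0).cpu())
    return torch.stack(features)  # shape: [num_patches, feature_dim]
```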
Some parts of the code in this repo are adapted from the following amazing works. We thank the authors and developers for their generous contributions.
- CONCH: our VLSA is driven by this great pathology VLM.
- OrdinalCLIP: adapted for survival prompt learning.
- SurvivalEVAL: used for performance evaluation (D-cal and MAE computation).
- Patch-GCN: we follow all of its data splits for 5-fold cross-validation.
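For intuition about the MAE computation mentioned above, here is a small illustrative NumPy sketch of a hinge-style MAE for right-censored survival data; it is not the SurvivalEVAL implementation, and the function name and toy numbers are ours.

```python
import numpy as np

def mae_hinge(pred_times, event_times, event_indicators):
    """Illustrative hinge-style MAE for right-censored survival data.

    Uncensored patients contribute the absolute error; censored patients are
    only penalized if the predicted time falls before their censoring time.
    """
    pred_times = np.asarray(pred_times, dtype=float)
    event_times = np.asarray(event_times, dtype=float)
    events = np.asarray(event_indicators, dtype=bool)

    errors = np.where(
        events,
        np.abs(pred_times - event_times),           # observed events
        np.maximum(event_times - pred_times, 0.0),  # censored: hinge penalty
    )
    return errors.mean()

# Toy example: two observed events and one censored patient.
print(mae_hinge(pred_times=[20, 35, 50], event_times=[24, 30, 40], event_indicators=[1, 1, 0]))
```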
ⓒ UESTC. The models and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of the VLSA model and its derivatives is prohibited and requires prior approval. If you are a commercial entity, please contact the corresponding author.
If you find this work helpful for your research, please consider citing our paper:
@misc{liu2024interpretablevisionlanguagesurvivalanalysis,
  title={Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology},
  author={Pei Liu and Luping Ji and Jiaxiang Gou and Bo Fu and Mao Ye},
  year={2024},
  eprint={2409.09369},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2409.09369},
}