ScaleMAI

ScaleMAI is an AI-integrated data curation and annotation agent that combines iterative, multi-stage processes with AI and human expertise to progressively enhance dataset quality.

Paper

ScaleMAI: Accelerating the Development of Trusted Datasets and AI Models
Wenxuan Li, Pedro R. A. S. Bassi, Tianyu Lin, Yu-Cheng Chou, Xinze Zhou, Yucheng Tang, Fabian Isensee, Kang Wang, Qi Chen, Xiaowei Xu, Xiaoxi Chen, Lizhou Wu, Qilong Wu, Yannick Kirchhoff, Maximilian Rokuss, Saikat Roy, Yuxuan Zhao, Dexin Yu, Kai Ding, Constantin Ulrich, Klaus Maier-Hein, Yang Yang, Alan Yuille, Zongwei Zhou*
Johns Hopkins University
YouTube

PancreaVerse: An AI Trusted Dataset for Pancreatic Cancer Studies


AI models trained on the PancreaVerse dataset match senior and expert radiologists in tumor detection and surpass them in tumor classification accuracy.

| Dataset | # of classes | # of CT scans | # of centers |
|---|---|---|---|
| TCIA-CBCT [Han et al., Med. Phys. 2021] | 0 | 40 | 1 |
| MSD-Pancreas [Antonelli et al., Nat. Commun. 2022] | 2 | 420 | 1 |
| TCIA-panNET [Chen et al., Int. J. Cancer 2023] | 0 | 38 | 1 |
| PANORAMA [Alves et al., 2024] | 6 | 3,000 | 7 |
| PancreaVerse [Li et al., 2025] | 27 | 25,362 | 112 |
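
To make the scale difference in the comparison above concrete, the following is a minimal sketch that stores the table's figures as plain Python records and computes how many times more CT scans PancreaVerse contains than each other dataset. The record layout and the helper name `scale_factor` are illustrative assumptions, not part of the ScaleMAI code base; only the numbers come from the table.

```python
# Figures copied from the dataset comparison table.
# The dict layout and field names are illustrative assumptions.
DATASETS = [
    {"name": "TCIA-CBCT", "classes": 0, "ct_scans": 40, "centers": 1},
    {"name": "MSD-Pancreas", "classes": 2, "ct_scans": 420, "centers": 1},
    {"name": "TCIA-panNET", "classes": 0, "ct_scans": 38, "centers": 1},
    {"name": "PANORAMA", "classes": 6, "ct_scans": 3000, "centers": 7},
    {"name": "PancreaVerse", "classes": 27, "ct_scans": 25362, "centers": 112},
]


def scale_factor(reference, other, field="ct_scans"):
    """Return how many times larger `reference` is than `other` for `field`."""
    return reference[field] / other[field]


# Hypothetical helper usage: compare PancreaVerse against every other entry.
pancreaverse = next(d for d in DATASETS if d["name"] == "PancreaVerse")
by_name = {d["name"]: d for d in DATASETS}

for d in DATASETS:
    if d is not pancreaverse:
        factor = scale_factor(pancreaverse, d)
        print(f"PancreaVerse has {factor:.1f}x as many CT scans as {d['name']}")
```

By this count, PancreaVerse has roughly 8.5 times as many CT scans as PANORAMA, the next largest dataset in the table.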

PancreaVerse comprises 25,362 CT scans with precise per-voxel annotations of benign and malignant pancreatic tumors, pancreas head, body, and tail, along with 24 surrounding structures (i.e., pancreas, superior mesenteric artery, pancreatic duct, celiac artery, common bile duct, veins, aorta, gall bladder, left and right kidneys, liver, postcava, spleen, stomach, left and right adrenal glands, bladder, colon, duodenum, left and right femurs, left and right lungs, and prostate). Sourced from 112 hospitals, this dataset includes imaging metadata such as patient sex, age, contrast phase, diagnosis, spacing, and scanner details.

Caution

Annotating a dataset of 25K CT scans with 600K 3D tumor and organ masks would have required a single expert radiologist to start working in 1790.
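
The caution above can be unpacked with rough arithmetic. The sketch below derives the annotation rate implied by the stated figures (600K masks, a start year of 1790) and the mask count per scan using the 25,362 scans reported for PancreaVerse. The reference year of 2025 is an assumption taken from the paper's publication year, and the README does not state the per-mask annotation time behind the 1790 estimate, so the code only works backward from the published numbers.

```python
# Figures stated in the README.
N_SCANS = 25_362      # CT scans in PancreaVerse
N_MASKS = 600_000     # 3D tumor and organ masks ("600K")
START_YEAR = 1790     # year a single radiologist would have had to start

# Assumption: the estimate is measured up to the paper's publication year.
REFERENCE_YEAR = 2025

years_of_work = REFERENCE_YEAR - START_YEAR   # 235 years
masks_per_year = N_MASKS / years_of_work      # about 2,553 masks per year
masks_per_scan = N_MASKS / N_SCANS            # about 23.7 masks per scan

print(f"Years of continuous annotation: {years_of_work}")
print(f"Implied rate: {masks_per_year:.0f} masks per year")
print(f"Average masks per scan: {masks_per_scan:.1f}")
```

The average of roughly 24 masks per scan is in line with the 24 surrounding structures annotated in each CT scan.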

This dataset enables standard medical imaging tasks—detection, segmentation, and classification—and clinical tasks such as tumor staging and radiotherapy planning.

Citation

@article{li2025scalemai,
  title={ScaleMAI: Accelerating the Development of Trusted Datasets and AI Models},
  author={Li, Wenxuan and Bassi, Pedro RAS and Lin, Tianyu and Chou, Yu-Cheng and Zhou, Xinze and Tang, Yucheng and Isensee, Fabian and Wang, Kang and Chen, Qi and Xu, Xiaowei and others},
  journal={arXiv preprint arXiv:2501.03410},
  year={2025},
  url={https://github.com/MrGiovanni/ScaleMAI}
}

@article{li2024abdomenatlas,
  title={AbdomenAtlas: A large-scale, detailed-annotated, \& multi-center dataset for efficient transfer learning and open algorithmic benchmarking},
  author={Li, Wenxuan and Qu, Chongyu and Chen, Xiaoxi and Bassi, Pedro RAS and Shi, Yijia and Lai, Yuxiang and Yu, Qian and Xue, Huimin and Chen, Yixiong and Lin, Xiaorui and others},
  journal={Medical Image Analysis},
  pages={103285},
  year={2024},
  publisher={Elsevier},
  url={https://github.com/MrGiovanni/AbdomenAtlas}
}

Acknowledgement

This work was supported by the Lustgarten Foundation for Pancreatic Cancer Research and the McGovern Foundation. Paper content is covered by patents pending.
