Subscribe to our Google Group: https://groups.google.com/u/2/g/bodymaps
ScaleMAI is an AI-integrated data curation and annotation agent that combines iterative, multi-stage processes with AI and human expertise to progressively enhance dataset quality.
ScaleMAI: Accelerating the Development of Trusted Datasets and AI Models
Wenxuan Li, Pedro R. A. S. Bassi, Tianyu Lin, Yu-Cheng Chou, Xinze Zhou, Yucheng Tang, Fabian Isensee, Kang Wang, Qi Chen, Xiaowei Xu, Xiaoxi Chen, Lizhou Wu, Qilong Wu, Yannick Kirchhoff, Maximilian Rokuss, Saikat Roy, Yuxuan Zhao, Dexin Yu, Kai Ding, Constantin Ulrich, Klaus Maier-Hein, Yang Yang, Alan Yuille, Zongwei Zhou*
Johns Hopkins University
Dataset | # of classes | # of CT scans | # of centers |
---|---|---|---|
TCIA-CBCT [Han et al., Med. Phys. 2021] | 0 | 40 | 1 |
MSD-Pancreas [Antonelli et al., Nat. Commun. 2022] | 2 | 420 | 1 |
TCIA-panNET [Chen et al., Int. J. Cancer 2023] | 0 | 38 | 1 |
PANORAMA [Alves et al., 2024] | 6 | 3,000 | 7 |
PancreaVerse [Li et al., 2025] | 27 | 25,362 | 112 |
PancreaVerse comprises 25,362 CT scans with precise per-voxel annotations of benign and malignant pancreatic tumors, pancreas head, body, and tail, along with 24 surrounding structures (i.e., pancreas, superior mesenteric artery, pancreatic duct, celiac artery, common bile duct, veins, aorta, gall bladder, left and right kidneys, liver, postcava, spleen, stomach, left and right adrenal glands, bladder, colon, duodenum, left and right femurs, left and right lungs, and prostate). Sourced from 112 hospitals, this dataset includes imaging metadata such as patient sex, age, contrast phase, diagnosis, spacing, and scanner details.
Caution
Annotating a dataset of 25K CT scans with 600K 3D tumor and organ masks by hand would have required a single expert radiologist to start working in 1790.
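To put the 1790 figure in perspective, the short sketch below works out the annotation pace it implies. It uses only the numbers quoted above (600K masks, a start in 1790) plus one assumption that is not stated in the text: that the comparison ends in 2025, the publication year of the ScaleMAI preprint. The function name `implied_annotation_rate` is illustrative and is not part of the ScaleMAI codebase.

```python
# Figures quoted in the caution note above.
NUM_MASKS = 600_000   # 3D tumor and organ masks
START_YEAR = 1790     # year the single radiologist would have had to start

# Assumption (not stated in the text): the comparison ends in 2025,
# the publication year of the ScaleMAI preprint.
END_YEAR = 2025


def implied_annotation_rate(num_masks, start_year, end_year):
    """Return (years, masks per year, masks per calendar day) for one annotator."""
    years = end_year - start_year
    masks_per_year = num_masks / years
    masks_per_day = masks_per_year / 365  # no weekends or holidays off
    return years, masks_per_year, masks_per_day


years, per_year, per_day = implied_annotation_rate(NUM_MASKS, START_YEAR, END_YEAR)
print(f"{years} years, about {per_year:.0f} masks per year, "
      f"about {per_day:.1f} masks per calendar day")
```

Under that assumption, the claim corresponds to roughly seven 3D masks per calendar day, every day, for 235 years. The per-mask annotation time behind the original estimate is not given here, so this is only a back-of-the-envelope reading of the stated numbers.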
This dataset enables standard medical imaging tasks—detection, segmentation, and classification—and clinical tasks such as tumor staging and radiotherapy planning.
@article{li2025scalemai,
title={ScaleMAI: Accelerating the Development of Trusted Datasets and AI Models},
author={Li, Wenxuan and Bassi, Pedro RAS and Lin, Tianyu and Chou, Yu-Cheng and Zhou, Xinze and Tang, Yucheng and Isensee, Fabian and Wang, Kang and Chen, Qi and Xu, Xiaowei and others},
journal={arXiv preprint arXiv:2501.03410},
year={2025},
url={https://github.com/MrGiovanni/ScaleMAI}
}
@article{li2024abdomenatlas,
title={AbdomenAtlas: A large-scale, detailed-annotated, \& multi-center dataset for efficient transfer learning and open algorithmic benchmarking},
author={Li, Wenxuan and Qu, Chongyu and Chen, Xiaoxi and Bassi, Pedro RAS and Shi, Yijia and Lai, Yuxiang and Yu, Qian and Xue, Huimin and Chen, Yixiong and Lin, Xiaorui and others},
journal={Medical Image Analysis},
pages={103285},
year={2024},
publisher={Elsevier},
url={https://github.com/MrGiovanni/AbdomenAtlas}
}
This work was supported by the Lustgarten Foundation for Pancreatic Cancer Research and the McGovern Foundation. Paper content is covered by patents pending.