GitHub - ml-lab/sapiens: High-resolution models for human tasks.

Foundation for Human Vision Models

Rawal Khirodkar · Timur Bagautdinov · Julieta Martinez · Su Zhaoen · Austin James
Peter Selednik · Stuart Anderson · Shunsuke Saito

ECCV 2024 (Oral)

Sapiens offers a comprehensive suite for human-centric vision tasks, including 2D pose estimation, body-part segmentation, depth estimation, and surface normal estimation. The model family is pretrained on 300 million in-the-wild human images and shows excellent generalization to unconstrained conditions. The models are also designed for extracting high-resolution features: they are trained natively at a 1024 x 1024 image resolution with a 16-pixel patch size.
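
As a quick sanity check on what these numbers imply, the sketch below computes the patch grid for a 1024 x 1024 input with 16-pixel patches. It assumes non-overlapping square patches, as in a standard Vision Transformer; the helper name `patch_grid` is ours for illustration and is not part of the Sapiens codebase.

```python
# Patch-grid arithmetic for the resolution and patch size stated above.
# Assumption: patches are square and non-overlapping (standard ViT tiling).
IMAGE_SIZE = 1024  # pixels per side
PATCH_SIZE = 16    # pixels per side


def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches per side, total patches) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


per_side, total = patch_grid(IMAGE_SIZE, PATCH_SIZE)
print(f"{per_side} x {per_side} patches = {total} patches per image")
# prints: 64 x 64 patches = 4096 patches per image
```

Under that assumption, each image is split into 4096 patches, which gives a sense of the spatial detail available in the extracted features.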

🚀 Getting Started

Clone the Repository

git clone git@github.com:facebookresearch/sapiens.git
export SAPIENS_ROOT=/path/to/sapiens

Recommended: Lite Installation (Inference-only)

For users setting up their own environment primarily for running existing models in inference mode, we recommend the Sapiens-Lite installation.
This setup offers optimized inference (4x faster) with minimal dependencies (only PyTorch + numpy + cv2).
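
To confirm that an environment has the three packages listed above before trying Sapiens-Lite, a check along these lines may help. This is a minimal sketch: `torch`, `numpy`, and `cv2` are the usual import names for PyTorch, NumPy, and OpenCV, and the helper `find_missing` is hypothetical, not something the repository ships.

```python
from importlib.util import find_spec

# Import names for the Sapiens-Lite dependencies listed above.
LITE_MODULES = ("torch", "numpy", "cv2")


def find_missing(modules=LITE_MODULES):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if find_spec(name) is None]


missing = find_missing()
if missing:
    print("Missing packages for Sapiens-Lite:", ", ".join(missing))
else:
    print("All Sapiens-Lite dependencies were found.")
```

The check only looks the modules up without importing them, so it stays fast even when PyTorch is installed.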

Full Installation

To replicate our complete training setup, run the provided installation script.
This will create a new conda environment named sapiens and install all necessary dependencies.

cd $SAPIENS_ROOT/_install
./conda.sh

Please download the checkpoints from Hugging Face.
You can download only the checkpoints you are interested in.
Set $SAPIENS_CHECKPOINT_ROOT to the path of the sapiens_host folder. The checkpoint directory structure is:

sapiens_host/
├── detector/
│   └── checkpoints/
│       └── rtmpose/
├── pretrain/
│   └── checkpoints/
│       ├── sapiens_0.3b/
│       ├── sapiens_0.6b/
│       ├── sapiens_1b/
│       └── sapiens_2b/
├── pose/
├── seg/
├── depth/
└── normal/
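
Since checkpoints can be downloaded selectively, it can be handy to see which parts of this layout are present under $SAPIENS_CHECKPOINT_ROOT. The sketch below is one way to do that. The directory names are taken from the tree above; the helper `missing_checkpoint_dirs` and the `EXPECTED_DIRS` list are our own illustration and do not exist in the repository.

```python
import os
from pathlib import Path

# Relative directories copied from the checkpoint tree above.
EXPECTED_DIRS = [
    "detector/checkpoints/rtmpose",
    "pretrain/checkpoints/sapiens_0.3b",
    "pretrain/checkpoints/sapiens_0.6b",
    "pretrain/checkpoints/sapiens_1b",
    "pretrain/checkpoints/sapiens_2b",
    "pose",
    "seg",
    "depth",
    "normal",
]


def missing_checkpoint_dirs(root: Path) -> list[str]:
    """Return the expected directories that are absent under `root`."""
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


root_env = os.environ.get("SAPIENS_CHECKPOINT_ROOT")
if root_env is None:
    print("SAPIENS_CHECKPOINT_ROOT is not set.")
else:
    absent = missing_checkpoint_dirs(Path(root_env))
    # Absent folders are not necessarily an error: only the checkpoints
    # of interest need to be downloaded.
    print("Not downloaded:", ", ".join(absent) if absent else "none")
```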

🌟 Human-Centric Vision Tasks

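We finetune Sapiens for multiple human-centric vision tasks. Please check out the list below; each task also has a Sapiens-Lite variant.

Image Encoder
Pose Estimation
Body Part Segmentation
Depth Estimation
Surface Normal Estimation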

🎯 Easy Steps to Finetuning Sapiens

Finetuning our models is super-easy! Here is a detailed training guide for the following tasks.

[Coming Soon] Pose/Seg/Depth
Surface Normal Estimation

🤝 Acknowledgements & Support & Contributing

We would like to acknowledge the work by OpenMMLab, which this project benefits from.
For any questions or issues, please open an issue in the repository.
See contributing and the code of conduct.

License

This project is licensed under the terms in the LICENSE file.
Portions of the project derived from open-source projects are licensed under Apache 2.0.

📚 Citation

If you use Sapiens in your research, please use the following BibTeX entry.

@misc{khirodkar2024_sapiens,
    title={Sapiens: Foundation for Human Vision Models},
    author={Khirodkar, Rawal and Bagautdinov, Timur and Martinez, Julieta and Zhaoen, Su and James, Austin and Selednik, Peter and Anderson, Stuart and Saito, Shunsuke},
    year={2024},
    eprint={2408.12569},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2408.12569}
}

