Add Prompt Depth Anything Model #35401
base: main
Conversation
@NielsRogge @qubvel @pcuenca Could you help review this PR when you have some time? Thanks so much in advance! Let me know if you have any questions or suggestions. 😊
config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()

# config.backbone = "resnet18"
To be removed?
I removed them, as Prompt Depth Anything only supports the DINO backbone.
Thank you for taking the time to review this PR. I have fixed the issues above.
Hi @haotongl! Thanks for working on integrating the model into transformers 🤗 I'm on holidays until Jan 3rd, and I'll do a review after that if it's still necessary.
<!--
<Tip>

[Prompt Depth Anything V2](prompt_depth_anything_v2) was released in June 2024. It retains the same architecture as the original Prompt Depth Anything, ensuring compatibility with all existing code examples and workflows. However, it utilizes synthetic data and a larger capacity teacher model to deliver more precise and robust depth predictions.
The Markdown link to prompt_depth_anything_v2 probably won't work. However, we can link to depth_anything_v2 instead: https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything_v2
*Prompts play a critical role in unleashing the power of language and vision foundation models for specific tasks. For the first time, we introduce prompting into depth foundation models, creating a new paradigm for metric depth estimation termed Prompt Depth Anything. Specifically, we use a low-cost LiDAR as the prompt to guide the Depth Anything model for accurate metric depth output, achieving up to 4K resolution. Our approach centers on a concise prompt fusion design that integrates the LiDAR at multiple scales within the depth decoder. To address training challenges posed by limited datasets containing both LiDAR depth and precise GT depth, we propose a scalable data pipeline that includes synthetic data LiDAR simulation and real data pseudo GT depth generation. Our approach sets new state-of-the-arts on the ARKitScenes and ScanNet++ datasets and benefits downstream applications, including 3D reconstruction and generalized robotic grasping.*
<img src="https://promptda.github.io/assets/teaser.jpg" |
Feel free to open a PR on this repo, specifically this folder: https://huggingface.co/datasets/huggingface/documentation-images/tree/main/transformers/model_doc to add a prompt_depth_anything_architecture.jpg picture
predicted_depth = self.conv1(hidden_states)
predicted_depth = nn.functional.interpolate(
    predicted_depth,
    (int(patch_height * self.patch_size), int(patch_width * self.patch_size)),
Could you use the `torch_int` helper function from utils instead of `int`? The current approach means the model can't be traced, and results in the following warning during export:
/usr/local/lib/python3.10/dist-packages/transformers/models/prompt_depth_anything/modeling_prompt_depth_anything.py:352: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
(int(patch_height * self.patch_size), int(patch_width * self.patch_size)),
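A minimal sketch of what the suggested fix could look like (the `mode` and `align_corners` arguments are assumptions about the surrounding call, which is truncated above):

```python
# Sketch only: torch_int (from transformers.utils) returns a torch int64
# tensor while tracing and a plain Python int in eager mode, so the
# interpolation size is no longer baked in as a constant during export.
from transformers.utils import torch_int

predicted_depth = self.conv1(hidden_states)
predicted_depth = nn.functional.interpolate(
    predicted_depth,
    (torch_int(patch_height * self.patch_size), torch_int(patch_width * self.patch_size)),
    mode="bilinear",       # assumed to match the original call
    align_corners=True,    # assumed to match the original call
)
```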
if prompt_depth is not None:
    # normalize prompt depth
    B = len(prompt_depth)
Suggested change: replace `B = len(prompt_depth)` with `B = prompt_depth.shape[0]`, since `len()` of a tensor causes issues during tracing.
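For illustration, a tiny repro of the tracing difference (not from the PR, just a sketch):

```python
import torch

def with_len(t):
    # len(t) yields a plain Python int, which torch.jit.trace bakes in as
    # a constant, so the traced graph would not generalize to other batch sizes.
    return torch.zeros(len(t))

def with_shape(t):
    # t.shape[0] is recorded as a size op and stays dynamic in the trace.
    return torch.zeros(t.shape[0])

example = torch.zeros(2, 3)
traced = torch.jit.trace(with_shape, example)  # traces cleanly
# torch.jit.trace(with_len, example) emits a TracerWarning about using len()
```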
>>> prompt_depth_url = "https://github.com/DepthAnything/PromptDA/blob/main/assets/example_images/arkit_depth.png?raw=true"
>>> prompt_depth = Image.open(requests.get(prompt_depth_url, stream=True).raw)
>>> prompt_depth = torch.tensor((np.asarray(prompt_depth) / 1000.0).astype(np.float32))
>>> prompt_depth = prompt_depth.unsqueeze(0).unsqueeze(0) |
Usage-wise, it might be a good idea to create a `PromptDepthAnythingProcessor` to help handle processing the input image (via the image processor) and the optional `prompt_depth` input.
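For instance, reusing the `image` and `prompt_depth` from the example above, usage might look like the following sketch (the processor class, checkpoint ids, and `prompt_depth` argument are hypothetical, mirroring the suggestion rather than any merged API):

```python
# Hypothetical API sketch; PromptDepthAnythingProcessor does not exist yet.
from transformers import AutoModelForDepthEstimation

processor = PromptDepthAnythingProcessor.from_pretrained("...")  # placeholder checkpoint id
model = AutoModelForDepthEstimation.from_pretrained("...")       # placeholder checkpoint id

inputs = processor(images=image, prompt_depth=prompt_depth, return_tensors="pt")
outputs = model(**inputs)
predicted_depth = outputs.predicted_depth
```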
What does this PR do?
This PR adds the Prompt Depth Anything Model. Prompt Depth Anything builds upon Depth Anything V2 and incorporates metric prompt depth to enable accurate and high-resolution metric depth estimation.
The implementation leverages Modular Transformers. The main file can be found here.
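For context on the modular approach, a rough sketch of what a modular definition can look like (the class names below are assumptions based on the existing Depth Anything code, not necessarily what this PR's modular file contains):

```python
# Hedged sketch of the modular-transformers pattern: the modular file
# subclasses existing Depth Anything components, and the flattened
# modeling_prompt_depth_anything.py is auto-generated from it by
# utils/modular_model_converter.py.
from transformers.models.depth_anything.modeling_depth_anything import (
    DepthAnythingDepthEstimationHead,
)


class PromptDepthAnythingDepthEstimationHead(DepthAnythingDepthEstimationHead):
    # Override only the pieces that change (e.g. prompt depth fusion);
    # everything else is inherited and regenerated by the converter.
    pass
```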
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.