Added video processing section (Unit 7 - Multimodal Based Video Model) #355

Merged
12 commits merged on Dec 4, 2024

Conversation

Contributor

@1kmmk1 1kmmk1 commented Oct 5, 2024

Added multimodal-based-video-models.mdx to the video processing section. This document provides an overview of several multimodal video architectures that integrate different modalities into a unified representation space.

Part of Proposed Outline Revision for Unit 7. Video & Video Processing #348

Who can review? (Initial)

@jungnerd @cjfghk5697 @mreraser and anyone who wants to review!

@1kmmk1 1kmmk1 marked this pull request as ready for review October 5, 2024 05:59
Owner

@johko johko left a comment

Thanks for the contribution and sorry for the late review.
It was a really nice read and I feel like I've got a better idea about multimodal video models now ;)

I mainly left formatting suggestions (mostly repetitive). Please also make sure to add the pictures :)

@@ -0,0 +1,120 @@
# Multimodal Based Video Models[[mutilmodal-based-video-models]]
Owner

It is really nice that you thought about adding anchors, but actually this is not really needed, unless you want to refer back to the chapter within your own file.
In general the hf-doc-builder will create anchors for every headline automatically, so you can actually remove them :)

5. Depth Modality: Represents the 3D spatial information of the video.
6. Sensor Modality: In some applications, videos may include modalities like temperature or biometric data.

/* Modality Overview Image */
Owner

Is there supposed to be an actual image here?


- **Overview**

/* VideoBERT Overview Image */
Owner

Again the image question ;)


- **Overview**

/* VATT Overview Image */
Owner

Image? (don't want to annoy you, just make sure you don't miss these :) )


- **Overview**

/* ImageBind Overview Image */
Owner

Image ;)

Contributor Author

@1kmmk1 1kmmk1 left a comment

Hello, @johko! First of all, sorry for the late reply. I really appreciate your detailed review! I committed most of your suggestions, removed the anchor, and added the images.

Owner

@johko johko left a comment

Thanks for the changes 🙂
Now it looks good to me 👍

@ATaylorAerospace ATaylorAerospace self-requested a review November 14, 2024 10:48
Collaborator

@ATaylorAerospace ATaylorAerospace left a comment

Great additions. LGTM!

@ATaylorAerospace ATaylorAerospace merged commit d7e894a into johko:stage Dec 4, 2024
4 participants