The similarity between embeddings of text, video, audio, etc. is not high, usually around 0.1 to 0.3. How do we know how relevant the embeddings are to each other? Can this encoder be trusted for downstream tasks such as semantic search in video? If so, what is the appropriate way to use these embeddings?
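For retrieval tasks such as semantic search, a common way to read low absolute cosine scores is to compare them relative to each other: score every candidate against the query and rank them, instead of expecting a single high value. The sketch below illustrates that idea in plain Python. The embeddings are made-up toy vectors, not output from this encoder. The helper names (`cosine_similarity`, `rank_by_similarity`, `softmax`) are hypothetical and not part of any library's API. The softmax temperature of 0.01 is also an assumption, chosen to show how a sharp temperature turns small score gaps into a clear preference.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and sort best-first."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


def softmax(scores: Sequence[float], temperature: float = 0.01) -> List[float]:
    """Turn raw similarities into a distribution over the candidates.

    The default temperature is an assumed value for illustration; dividing by
    a small temperature amplifies small gaps such as 0.1 vs 0.3.
    """
    logits = [s / temperature for s in scores]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy embeddings (not real encoder output): one text query and
# three video clips whose cosine scores all fall between 0.1 and 0.3.
text_query = [1.0, 0.0, 0.0, 0.0]
video_clips = {
    "clip_a": [0.3, 0.95, 0.0, 0.0],
    "clip_b": [0.1, 0.0, 0.99, 0.0],
    "clip_c": [0.2, 0.0, 0.0, 0.98],
}

ranked = rank_by_similarity(text_query, video_clips)
probs = softmax([score for _, score in ranked])

for (name, score), p in zip(ranked, probs):
    print(f"{name}: cosine={score:.3f} softmax={p:.4f}")
```

Retrieval here depends only on the ordering. A clip scoring about 0.30 is still the clear best match when the alternatives score 0.10 to 0.20. If a fixed "relevant or not" cutoff is needed instead, it would have to be calibrated on labelled query/clip pairs for the specific encoder, because raw cosine values are not comparable across models.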