Embeddings Models #159
Conversation
…earning (loss). Added custom reID/embedding metrics. Implemented a test to verify trainability, exportability and inference. Removed GhostFaceNetsV2 from the backbone tests as it only generates embeddings instead of the usual features.
…locks and variants. Added tests for all supported pytorch metric learning losses.
…s-train into feat/reid-support
Generally LGTM
On the open questions:
Predefined Model
- I think the predefined model can currently use the same setup as is defined in `embeddings_model.yaml` (same loss, miner, distance, reducer, etc.). This was already used for a successful training and works. But I agree we should also test other combinations and give some pointers in the docs on which combination is better for specific cases - this can be addressed as a separate PR IMO
Visualizer
I like the second idea of generalizing visualizers to the same interface as we have for metrics. I agree that long-term this would be the best solution. I wouldn't make it a blocking requirement, but we can note it internally as a nice-to-have improvement.
And for visualizing embedding models on datasets with many classes, I think this is to some degree still an unsolved problem. There are multiple methods for reducing dimensionality (PCA, t-SNE, UMAP, etc.), and we likely can't cover them all, as the best one is usually decided case by case. But we could provide several options to choose from. Also a nice-to-have; for the initial implementation the current one is fine.
Overview
Re-opening of #141
Added `ClosestIsPositiveAccuracy` and `MedianDistance` metrics and an `EmbeddingVisualizer`. These use the `"metadata"` labels introduced in #211.

Example: The specific name of the metadata field can be overridden in the config file using the `metadata_task_override` field. This will cause the model to look for `"task_name/metadata/color"` labels instead of `"task_name/metadata/id"` labels.
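A hedged sketch of what this override could look like in the training config. Everything except the `metadata_task_override` field itself (the surrounding keys and the node name) is an assumption for illustration, not the actual luxonis-train schema:

```yaml
# Illustrative only: the surrounding structure and node name are assumed.
model:
  nodes:
    - name: EmbeddingHead            # hypothetical node name
      metadata_task_override: color  # look for "task_name/metadata/color"
                                     # instead of "task_name/metadata/id"
```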
Open questions
Predefined model
There are a lot of options to specify for the embedding models (loss, miner, distance, reducer, etc.).
It would be good to run a large benchmark on a real dataset to determine good combinations that we can use in the predefined models.
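For illustration, one combination that could be included in such a benchmark, written as a config fragment. The class names are real pytorch-metric-learning components, but the key layout is an assumption, not the actual config schema:

```yaml
# Hypothetical config fragment; layout is assumed, component names come
# from pytorch-metric-learning.
loss:
  name: TripletMarginLoss
  miner: MultiSimilarityMiner
  distance: CosineSimilarity
  reducer: AvgNonZeroReducer
```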
Visualizer
Visualizing the embedding results is challenging. The current implementation uses PCA to reduce the dimensionality to 2D and then plots the points using `seaborn.scatterplot` and `seaborn.kdeplot`, so we can see whether the points representing different embedding classes are grouped together. This works reasonably well for tasks with just a few embedding classes, but is not great for anything more complex.
- Visualizers could get `reset`, `update`, and `compute` methods, which would allow natural batch accumulation
- `database` and `query` images

Comparison of simple embeddings of vehicle images based on color (red, green, blue)
Untrained model on top, trained on the bottom
Dataset with many classes (VeRI dataset)
Untrained model on the right, trained on the left
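A minimal sketch of the `reset`/`update`/`compute` interface discussed above, mirroring common metric APIs such as torchmetrics. The class name and method signatures are assumptions for illustration, not the actual luxonis-train classes:

```python
import numpy as np

class EmbeddingVisualizerBase:
    """Hypothetical visualizer with a metric-style interface: embeddings
    are accumulated batch by batch, then rendered once at the end."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Clear all accumulated state between evaluation runs.
        self._embeddings = []
        self._labels = []

    def update(self, embeddings: np.ndarray, labels: np.ndarray):
        # Accumulate one batch; the same call works for both
        # database and query images.
        self._embeddings.append(np.asarray(embeddings))
        self._labels.append(np.asarray(labels))

    def compute(self):
        # Concatenate everything seen since the last reset;
        # a subclass would reduce to 2D and plot from here.
        return np.concatenate(self._embeddings), np.concatenate(self._labels)
```

With this shape, the trainer can call `update` once per batch and `compute` once per epoch, exactly as it already does for metrics.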