
Embeddings Models #159

Merged: 66 commits into main, Jan 29, 2025

Conversation

@kozlov721 kozlov721 commented Jan 25, 2025

Overview

Re-opening of #141

  • Added the GhostFaceNetV2 network for embedding learning, along with corresponding losses, metrics, and a visualizer
    • ClosestIsPositiveAccuracy and MedianDistance metrics
    • Most losses from pytorch-metric-learning
    • EmbeddingVisualizer
  • Integrated the "metadata" labels introduced in #211
  • Laid the groundwork for type checking of metadata labels and better automatic inference of the correct tasks

Example:

```python
from luxonis_train.enums import Metadata

class Node(BaseNode):
    tasks = [Metadata("id")]

class Loss(BaseLoss):
    supported_tasks = [Metadata("id")]
```

The specific name of the metadata field can be overridden in the config file using the `metadata_task_override` field.

```yaml
- name: GhostFaceNetHead
  alias: color-embeddings
  metadata_task_override: color
```

This will cause the model to look for "task_name/metadata/color" labels instead of "task_name/metadata/id".

Open questions

Predefined model

There are a lot of options to specify for the embedding models; we support:

  • 21 losses
  • 5 distance measures
  • 10 miners
  • 9 reducers
  • 5 regularizers

It would be good to run a large benchmark on a real dataset to determine good combinations that we can use in the predefined models.
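For intuition about what a loss/miner combination computes, here is a small numpy sketch of hardest-positive/hardest-negative mining followed by a triplet margin loss. This is illustrative only; in the actual implementation the losses, miners, distances, etc. come from pytorch-metric-learning:

```python
import numpy as np

def hard_triplet_loss(emb: np.ndarray, labels: np.ndarray,
                      margin: float = 0.2) -> float:
    """Hardest-positive / hardest-negative mining plus a triplet margin
    loss. Numpy sketch for intuition; not the pytorch-metric-learning
    implementation the PR actually uses."""
    # Pairwise Euclidean distance matrix.
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    n = len(emb)
    losses = []
    for i in range(n):
        pos = (labels == labels[i]) & (np.arange(n) != i)
        neg = labels != labels[i]
        if not pos.any() or not neg.any():
            continue  # no valid triplet for this anchor
        hardest_pos = dist[i][pos].max()  # farthest same-class point
        hardest_neg = dist[i][neg].min()  # closest other-class point
        losses.append(max(hardest_pos - hardest_neg + margin, 0.0))
    return float(np.mean(losses)) if losses else 0.0

emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
# Well-separated classes: every triplet satisfies the margin, loss is 0.
assert hard_triplet_loss(emb, np.array([0, 0, 1, 1])) == 0.0
# Interleaved classes: the margin is violated, loss is positive.
assert hard_triplet_loss(emb, np.array([0, 1, 0, 1])) > 0.0
```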

Visualizer

Visualizing the embedding results is challenging. The current implementation uses PCA to reduce the dimensionality to 2D and then plots the points using seaborn.scatterplot and seaborn.kdeplot, so we can see whether the points representing different embedding classes are being grouped together.
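The PCA step can be sketched with plain numpy (a stand-in for illustration; the actual visualizer then feeds the 2D points to seaborn.scatterplot / seaborn.kdeplot):

```python
import numpy as np

def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project embeddings onto their top two principal components.

    Numpy stand-in for the visualizer's PCA step; the real code then
    plots the resulting 2D points with seaborn."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

points = pca_2d(np.random.default_rng(0).normal(size=(64, 128)))
assert points.shape == (64, 2)
```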

This works reasonably well for tasks with just a few embedding classes, but is not great for anything more complex:

  • The plots get very convoluted and hard to read
  • Visualizers are called per batch, but with many embedding classes a batch will usually contain only one example per class
    • We can use a small hack to accumulate the batches (the commented-out code in the PR)
    • We could generalize the visualizers so they work like metrics, with reset, update, and compute methods, which would allow natural batch accumulation
      • This would also allow visualizing closest matches using database and query images
      • It could eventually be useful for other visualizers as well
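The metric-like interface proposed above could look roughly like this (class and method names are hypothetical, mirroring the reset/update/compute pattern used for metrics):

```python
import numpy as np

class AccumulatingEmbeddingVisualizer:
    """Sketch of the proposed metric-like visualizer interface.

    Names are hypothetical; the point is that `update` is called once
    per batch and `compute` runs once on the accumulated epoch data."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self._embeddings: list[np.ndarray] = []
        self._labels: list[np.ndarray] = []

    def update(self, embeddings: np.ndarray, labels: np.ndarray) -> None:
        self._embeddings.append(embeddings)
        self._labels.append(labels)

    def compute(self) -> tuple[np.ndarray, np.ndarray]:
        # A real implementation would reduce dimensionality and plot
        # here; this sketch just returns the accumulated arrays.
        return np.concatenate(self._embeddings), np.concatenate(self._labels)

viz = AccumulatingEmbeddingVisualizer()
for _ in range(3):  # three batches with one example per class each
    viz.update(np.random.default_rng(0).normal(size=(5, 32)), np.arange(5))
emb, lab = viz.compute()
assert emb.shape == (15, 32) and lab.shape == (15,)
```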

Comparison of simple embeddings of vehicle images based on color (red, green, blue)

Untrained model on top, trained on the bottom

[image]

Dataset with many classes (VeRI dataset)

Untrained model on the right, trained on the left

[image]

CaptainTrojan and others added 30 commits December 6, 2024 18:54
…earning (loss). Added custom reID/embedding metrics. Implemented a test to verify trainability, exportability and inference.

Removed GhostFaceNetsV2 from the backbone tests as it only generates embeddings instead of the usual features.
…locks and variants. Added tests for all supported pytorch metric learning losses.
@kozlov721 kozlov721 requested a review from a team as a code owner January 25, 2025 03:28
@kozlov721 kozlov721 requested review from klemen1999, tersekmatija and conorsim and removed request for a team January 25, 2025 03:28
@github-actions github-actions bot added the enhancement New feature or request label Jan 25, 2025
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 25, 2025

@klemen1999 klemen1999 left a comment


Generally LGTM

On the open questions:

Predefined Model

  • I think the predefined model can currently use the same setup as defined in embeddings_model.yaml (same loss, miner, distance, reducer, etc.). This was already used for a successful training and works. But I agree we should also test other combinations and give some pointers in the docs on which combination is better for specific cases; this can be addressed in a separate PR IMO.

Visualizer

I like the second idea of generalizing the visualizers to the same interface as we have for metrics. I agree that long term this would be the best solution. I wouldn't make it a blocking requirement, but we can note it internally as a nice-to-have improvement.

And for the visualization of embedding models on datasets with many classes, I think this is still, to some degree, an unsolved problem. There are multiple methods for reducing dimensionality (PCA, t-SNE, UMAP, etc.) and we likely can't cover them all, as the best one is usually decided case by case. But we could provide several options to choose from. Also a nice-to-have; for the initial implementation the current one is fine.

@kozlov721 kozlov721 mentioned this pull request Jan 28, 2025
@kozlov721 kozlov721 merged commit 3b68c9b into main Jan 29, 2025
5 of 6 checks passed
@kozlov721 kozlov721 deleted the feature/embeddings-models branch January 29, 2025 00:23