Handling huge embeddings databases. #261
-
Hey, while this is not strictly related to Human, it is related to how to manage Human's 1024-element embedding arrays. Most of the other face-recognition libraries I've used deal with 128 dimensions for their embeddings, but Human uses 1024-element arrays, and this complicates things a little. For now I've just used a 256 GB RAM server to store all the data in memory and handle face matching, but the dataset is now too large to keep in RAM, so I'm looking for ways to use a database instead. I was thinking of using PostgreSQL since it has native support for arrays, but 1024 dimensions is too much for it to handle; another option that comes to mind is MongoDB. Does anyone have any experience dealing with this? Thanks!
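For context, here is a minimal sketch of the brute-force in-memory approach described above, assuming descriptors are held as plain `Float32Array`s; the record shape and function names are illustrative, not Human's API. It also shows why memory adds up: each 1024-element float32 descriptor is about 4 KB before any object overhead.

```ts
// Illustrative sketch of brute-force in-memory matching (not Human's built-in matcher).
// Each 1024-dim float32 descriptor is 1024 * 4 bytes = 4 KB, so tens of millions of
// faces quickly reach hundreds of GB once object and index overhead is included.

interface Enrolled {
  id: string;               // hypothetical record id
  descriptor: Float32Array; // 1024 values from face description
}

function euclideanDistance(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Linear scan over everything held in RAM: simple, but the whole dataset must fit in memory.
function findClosest(probe: Float32Array, db: Enrolled[]): { id: string; distance: number } | null {
  let best: { id: string; distance: number } | null = null;
  for (const entry of db) {
    const distance = euclideanDistance(probe, entry.descriptor);
    if (!best || distance < best.distance) best = { id: entry.id, distance };
  }
  return best;
}
```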
-
I've used MongoDB in several projects; it's pretty trivial and quite fast. You could also reduce the number of computed dimensions. The matching algorithm only cares that the source and target descriptors have the same number of dimensions, but that number can be anything. Reducing dimensions does decrease precision, but you can experiment with what is acceptable to you (1024 -> 512 -> 256 -> 128). You can also normalize descriptors from …
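To illustrate the dimension-reduction idea, here is a minimal sketch. The block-averaging approach is my own assumption (the comment above does not prescribe how to reduce); the only hard requirement it demonstrates is that enrolled and probe descriptors go through the same reduction before matching.

```ts
// Hypothetical dimension reduction by averaging fixed-size blocks (one simple way to go
// from 1024 -> 256 dimensions). The key constraint is that the SAME reduction is applied
// to both enrolled and probe descriptors so they stay comparable.

function reduceDimensions(descriptor: Float32Array, targetDims: number): Float32Array {
  const blockSize = descriptor.length / targetDims; // e.g. 1024 / 256 = 4
  const reduced = new Float32Array(targetDims);
  for (let i = 0; i < targetDims; i++) {
    let sum = 0;
    for (let j = 0; j < blockSize; j++) sum += descriptor[i * blockSize + j];
    reduced[i] = sum / blockSize;
  }
  return reduced;
}

// Usage: reduce once at enrollment time (before storing) and once per probe at match time,
// then compare the reduced vectors with whatever distance/similarity function you already use.
// const stored = reduceDimensions(enrolledDescriptor, 256);
// const probe  = reduceDimensions(liveDescriptor, 256);
```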
-
Thank you, I'll do some testing with MongoDB and Elasticsearch, keeping the 1024 dimensions.
-
Just my $0.02 on MongoDB vs. Elasticsearch: I love ES. Its query performance is great for large datasets, and nowadays it's even resilient enough to use as a primary store (not just as the index database it was originally intended to be), but it's much harder to set up and maintain, and it uses a lot more resources when idle. On the other hand, using MongoDB from Node.js is trivial, and it requires very few resources when idle. I'd stick with MongoDB unless you're expecting a really extreme number of descriptors; but if your database size is expected to be in the multi-GB range, ES starts showing its advantages.
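For reference, a minimal sketch of the "MongoDB as plain descriptor storage" route using the official `mongodb` Node driver; the connection string, database, collection, and field names are placeholders. This sketch treats MongoDB purely as storage: the similarity matching still happens in application code after loading candidates.

```ts
import { MongoClient } from 'mongodb';

// Placeholder connection details; adjust for your deployment.
const client = new MongoClient('mongodb://localhost:27017');

async function enrollAndMatch(probe: number[]) {
  await client.connect();
  const faces = client.db('faces').collection('descriptors');

  // Enrollment: store the 1024-element descriptor as a plain number array.
  await faces.insertOne({ label: 'person-1', descriptor: probe });

  // Matching: load candidates (optionally pre-filtered by metadata) and scan
  // them in application code with a simple Euclidean distance.
  const candidates = await faces
    .find({}, { projection: { label: 1, descriptor: 1 } })
    .toArray();

  let best: { label: string; distance: number } | null = null;
  for (const c of candidates) {
    let sum = 0;
    for (let i = 0; i < probe.length; i++) {
      const d = probe[i] - c.descriptor[i];
      sum += d * d;
    }
    const distance = Math.sqrt(sum);
    if (!best || distance < best.distance) best = { label: c.label, distance };
  }

  await client.close();
  return best;
}
```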