team
DISA-CRANBERRY
corresponding
[email protected]
tasks
Task A
subsets and projections
100M, 30M, 10M
comments
We focus mainly on the indexing of 100M 768-dimensional vectors.
The algorithm can search an arbitrary random subset of the 100M dataset, e.g. the 30M and 10M datasets provided. The average recall should always be around or above 90 % on the three provided subsets (100M, 30M, 10M).
Datasets with at most 50M objects are loaded into main memory during the build and searched there. However, we did not test datasets of around 50M objects.
We do not use any of the provided PCA projections, even though our Python code downloads the PCA96 datasets; sorry for that.
In case of need, Jan Sedmidubsky, who implemented the GitHub Actions (GHA) part in Python, will be available from the 12th of July. Vladimir is online anytime except for the 11th of July.
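For reference, average recall as mentioned above is commonly computed as the mean, over all queries, of the fraction of true k nearest neighbours that the index returns. A minimal sketch (the function name and argument layout are our own illustration, not part of the submission's code):

```python
def average_recall(retrieved, ground_truth, k=10):
    """Mean fraction of the true k nearest neighbours found per query.

    retrieved:    list of per-query result-ID lists from the index
    ground_truth: list of per-query true nearest-neighbour ID lists
    """
    total = 0.0
    for found, truth in zip(retrieved, ground_truth):
        # Overlap between the top-k returned IDs and the true top-k IDs.
        total += len(set(found[:k]) & set(truth[:k])) / k
    return total / len(retrieved)

# Example with two queries at k=2: recalls 1.0 and 0.5, averaging 0.75.
print(average_recall([[1, 2], [3, 5]], [[1, 2], [3, 4]], k=2))  # 0.75
```

A target of "around or above 90 %" then means this value stays at or near 0.9 across the evaluated query set.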
members
Vladimir Mic
Jan Sedmidubsky
Pavel Zezula
github link
https://github.com/xsedmid/sisap23-laion-challenge-CRANBERRY/