Sparse Distances #100

Open
ashvardanian opened this issue Mar 20, 2024 · 4 comments

@ashvardanian
Owner

All existing metrics assume dense vector representations. For very high-dimensional vectors, sparse representations may provide huge space-efficiency gains.

The only operation that needs to be implemented for Jaccard, Hamming, Inner Product, L2, and Cosine is a float-weighted vectorized set-intersection. We may expect the following kinds of vectors:

  • u16 - high priority
  • u32 - high priority
  • u16f16 - medium priority
  • u32f16 - medium priority
  • u32f32 - low priority?

The last may not be practically useful. The AVX-512 backend (Intel Ice Lake and newer, and AMD Genoa) and the SVE backend (AWS Graviton, Nvidia Grace, Microsoft Cobalt) will see the biggest gains. Together with a serial backend, multiplied by 4-5 input types and 5 distance functions, this may result in over 100 new kernels.
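
To make the proposal concrete, here is a minimal serial sketch in plain Python of the float-weighted set intersection and of how several of the listed distances can be derived from it. It assumes each sparse vector is stored as a sorted list of unique indices with a parallel list of weights; the function names are hypothetical and are not part of the SimSIMD API.

```python
from math import sqrt

def sparse_dot(a_idx, a_val, b_idx, b_val):
    """Weighted intersection: sum of a_val * b_val over shared indices.
    Assumes both index lists are sorted and contain no duplicates."""
    i = j = 0
    total = 0.0
    while i < len(a_idx) and j < len(b_idx):
        if a_idx[i] == b_idx[j]:
            total += a_val[i] * b_val[j]
            i += 1
            j += 1
        elif a_idx[i] < b_idx[j]:
            i += 1
        else:
            j += 1
    return total

def sparse_l2_squared(a_idx, a_val, b_idx, b_val):
    """Squared L2 distance via |a|^2 + |b|^2 - 2 * <a, b>."""
    aa = sum(v * v for v in a_val)
    bb = sum(v * v for v in b_val)
    return aa + bb - 2.0 * sparse_dot(a_idx, a_val, b_idx, b_val)

def sparse_cosine_distance(a_idx, a_val, b_idx, b_val):
    """Cosine distance; returns 1.0 if either vector has zero norm."""
    aa = sum(v * v for v in a_val)
    bb = sum(v * v for v in b_val)
    if aa == 0.0 or bb == 0.0:
        return 1.0
    return 1.0 - sparse_dot(a_idx, a_val, b_idx, b_val) / sqrt(aa * bb)

def sparse_jaccard_distance(a_idx, b_idx):
    """Jaccard distance on index sets: the same intersection with unit weights."""
    ones_a = [1.0] * len(a_idx)
    ones_b = [1.0] * len(b_idx)
    shared = sparse_dot(a_idx, ones_a, b_idx, ones_b)
    union = len(a_idx) + len(b_idx) - shared
    return 0.0 if union == 0 else 1.0 - shared / union

a_idx, a_val = [1, 4, 7], [1.0, 2.0, 3.0]
b_idx, b_val = [4, 7, 9], [0.5, 1.0, 4.0]
print(sparse_dot(a_idx, a_val, b_idx, b_val))         # 4.0
print(sparse_l2_squared(a_idx, a_val, b_idx, b_val))  # 23.25
print(sparse_jaccard_distance(a_idx, b_idx))          # 0.5
```

A vectorized AVX-512 or SVE kernel would replace the two-pointer loop, but the reduction from each distance to one intersection pass would stay the same under these assumptions.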

Any thoughts and recommendations? Someone else looking for this functionality?

@ogencoglu
Contributor

Thanks for this library!

Any updates on sparsity support? Something similar to scipy.sparse.csr_matrix, for example.

@ashvardanian
Owner Author

Yes, @ogencoglu, sparsity is already implemented, as in numpy.intersect1d. What kind of functionality are you looking for?
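
For reference, this is roughly what the numpy.intersect1d baseline mentioned above looks like when used on sparse vectors. The snippet uses only NumPy and does not show SimSIMD's own functions; the array names and the index-plus-weight layout are assumptions made for illustration.

```python
import numpy as np

# Two sparse vectors: sorted unique indices plus parallel weights (assumed layout).
a_idx = np.array([1, 4, 7], dtype=np.uint32)
a_val = np.array([1.0, 2.0, 3.0], dtype=np.float32)
b_idx = np.array([4, 7, 9], dtype=np.uint32)
b_val = np.array([0.5, 1.0, 4.0], dtype=np.float32)

# return_indices=True also yields the positions of the shared indices in each input.
common, a_pos, b_pos = np.intersect1d(
    a_idx, b_idx, assume_unique=True, return_indices=True
)

# Unweighted use: Jaccard distance from the intersection size.
shared = common.size
jaccard_distance = 1.0 - shared / (a_idx.size + b_idx.size - shared)

# Weighted use: inner product over the shared indices only.
weighted_dot = float(np.dot(a_val[a_pos], b_val[b_pos]))

print(common.tolist(), jaccard_distance, weighted_dot)  # [4, 7] 0.5 4.0
```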

@ogencoglu
Contributor

In numpy/scipy, A @ B is much faster if A is sparse and converted into a scipy sparse matrix. This works both when A and B are sparse and when only A is sparse and B is dense.

Can SimSIMD speed up such matrix multiplications? That's my use case.
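
A small sketch of the SciPy behaviour described here, covering both the sparse-times-dense and sparse-times-sparse cases. The matrices are made-up toy data, and the snippet demonstrates only SciPy, not any SimSIMD functionality.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy matrices; A is mostly zeros, so CSR storage keeps only 2 of its 9 entries.
A_dense = np.array([[0.0, 0.0, 3.0],
                    [0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0]])
B_dense = np.array([[1.0, 2.0],
                    [0.0, 0.0],
                    [0.0, 4.0]])

A = csr_matrix(A_dense)
B = csr_matrix(B_dense)

# Sparse @ dense returns a dense ndarray.
sparse_dense = A @ B_dense

# Sparse @ sparse returns a sparse matrix; convert it to compare.
sparse_sparse = (A @ B).toarray()

expected = A_dense @ B_dense
print(np.array_equal(sparse_dense, expected), np.array_equal(sparse_sparse, expected))
```

Any speedup depends on how sparse the inputs are; for matrices this small the dense product is likely faster.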

@ashvardanian
Owner Author

Cool! We will have a few related releases, but more likely in October/November. Can you please open a separate feature request for Sparse Matrix Multiplications?

And, as always, it helps if you can spread the word about the library, as that helps us prioritize features and work between different projects, @ogencoglu 🤗
