Implement order preserving transform for inner product search #25

dylanrb123 · 2023-09-29T17:03:05Z

As described in this paper, this is a mechanism that reduces the maximum inner-product (MIP) search problem to a nearest-neighbor (NN) search problem by adding an extra dimension to each vector, so that the triangle inequality holds for the transformed vectors.

Per the paper:

The triangle inequality does not hold between vectors x,
yi, and yj when an inner product compares them, as is the
case in MIP. Many efficient search data structures rely on
the triangle inequality, and if MIP can be transformed to
NN with its Euclidian distance, these data structures would
immediately become applicable. Our first theorem states
that MIP can be reduced to NN by having an Euclidian
metric in one more dimension than the original problem.

This extra dimension is equal to sqrt(max_norm**2 - norm(x)**2) for each vector x in the dataset, where max_norm is the maximum norm of all vectors in the dataset.
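To make the formula concrete, here is a minimal pure-Python sketch of the batch form of the transform. It is not the code in cpp/TypedIndex.h; the helper names (`augment_dataset`, `augment_query`, `squared_l2`) are hypothetical, and padding the query with a zero in the extra dimension is an assumption taken from the standard form of this reduction rather than from this PR.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def augment_dataset(vectors):
    """Append sqrt(max_norm**2 - norm(x)**2) to every vector x (hypothetical helper)."""
    max_norm = max(norm(v) for v in vectors)
    augmented = []
    for v in vectors:
        # Clamp at zero to guard against tiny negative values from rounding.
        extra = math.sqrt(max(max_norm ** 2 - norm(v) ** 2, 0.0))
        augmented.append(list(v) + [extra])
    return augmented, max_norm


def augment_query(q):
    """Assumption: the query is padded with 0 in the extra dimension."""
    return list(q) + [0.0]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


vectors = [[1.0, 0.0], [0.0, 3.0], [2.0, 2.0]]
query = [1.0, 1.0]

augmented, max_norm = augment_dataset(vectors)
aug_query = augment_query(query)

# Ranking by largest inner product in the original space...
by_inner_product = sorted(
    range(len(vectors)), key=lambda i: -inner_product(query, vectors[i])
)
# ...matches ranking by smallest Euclidean distance in the augmented space.
by_distance = sorted(
    range(len(vectors)), key=lambda i: squared_l2(aug_query, augmented[i])
)
```

The two rankings agree because, for an augmented vector y' and padded query q', the squared distance is norm(q)**2 + max_norm**2 - 2 * dot(q, y). Only the last term depends on y, so a smaller distance means a larger inner product.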

One note on the implementation in this library: Voyager is not meant exclusively for batch use and allows new items to be added after the index is initially built, so the maximum norm of the dataset cannot be known at build time. As such, we simply calculate the extra dimension based on the data that we have seen so far. This means that if you add a new vector with a larger norm than anything seen so far, the accuracy of the index will suffer. This is similar to the approach taken by Vespa; see their blog post on the matter. If you have a priori knowledge of your dataset, it is recommended that you insert the item with the largest norm first.
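The effect of insertion order can be illustrated with a small sketch. This is a simplified model of the "maximum norm seen so far" behavior described above, not Voyager's actual implementation; the class name `RunningMaxNormTransform` and its fields are hypothetical.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


class RunningMaxNormTransform:
    """Hypothetical model: each vector's extra dimension uses the max norm seen so far."""

    def __init__(self):
        self.max_norm = 0.0
        self.items = []

    def add(self, v):
        n = norm(v)
        # The running maximum only grows; vectors stored earlier are not revisited.
        self.max_norm = max(self.max_norm, n)
        extra = math.sqrt(max(self.max_norm ** 2 - n ** 2, 0.0))
        self.items.append(list(v) + [extra])


small, large = [1.0, 0.0], [0.0, 3.0]

# Small-norm vector first: its extra dimension is computed against max_norm = 1
# and is stale once the larger vector arrives.
small_first = RunningMaxNormTransform()
small_first.add(small)
small_first.add(large)

# Largest-norm vector first: every later vector sees the true maximum.
large_first = RunningMaxNormTransform()
large_first.add(large)
large_first.add(small)

stale_extra = small_first.items[0][2]    # computed with max_norm = 1
correct_extra = large_first.items[1][2]  # computed with max_norm = 3
```

In this model, inserting the largest-norm item first gives `[1.0, 0.0]` the same extra dimension a batch build would, which is the reason for the recommendation above.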

Addresses #19

cpp/TypedIndex.h

dylanrb123 and others added 3 commits September 27, 2023 13:53

WIP implement order preserving transform

abf91c8

Get tests working.

676d555

Add test for new inner product accuracy.

53d163f

dylanrb123 marked this pull request as ready for review October 4, 2023 02:45

dylanrb123 and others added 7 commits October 3, 2023 22:59

try setting specific numpy version, use default_rng instead of seed

92c1406

downgrade to latest available for python 3.8

6046e12

add note on inner product implementation, fix np usage in test

7af1d38

Merge branch 'main' into order-preserving-transform

5597798

Fix ASAN violations.

9b6f1e2

WIP pull in metadata changes

3b02b09

remove todo

73fe4be

dylanrb123 commented Oct 4, 2023

cpp/TypedIndex.h (outdated, resolved)

dylanrb123 and others added 4 commits October 4, 2023 16:55

unpin numpy

cfa3423

WIP

347cc19

Fix order-preserving transform loading and saving.

679ebcf

clang-format

b2f79c4

dylanrb123 requested a review from psobot October 5, 2023 21:27

psobot approved these changes Oct 6, 2023

dylanrb123 merged commit ef07b9a into main Oct 6, 2023
52 checks passed

dylanrb123 deleted the order-preserving-transform branch October 6, 2023 13:37
