Add a batch iterator for the Vamana Indexes #64
Merged
This implements the groundwork for a batch iterator for the low-level Vamana indexes. Conceptually, the batch iterator allows searches to be restarted, returning a new batch of `k` nearest neighbors that have not yet been yielded.

The batch iterator provides the C++ iterator interfaces `begin()` and `end()` over a buffer of new IDs. To yield new neighbors, the search is effectively restarted with the search window size and search buffer capacity incremented by `k`, ensuring that at least `k` new IDs can be obtained that were not part of previous searches. Filtering is performed post-search so that only IDs not yielded by earlier batches are available on the next calls to `begin()` and `end()`.

There are some low-hanging performance wins to be had.
`next()` must refix the query and allocate scratchspace as needed. The scratchspace currently lacks an API for adapting to new search parameters; once this is in place, caching should be straightforward to implement.

Some other thoughts
Single search for the dynamic index is awkward. Single search uses the provided scratchspace, and results are extracted from this scratchspace post-search. However, the dynamic index uses different internal and external IDs with different bit-widths, which makes it impossible to reuse the search buffer to store translated IDs. Currently, the iterator needs to detect whether ID translation is required and perform the translation manually. Options are:
Of these, I like option 3 the most, as it has nice symmetry with our existing batch search functions.
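The restart-and-filter scheme described at the top of this PR can be sketched in isolation. This is a minimal C++17 sketch, not the SVS implementation: `toy_search` and `ToyBatchIterator` are hypothetical names, and a brute-force search over 1-D points stands in for the Vamana graph search. The counting argument behind "at least `k` new IDs": after `b` batches of `k`, a restart with window `(b + 1) * k` returns that many distinct IDs, at most `b * k` of which were already yielded, so at least `k` remain (as long as the dataset has that many unseen points).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for one Vamana search: returns the `window` closest IDs
// to `query`, nearest first. A real index would run a greedy graph search with a
// search buffer of capacity `window`; this toy brute-forces 1-D points instead.
inline std::vector<std::uint32_t> toy_search(
    const std::vector<float>& data, float query, std::size_t window) {
    std::vector<std::uint32_t> ids(data.size());
    std::iota(ids.begin(), ids.end(), std::uint32_t{0});
    window = std::min(window, ids.size());
    auto closer = [&](std::uint32_t a, std::uint32_t b) {
        // Break distance ties by ID so results are deterministic.
        return std::make_pair(std::abs(data[a] - query), a) <
               std::make_pair(std::abs(data[b] - query), b);
    };
    std::partial_sort(
        ids.begin(), ids.begin() + static_cast<std::ptrdiff_t>(window), ids.end(), closer);
    ids.resize(window);
    return ids;
}

// Hypothetical batch iterator: each call to next() restarts the search with the
// window grown by `batch_size`, then filters out IDs yielded by earlier batches.
class ToyBatchIterator {
  public:
    ToyBatchIterator(std::vector<float> data, float query, std::size_t batch_size)
        : data_{std::move(data)}, query_{query}, batch_size_{batch_size} {
        assert(batch_size_ > 0);
    }

    void next() {
        window_ += batch_size_;
        buffer_.clear();
        for (std::uint32_t id : toy_search(data_, query_, window_)) {
            if (buffer_.size() == batch_size_) {
                break;
            }
            // insert() reports whether `id` is new, i.e. not yielded before.
            if (yielded_.insert(id).second) {
                buffer_.push_back(id);
            }
        }
    }

    // Iterate over the IDs produced by the most recent call to next().
    std::vector<std::uint32_t>::const_iterator begin() const { return buffer_.begin(); }
    std::vector<std::uint32_t>::const_iterator end() const { return buffer_.end(); }

    // True once every ID in the dataset has been yielded.
    bool done() const { return yielded_.size() == data_.size(); }

  private:
    std::vector<float> data_;
    float query_;
    std::size_t batch_size_;
    std::size_t window_ = 0;
    std::vector<std::uint32_t> buffer_;
    std::unordered_set<std::uint32_t> yielded_;
};
```

With the exact toy search, each larger window contains the previous one, so every restart re-finds all earlier results and the post-search filter removes them. With an approximate graph search the windows need not be nested, but the counting argument above still guarantees `k` unseen IDs per batch.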
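The manual ID translation the iterator currently performs for the dynamic index can be illustrated with a small compile-time dispatch. This is a C++17 sketch under stated assumptions: `ToyStaticIndex`, `ToyDynamicIndex`, the `translate_internal_id` member, the `needs_translation` trait, and the 32-bit internal / 64-bit external ID widths are all hypothetical. It only illustrates the current detect-and-translate approach, not any particular one of the proposed options. IDs are copied out of the search buffer into a separate output buffer, since 64-bit external IDs cannot be written back into a buffer of 32-bit internal IDs.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical static index: internal and external IDs are the same 32-bit values.
struct ToyStaticIndex {
    using internal_id_type = std::uint32_t;
    using external_id_type = std::uint32_t;
};

// Hypothetical dynamic index: compact 32-bit internal slots map to 64-bit external IDs.
struct ToyDynamicIndex {
    using internal_id_type = std::uint32_t;
    using external_id_type = std::uint64_t;
    std::vector<std::uint64_t> external_ids;  // indexed by internal ID

    external_id_type translate_internal_id(internal_id_type i) const {
        assert(i < external_ids.size());
        return external_ids[i];
    }
};

// Detect at compile time whether `Index` provides translate_internal_id().
template <typename Index, typename = void>
struct needs_translation : std::false_type {};

template <typename Index>
struct needs_translation<
    Index,
    std::void_t<decltype(std::declval<const Index&>().translate_internal_id(
        std::declval<typename Index::internal_id_type>()))>> : std::true_type {};

// Copy IDs out of the search buffer, translating them only when the index needs it.
template <typename Index>
std::vector<typename Index::external_id_type> extract_ids(
    const Index& index,
    const std::vector<typename Index::internal_id_type>& search_buffer) {
    std::vector<typename Index::external_id_type> out;
    out.reserve(search_buffer.size());
    for (auto id : search_buffer) {
        if constexpr (needs_translation<Index>::value) {
            out.push_back(index.translate_internal_id(id));
        } else {
            out.push_back(id);  // static index: IDs pass through unchanged
        }
    }
    return out;
}
```

Dispatching with `if constexpr` keeps the static-index path free of any translation cost, while the separate output vector sidesteps the bit-width mismatch described above.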
Remaining Tasks