Why does nearest k = 5 sometimes return only 2 results? #3306

Answered by westonpace
hongbo-miao asked this question in Q&A
In Lance, the default is post-filtering. This differs from LanceDB, where the default is prefiltering. Both can return fewer results than requested, but this is more common with post-filtering.

With post-filtering, we first run the vector search to compute k * refine_factor results. We then filter these and rank what remains. Since you are not setting refine_factor, it defaults to None, which means you will get fewer than k results if the filter eliminates any of the top k.
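To make that concrete, here is a toy model of post-filtering in plain Python. It follows the description above and is not Lance's actual implementation; the function name `postfilter_knn` and the sample data are made up for illustration, and a refine_factor of None is treated as 1.

```python
def postfilter_knn(distances, passes_filter, k, refine_factor=None):
    """Toy post-filtering: search first, then drop rows that fail the filter."""
    # Assumption for this sketch: refine_factor=None behaves like a factor of 1.
    n_candidates = k * (refine_factor or 1)
    # Step 1: vector search, keep the n_candidates closest rows.
    by_distance = sorted(range(len(distances)), key=lambda i: distances[i])
    candidates = by_distance[:n_candidates]
    # Step 2: apply the filter to those candidates only.
    survivors = [i for i in candidates if passes_filter[i]]
    # Step 3: return at most k of what is left (already in distance order).
    return survivors[:k]


# Hypothetical data: 10 rows, already sorted by distance to the query.
distances = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
passes_filter = [True, False, False, True, False, True, True, True, False, True]

print(postfilter_knn(distances, passes_filter, k=5))                   # [0, 3]
print(postfilter_knn(distances, passes_filter, k=5, refine_factor=2))  # [0, 3, 5, 6, 7]
```

With k = 5 and no refine_factor, three of the five nearest rows fail the filter, so only 2 results come back. Widening the candidate pool gives the filter more rows to work with.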

To get prefiltering, add prefilter=True to your to_table call.
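A minimal sketch of what that call might look like. The helper name `filtered_knn`, the default column name "vector", and the path and filter string in the usage comment are placeholders, not taken from the original question; `nearest`, `filter`, and `prefilter` are the to_table arguments being discussed here.

```python
def filtered_knn(dataset, query, k, where, column="vector", prefilter=True):
    """Run a filtered nearest-neighbour query on a Lance dataset.

    `dataset` is expected to be the object returned by lance.dataset(...).
    `column` is a placeholder for whatever your vector column is called.
    """
    return dataset.to_table(
        nearest={"column": column, "q": query, "k": k},
        filter=where,         # SQL-style filter expression
        prefilter=prefilter,  # True: apply the filter before the vector search
    )


# Hypothetical usage (path, filter, and query vector are placeholders):
#   import lance
#   ds = lance.dataset("path/to/data.lance")
#   table = filtered_knn(ds, query_vector, k=5, where="category = 'a'")
```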

With prefiltering we first calculate which row ids match the filter. We then perform a vector search and filter the results…

Answer selected by hongbo-miao