The k1 and b parameters of BM25 influence which hits can be dynamically pruned, and therefore the performance numbers, so it would be good to use the same values across engines. Currently each engine appears to use its own defaults: k1=0.9 and b=0.4 for PISA, and k1=1.2 and b=0.75 for Lucene and Tantivy.
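To illustrate why the parameters matter for pruning, here is a minimal sketch of the textbook BM25 term score and its upper bound as term frequency grows. It is not taken from any of the engines: the function names, the IDF variant and the sample statistics (document length, document frequency, collection size) are assumptions chosen for illustration, and real engines may differ in constant factors and in how they encode document lengths.

```python
import math


def bm25_idf(doc_freq, num_docs):
    """One common smoothed IDF variant (assumed here; engines may differ)."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf, doc_len, avg_doc_len, doc_freq, num_docs, k1, b):
    """Textbook BM25 contribution of a single term to a document's score."""
    # Length normalization: b controls how strongly long documents are penalized.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Saturation: k1 controls how quickly repeated occurrences stop helping.
    return bm25_idf(doc_freq, num_docs) * tf * (k1 + 1) / (tf + norm)


def bm25_term_upper_bound(doc_freq, num_docs, k1):
    """Limit of the term score as tf grows without bound: idf * (k1 + 1)."""
    return bm25_idf(doc_freq, num_docs) * (k1 + 1)


# Parameter pairs mentioned in this issue.
PARAMS = {
    "Lucene/Tantivy defaults": (1.2, 0.75),
    "PISA defaults": (0.9, 0.4),
}

# Hypothetical statistics, for illustration only.
STATS = dict(doc_len=120, avg_doc_len=100, doc_freq=1_000, num_docs=1_000_000)

for label, (k1, b) in PARAMS.items():
    score = bm25_term_score(tf=3, k1=k1, b=b, **STATS)
    bound = bm25_term_upper_bound(STATS["doc_freq"], STATS["num_docs"], k1)
    print(f"{label}: score={score:.3f} upper_bound={bound:.3f}")
```

Dynamic pruning algorithms skip documents whose best possible score cannot reach the current top-k threshold, so changing k1 and b changes both the scores and the bounds they are compared against. That is why engines running with different parameters are not doing the same amount of work.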
jpountz added a commit to jpountz/search-benchmark-game that referenced this issue on Sep 25, 2023
Currently different engines use different parameters for BM25, e.g. Tantivy and
Lucene use (k1=1.2,b=0.75) while PISA uses (k1=0.9,b=0.4). Robertson et al. had
initially suggested that 1.2/0.75 would make good defaults for BM25 but Trotman
et al. later suggested that 0.9/0.4 would make better defaults and this seems
to be the consensus nowadays.
The ranking function matters because it affects which hits may be skipped via
dynamic pruning, which in turn affects search performance.
Closes quickwit-oss#45
To get a sense of the influence of these parameters on query performance, I compared Lucene-9.8 with 1.2/0.75 against 0.9/0.4 on the TOP_100 command. I'm getting:
4.6% better latency on average for intersections with 0.9/0.4
4.2% better latency on average for unions with 0.9/0.4
So the difference is not huge, but it is significant and extremely consistent.