Use same BM25 k1/b parameters across engines. #45

jpountz · 2023-09-24T13:54:16Z

The k1 and b parameters of BM25 can influence what hits may be dynamically pruned and thus performance numbers, so it would be good to use the same values across engines. Currently it looks like engines use their own defaults, which seem to be k1=0.9 and b=0.4 for PISA, and k1=1.2 and b=0.75 for Lucene and Tantivy.

Currently different engines use different parameters for BM25, e.g. Tantivy and Lucene use (k1=1.2, b=0.75) while PISA uses (k1=0.9, b=0.4). Robertson et al. initially suggested that 1.2/0.75 would make good defaults for BM25, but Trotman et al. later suggested that 0.9/0.4 would make better defaults, and this seems to be the consensus nowadays. The ranking function matters because it affects which hits may be skipped via dynamic pruning, which in turn affects search performance. Closes quickwit-oss#45
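To illustrate why k1 and b can change what gets pruned, here is a minimal sketch of the textbook BM25 per-term score. It is not the code of any of the engines discussed: the function names, the idf variant, and the example statistics are assumptions made for illustration, and real engines may differ in details (for example in whether the constant `(k1 + 1)` factor is kept). The point is only that both the score of a hit and the best score a term could possibly contribute depend on k1 and b, and dynamic pruning compares such bounds against the current top-k threshold.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_freq, num_docs, k1, b):
    """Textbook BM25 contribution of one term to one document's score.

    All names are illustrative; this is not any engine's actual API.
    """
    # One common idf variant; always positive.
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b=0 ignores document length, b=1 applies it fully.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # k1 controls how quickly the score saturates as tf grows.
    return idf * tf * (k1 + 1) / (tf + norm)


def bm25_term_upper_bound(doc_freq, num_docs, k1):
    """Limit of the term score as tf grows: no document can exceed it."""
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    return idf * (k1 + 1)


if __name__ == "__main__":
    # Hypothetical statistics, chosen only to show that the two settings differ.
    stats = dict(tf=3, doc_len=250, avg_doc_len=100, doc_freq=10, num_docs=1000)
    for k1, b in [(1.2, 0.75), (0.9, 0.4)]:
        score = bm25_term_score(k1=k1, b=b, **stats)
        bound = bm25_term_upper_bound(stats["doc_freq"], stats["num_docs"], k1)
        print(f"k1={k1} b={b} score={score:.4f} upper_bound={bound:.4f}")
```

Because scores and score bounds both shift with the parameters, two engines running the same pruning algorithm with different k1/b values may skip different documents, which is why latency comparisons are only fair when the parameters match.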

jpountz · 2023-10-02T09:48:21Z

To get a sense of the influence of these parameters on query performance, I compared Lucene-9.8 with 1.2/0.75 against 0.9/0.4 on the TOP_100 command. I'm getting:

4.6% better latency on average for intersections with 0.9/0.4
4.2% better latency on average for unions with 0.9/0.4

So the difference is not huge, but it is significant and extremely consistent:

7 queries get better latency with 1.2/0.75
2 queries get the same latency
893 queries get better latency with 0.9/0.4

jpountz linked a pull request Sep 25, 2023 that will close this issue

Recommend scoring hits with BM25(k1=0.9,b=0.4). #46

Open

fulmicoton mentioned this issue Sep 29, 2023

Change const parameter of bm25 quickwit-oss/tantivy#2195

Open
