Bug for bit sizes being a power of two (and maybe other values) #19

qjerome · 2024-03-12T17:07:52Z

Implementing the following test wouldn't make the test passing anymore.

--- FAIL: TestBugFalsePositives (1.24s)
    bloom_test.go:330: False positive probability is too high at  20.18335038131724 % vs  0.602097140729787 %
FAIL
exit status 1
FAIL	github.com/DCSO/bloom	1.763s

func TestBugFalsePositives(t *testing.T) {
	// this capacity + p would produce a power of 2 bit size
	capacity := uint64(109397)
	p := float64(0.01)
	fillingFactor := 0.9
	N := uint64(float64(capacity) * fillingFactor)
	filter, _ := GenerateExampleFilter(capacity, p, N)
	pAcceptable := math.Pow(1-math.Exp(-float64(filter.k)*float64(N)/float64(filter.m)), float64(filter.k))
	fingerprint := make([]uint64, filter.k)
	cnt := 0.0
	matches := 0.0
	for {
		cnt++
		value := GenerateTestValue(100)
		filter.Fingerprint(value, fingerprint)
		if filter.CheckFingerprint(fingerprint) {
			matches++
		}
		if cnt > float64(capacity)*10 {
			break
		}
	}
	//this might still fail sometimes...
	//we allow for a probability that is two times higher than the normally acceptable probability
	if matches/cnt > pAcceptable*2 {
		t.Error("False positive probability is too high at ", matches/cnt*100, "% vs ", pAcceptable*100, "%")
	}
}

After analysis, it is possible there is a flaw in the fingerprint generation.
Here is my theory (not verified mathematically). Your algorithm generates fingerprints with:

	for i := uint64(0); i < s.k; i++ {
		hn = (hn * g) % m
		fingerprint[i] = uint64(hn % s.m)

However the value of m used (i.e. 0xffffffffffffffc5) is so big that what the code does is equivalent for any x = (hn * g); x < m (which is very likely as we are working with uint64) to the following

	for i := uint64(0); i < s.k; i++ {
		hn = hn * g
		fingerprint[i] = uint64(hn % s.m)

So under those conditions hn is always a multiple of h0 (fnv hash) multiplied by g power i, very likely creating lots of collisions for very specific cases, such as this one.

NB: I did not verify if other bit sizes create the same behavior

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug for bit sizes being a power of two (and maybe other values) #19

Bug for bit sizes being a power of two (and maybe other values) #19

qjerome commented Mar 12, 2024

Bug for bit sizes being a power of two (and maybe other values) #19

Bug for bit sizes being a power of two (and maybe other values) #19

Comments

qjerome commented Mar 12, 2024