Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug for bit sizes being a power of two (and maybe other values) #19

Open
qjerome opened this issue Mar 12, 2024 · 0 comments
Open

Bug for bit sizes being a power of two (and maybe other values) #19

qjerome opened this issue Mar 12, 2024 · 0 comments

Comments

@qjerome
Copy link

qjerome commented Mar 12, 2024

Implementing the following test wouldn't make the test passing anymore.

--- FAIL: TestBugFalsePositives (1.24s)
    bloom_test.go:330: False positive probability is too high at  20.18335038131724 % vs  0.602097140729787 %
FAIL
exit status 1
FAIL	github.com/DCSO/bloom	1.763s
func TestBugFalsePositives(t *testing.T) {
	// this capacity + p would produce a power of 2 bit size
	capacity := uint64(109397)
	p := float64(0.01)
	fillingFactor := 0.9
	N := uint64(float64(capacity) * fillingFactor)
	filter, _ := GenerateExampleFilter(capacity, p, N)
	pAcceptable := math.Pow(1-math.Exp(-float64(filter.k)*float64(N)/float64(filter.m)), float64(filter.k))
	fingerprint := make([]uint64, filter.k)
	cnt := 0.0
	matches := 0.0
	for {
		cnt++
		value := GenerateTestValue(100)
		filter.Fingerprint(value, fingerprint)
		if filter.CheckFingerprint(fingerprint) {
			matches++
		}
		if cnt > float64(capacity)*10 {
			break
		}
	}
	//this might still fail sometimes...
	//we allow for a probability that is two times higher than the normally acceptable probability
	if matches/cnt > pAcceptable*2 {
		t.Error("False positive probability is too high at ", matches/cnt*100, "% vs ", pAcceptable*100, "%")
	}
}

After analysis, it is possible there is a flaw in the fingerprint generation.
Here is my theory (not verified mathematically). Your algorithm generates fingerprints with:

	for i := uint64(0); i < s.k; i++ {
		hn = (hn * g) % m
		fingerprint[i] = uint64(hn % s.m)

However the value of m used (i.e. 0xffffffffffffffc5) is so big that what the code does is equivalent for any x = (hn * g); x < m (which is very likely as we are working with uint64) to the following

	for i := uint64(0); i < s.k; i++ {
		hn = hn * g
		fingerprint[i] = uint64(hn % s.m)

So under those conditions hn is always a multiple of h0 (fnv hash) multiplied by g power i, very likely creating lots of collisions for very specific cases, such as this one.

NB: I did not verify if other bit sizes create the same behavior

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant