feat: implement Sliding Window Bloom Filter #715

Merged
merged 1 commit into from
Jan 11, 2025

Conversation

Contributor

@nesty92 nesty92 commented Jan 6, 2025

Sliding Window Bloom Filter

How It Works

Core Concept

  • Contains two Bloom filters: current and next.
  • Elements are added to both filters.
  • At half of the window, the filters are rotated: current is dropped and next becomes current.
    This is done in Redis to guarantee an atomic operation.
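
The rotation idea above can be sketched in memory, without Redis. This is only an illustration of the concept, not the code in this PR: the `bloom` and `SlidingBloom` types, the FNV-based double hashing, and the sizes used are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a tiny in-memory Bloom filter used only for illustration.
type bloom struct {
	bits []bool
	k    uint64 // number of hash functions
}

func newBloom(m int, k uint64) *bloom {
	return &bloom{bits: make([]bool, m), k: k}
}

// indexes derives k bit positions from one 64-bit FNV-1a hash (double hashing).
func (b *bloom) indexes(key string) []uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1, h2 := sum&0xffffffff, sum>>32
	out := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		out[i] = (h1 + i*h2) % uint64(len(b.bits))
	}
	return out
}

func (b *bloom) add(key string) {
	for _, i := range b.indexes(key) {
		b.bits[i] = true
	}
}

func (b *bloom) exists(key string) bool {
	for _, i := range b.indexes(key) {
		if !b.bits[i] {
			return false
		}
	}
	return true
}

// SlidingBloom keeps two overlapping filters: current and next.
type SlidingBloom struct {
	current, next *bloom
	m             int
	k             uint64
}

func NewSlidingBloom(m int, k uint64) *SlidingBloom {
	return &SlidingBloom{current: newBloom(m, k), next: newBloom(m, k), m: m, k: k}
}

// Add writes the key to both filters.
func (s *SlidingBloom) Add(key string) {
	s.current.add(key)
	s.next.add(key)
}

// Exists reads only the current filter.
func (s *SlidingBloom) Exists(key string) bool { return s.current.exists(key) }

// Rotate drops current, promotes next, and starts an empty next.
func (s *SlidingBloom) Rotate() {
	s.current = s.next
	s.next = newBloom(s.m, s.k)
}

func main() {
	f := NewSlidingBloom(1024, 4)
	f.Add("a")
	f.Rotate() // "a" survives one rotation because it was also written to next
	fmt.Println(f.Exists("a"))
	f.Rotate() // a second rotation without re-adding drops "a"
	fmt.Println(f.Exists("a"))
}
```

Because every element is written to both filters, an element stays visible for between half a window and a full window after it is added, depending on when it arrived relative to the last rotation.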

Redis implementation.

  1. Key structure:

    • '{name}' is the current filter.
    • '{name}:c' is the current counter.
    • '{name}:n' is the next filter
    • '{name}:nc' is the next counter.
    • '{name}:lr' is the last rotation lock, which prevents multiple rotations in a given period.
  2. Scripts:

    • Add: Sets bits in the current and next filters.
    • Exists: Checks bits in the current filter only.
    • Rotate: Replaces the current filter with the next one and creates a new, empty next filter.
  3. Rotation Coordination:

    • Clients track rotation timing locally and only trigger the rotation; it is executed only if the last rotation lock is not present.
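
As a rough sketch of the key layout and the rotation lock described above: the `keysFor` helper and the `rotationLock` type below are hypothetical names invented for this example, and the counter key is assumed to follow the same '{name}:suffix' pattern as the other keys. The lock is an in-memory model of a `SET ... NX PX` call (as seen in the MONITOR output later in this thread), not a Redis client call.

```go
package main

import "fmt"

// keys lists the Redis keys for a filter called name.
// The shared {name} hash tag maps all keys to the same cluster slot,
// which multi-key scripts need.
type keys struct {
	Filter, Counter, NextFilter, NextCounter, LastRotation string
}

// keysFor is a hypothetical helper; the real module may build keys differently.
func keysFor(name string) keys {
	tag := "{" + name + "}"
	return keys{
		Filter:       tag,
		Counter:      tag + ":c",
		NextFilter:   tag + ":n",
		NextCounter:  tag + ":nc",
		LastRotation: tag + ":lr",
	}
}

// rotationLock models SET key value NX PX ttl in memory:
// acquiring succeeds only when no unexpired lock exists.
type rotationLock struct {
	expiresAtMs int64
}

func (l *rotationLock) tryAcquire(nowMs, ttlMs int64) bool {
	if nowMs < l.expiresAtMs {
		return false // another client rotated recently
	}
	l.expiresAtMs = nowMs + ttlMs
	return true
}

func main() {
	k := keysFor("test")
	fmt.Println(k.Filter, k.Counter, k.NextFilter, k.NextCounter, k.LastRotation)

	var lock rotationLock
	// Three clients notice at the same moment that a rotation is due.
	for client := 1; client <= 3; client++ {
		fmt.Printf("client %d rotates: %v\n", client, lock.tryAcquire(1000, 500))
	}
}
```

Only the first caller acquires the lock, so concurrent triggers from several clients result in a single rotation per period.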

@nesty92
Contributor Author

nesty92 commented Jan 6, 2025

Hi @rueian, I was working on this and think it could be a good addition to the lib. Any feedback is more than welcome.

I had to borrow some of the internal utilities for this module, but I'm not sure if that's the best way to do it.

I made some improvements to the index computation that could also be applied to the existing filters. They won't improve execution time much, because the slowest part is waiting for Redis, but they will reduce GC pressure when the filter is on a "hot path".
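
One common way to cut allocations when computing bit indexes is to hash without intermediate objects and write the positions into a caller-owned buffer. The sketch below shows that general technique under assumptions: `fillIndexes` and `fnv1a` are hypothetical names, and the hashing scheme (FNV-1a with double hashing) is chosen for the example and may differ from what the PR actually does.

```go
package main

import "fmt"

const (
	fnvOffset64 = 14695981039346656037
	fnvPrime64  = 1099511628211
)

// fnv1a hashes key inline, without creating a hash.Hash or a []byte copy.
func fnv1a(key string) uint64 {
	h := uint64(fnvOffset64)
	for i := 0; i < len(key); i++ {
		h ^= uint64(key[i])
		h *= fnvPrime64
	}
	return h
}

// fillIndexes writes k bit positions (each < m) for key into dst and
// returns the filled slice. When cap(dst) >= k, no new memory is allocated.
func fillIndexes(dst []uint64, key string, k, m uint64) []uint64 {
	dst = dst[:0]
	sum := fnv1a(key)
	h1, h2 := sum&0xffffffff, sum>>32
	for i := uint64(0); i < k; i++ {
		dst = append(dst, (h1+i*h2)%m)
	}
	return dst
}

func main() {
	buf := make([]uint64, 0, 8) // allocated once, reused for every key
	for _, key := range []string{"a", "b", "c"} {
		idx := fillIndexes(buf, key, 4, 1024)
		fmt.Println(key, idx)
	}
}
```

The buffer is overwritten on every call, so the returned slice must be consumed (for example, turned into command arguments) before the next key is processed.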

@rueian
Collaborator

rueian commented Jan 6, 2025

Hi @nesty92, thank you. I will look into it later. @proost would you like to take a look as well?

rueidisprob/README.md (outdated, resolved)
rueidisprob/slidingbloomfilter_retry.go (outdated, resolved)
@nesty92 nesty92 force-pushed the main branch 5 times, most recently from def7e26 to 04268ac, January 6, 2025 16:55
Contributor

@proost proost left a comment

Clever Idea!

rueidisprob/slidingbloomfilter.go (outdated, resolved)
@nesty92 nesty92 force-pushed the main branch 2 times, most recently from 0014c39 to a976b89, January 7, 2025 12:41
Contributor

@proost proost left a comment

LGTM.

I'm just wondering:

There are partitioned and discrete Bloom filter ideas for unbounded input. Your choice is an overlapping, sliding window Bloom filter. Do you think partitioned and discrete Bloom filters are too expensive?

@nesty92
Contributor Author

nesty92 commented Jan 8, 2025

Hi @rueian @proost, following the idea of moving the rotation to Redis, I found that we sometimes get weird behavior with the expirations.

In the test case the PX is 500, but when running it in multiple tests Redis reports a PTTL of ~6s; see the video.

Cursor_tIzIte8ahI.mp4
1736337047.130384 [0 lua] "SET" "{test}:lr" "1736337047130" "PX" "500" "NX"
1736337047.633866 [0 lua] "SET" "{test}:lr" "1736337047633" "PX" "500" "NX"
1736337048.134485 [0 lua] "SET" "{test}:lr" "1736337048134" "PX" "500" "NX"
1736337048.145867 [0 lua] "SET" "{test}:lr" "1736337048145" "PX" "500" "NX"
1736337054.405496 [0 lua] "SET" "{test}:lr" "1736337054405" "PX" "500" "NX"
1736337059.410190 [0 lua] "SET" "{test}:lr" "1736337059410" "PX" "500" "NX"
1736337057.153281 [0 lua] "SET" "{test}:lr" "1736337057153" "PX" "500" "NX"
1736337057.657655 [0 lua] "SET" "{test}:lr" "1736337057657" "PX" "500" "NX"
1736337058.161746 [0 lua] "SET" "{test}:lr" "1736337058161" "PX" "500" "NX"
1736337058.171059 [0 lua] "SET" "{test}:lr" "1736337058170" "PX" "500" "NX"
1736337064.421463 [0 lua] "SET" "{test}:lr" "1736337064421" "PX" "500" "NX"
1736337069.431727 [0 lua] "SET" "{test}:lr" "1736337069431" "PX" "500" "NX"
1736337069.448439 [0 lua] "SET" "{test}:lr" "1736337069448" "PX" "500" "NX"
1736337074.444494 [0 lua] "SET" "{test}:lr" "1736337074444" "PX" "500" "NX"
1736337072.370417 [0 lua] "SET" "{test}:lr" "1736337072370" "PX" "500" "NX"
1736337072.870308 [0 lua] "SET" "{test}:lr" "1736337072870" "PX" "500" "NX"
1736337072.872326 [0 lua] "SET" "{test}:lr" "1736337072872" "PX" "500" "NX"
1736337079.452635 [0 lua] "SET" "{test}:lr" "1736337079452" "PX" "500" "NX"
1736337079.467039 [0 lua] "SET" "{test}:lr" "1736337079466" "PX" "500" "NX"
1736337084.460596 [0 lua] "SET" "{test}:lr" "1736337084460" "PX" "500" "NX"
1736337082.378464 [0 lua] "SET" "{test}:lr" "1736337082378" "PX" "500" "NX"
1736337082.878354 [0 lua] "SET" "{test}:lr" "1736337082878" "PX" "500" "NX"
1736337082.881565 [0 lua] "SET" "{test}:lr" "1736337082881" "PX" "500" "NX"
1736337089.463478 [0 lua] "SET" "{test}:lr" "1736337089463" "PX" "500" "NX"
1736337089.477189 [0 lua] "SET" "{test}:lr" "1736337089477" "PX" "500" "NX"
1736337094.470253 [0 lua] "SET" "{test}:lr" "1736337094470" "PX" "500" "NX"
1736337092.386336 [0 lua] "SET" "{test}:lr" "1736337092386" "PX" "500" "NX"
1736337092.886976 [0 lua] "SET" "{test}:lr" "1736337092886" "PX" "500" "NX"
1736337092.887048 [0 lua] "SET" "{test}:lr" "1736337092887" "PX" "500" "NX"
1736337099.481321 [0 lua] "SET" "{test}:lr" "1736337099481" "PX" "500" "NX"

Do you have any idea why this happens?

After running it several times I can't reproduce it now, but from time to time I saw a PTTL of ~1s.

@nesty92
Contributor Author

nesty92 commented Jan 8, 2025

LGTM.

I'm just wondering:

There are partitioned and discrete Bloom filter ideas for unbounded input. Your choice is an overlapping, sliding window Bloom filter. Do you think partitioned and discrete Bloom filters are too expensive?

@proost The first implementation I did was a partitioned filter, but partitioned filters can be slower if you need to query multiple partitions to get a result. Then I found this overlapping idea, which is simpler: you only have to check the current filter.
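
To illustrate the trade-off being discussed, here is a small sketch that counts lookups for each design. Plain sets stand in for Bloom filters, and `partitionedExists` and `overlappingExists` are hypothetical names for this example; only the number of probes per query is the point.

```go
package main

import "fmt"

// set stands in for a Bloom filter; only the number of lookups matters here.
type set map[string]struct{}

// partitionedExists probes live partitions one by one until it finds the key,
// so a miss (or a hit in the last partition) costs one probe per partition.
func partitionedExists(parts []set, key string) (found bool, probes int) {
	for _, p := range parts {
		probes++
		if _, ok := p[key]; ok {
			return true, probes
		}
	}
	return false, probes
}

// overlappingExists probes only the current filter, because every key was
// written to both current and next when it was added.
func overlappingExists(current set, key string) (found bool, probes int) {
	_, ok := current[key]
	return ok, 1
}

func main() {
	parts := []set{{"old": {}}, {"mid": {}}, {"new": {}}}
	_, p := partitionedExists(parts, "new")
	fmt.Println("partitioned probes:", p)

	_, o := overlappingExists(set{"new": {}}, "new")
	fmt.Println("overlapping probes:", o)
}
```

The overlapping design pays for the single-probe read with a double write on every add, since each element goes into both filters.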

Contributor

@proost proost left a comment

Hmm, that is very weird behavior. I read the code again, but I don't know how it can happen.

rueidisprob/slidingbloomfilter.go (outdated, resolved)
Contributor

@proost proost left a comment

I see that the code has changed. Is the weird PTTL phenomenon gone? I guess the same situation still happens.

rueidisprob/slidingbloomfilter.go (outdated, resolved)
@nesty92
Contributor Author

nesty92 commented Jan 8, 2025

I see that the code has changed. Is the weird PTTL phenomenon gone? I guess the same situation still happens.

@proost, it continues from time to time. I even tried with redis-benchmark to rule out any possible error on our side, and it still happens. I'll prepare an issue on Redis to get more info about it.

This feature enhances the existing Bloom Filter capabilities
by allowing time-based item tracking, making it suitable for
use cases requiring temporary membership checks.

Signed-off-by: Ernesto Alejandro Santana Hidalgo <[email protected]>
@nesty92
Contributor Author

nesty92 commented Jan 8, 2025

issue on Redis to get more info about it

redis/redis#13733

@rueian
Collaborator

rueian commented Jan 10, 2025

Do you have any idea why this happens?

Hi @nesty92, I can't reproduce the issue with valkey 8. These are the commands I used:

watch -n0 ./valkey-cli PTTL a
watch -n0 ./valkey-cli SET a 1 PX 500 NX

Which version of Redis are you using? But, anyway, I think we can merge this already.

@nesty92
Contributor Author

nesty92 commented Jan 11, 2025

Hi @rueian, I was using 7.4, which is the same version as the cluster used for testing. I'm not sure yet why this behavior occurs, but there's nothing to do on our side.

I will try with valkey later.

if acquiredLock then
redis.call('RENAME', nextFilterKey, filterKey)
redis.call('RENAME', nextCounterKey, counterKey)
redis.call('SET', nextFilterKey, "")
Collaborator

Here the script always rotates the filters once if it acquires the lock. But shouldn’t it rotate the filters twice if we access it after idling for a while (> 2*windowHalf)?

Contributor Author

@nesty92 nesty92 Jan 11, 2025

You're absolutely correct. If more than half of the window has passed since the last rotation, it means that both the current and next filters are no longer valid, since they no longer contain relevant data. In this case, saving the last rotation time and resetting the filters if too much time has passed will solve it.
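
The decision discussed in this thread could look roughly like the sketch below. It is not the merged fix: `planRotation` and the `action` values are hypothetical names, and the thresholds are an assumption taken from the reviewer's "> 2*windowHalf" remark. Rotating twice in a row leaves both filters empty, so it is treated here as a reset.

```go
package main

import (
	"fmt"
	"time"
)

type action int

const (
	noRotation action = iota
	rotateOnce
	resetBoth
)

func (a action) String() string {
	return [...]string{"no rotation", "rotate once", "reset both"}[a]
}

// planRotation decides what to do given the time elapsed since the last
// recorded rotation. Rotations are due every half window.
func planRotation(elapsed, window time.Duration) action {
	half := window / 2
	switch {
	case elapsed < half:
		return noRotation // the current filter is still within its period
	case elapsed < 2*half:
		return rotateOnce // next still holds data that belongs in the window
	default:
		return resetBoth // two or more rotations are due: both filters are stale
	}
}

func main() {
	window := 10 * time.Second
	for _, idle := range []time.Duration{2 * time.Second, 7 * time.Second, 30 * time.Second} {
		fmt.Printf("idle %v -> %v\n", idle, planRotation(idle, window))
	}
}
```

In a Redis script, the elapsed time could be derived from a stored last-rotation timestamp, which is what the comment above proposes saving.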

@rueian rueian merged commit 09ad427 into redis:main Jan 11, 2025
30 checks passed