Adding optional arg to BF.INSERT to allow users to check if their bloom filter can reach the desired size #41
Conversation
All things considered, I think the current naming and API arguments make sense. The only alternative I thought of is that we could add it to BF.INFO, which might make sense either way. We don't need a new command, and diverging on info should be OK since it's written to be extensible.
Would we want to add it to both BF.INFO and BF.INSERT?
src/bloom/command_handler.rs
@@ -462,6 +462,7 @@ pub fn bloom_filter_insert(ctx: &Context, input_args: &[ValkeyString]) -> Valkey
        true => (None, true),
        false => (Some(configs::FIXED_SEED), false),
    };
    let mut validate_scale_to = -1;
Nit: Would prefer using None here to make this code more Rust-like.
let mut validate_scale_to = None;
Later on, you can use this:
validate_scale_to = match input_args[idx].to_string_lossy().parse::<i64>() {
    Ok(num) if (BLOOM_CAPACITY_MIN..=BLOOM_CAPACITY_MAX).contains(&num) => Some(num),
And then:
if let Some(validate_scale_to) = validate_scale_to {
}
Thank you for the change @zackcam. Merging it in now.
Overview
This PR proposes adding an extra optional arg to the BF.INSERT command so users can know whether their creation will be able to scale to the capacity they desire. We will add an optional arg called “VALIDATESCALETO” and validate whether it is possible to achieve X capacity given the bloom object memory usage limit (by default 128MB), fp rate, tightening ratio, expansion, and scale outs. We will also expand BF.INFO with a new output that shows the max capacity the specified bloom filter can scale to. This field can also be retrieved independently by specifying MAXSCALEDCAPACITY as an arg.
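To make that validation concrete, below is a minimal sketch of how a max-scaled-capacity style calculation could work. This is not the module's calculate_max_scaled_capacity implementation; the per-filter memory estimate is the standard bloom filter approximation and all starting parameters are illustrative.

// Hedged sketch, not the module's actual code: simulate hypothetical scale-outs,
// tightening the fp rate and growing the capacity by the expansion factor each
// time, until the next sub-filter would push the object past the memory limit
// or the fp rate would effectively reach 0.
fn sketch_max_scaled_capacity(
    mut capacity: u64,
    mut fp_rate: f64,
    tightening_ratio: f64,
    expansion: u64,
    memory_limit_bytes: u64,
) -> u64 {
    let mut total_capacity: u64 = 0;
    let mut total_bytes: u64 = 0;
    loop {
        // Standard bloom approximation: bits = -n * ln(p) / ln(2)^2.
        let bits = -(capacity as f64) * fp_rate.ln() / (2f64.ln().powi(2));
        let bytes = (bits / 8.0).ceil() as u64;
        if total_bytes.saturating_add(bytes) > memory_limit_bytes {
            return total_capacity; // this scale-out would exceed the memory limit
        }
        total_bytes += bytes;
        total_capacity = total_capacity.saturating_add(capacity);
        // Prepare the next hypothetical scale-out.
        fp_rate *= tightening_ratio;
        if fp_rate <= f64::MIN_POSITIVE {
            return total_capacity; // fp rate has effectively degraded to 0
        }
        capacity = capacity.saturating_mul(expansion);
    }
}

fn main() {
    // 128MB default bloom object memory limit; starting parameters are made up.
    let max = sketch_max_scaled_capacity(100_000, 0.001, 0.5, 2, 128 * 1024 * 1024);
    println!("filter could scale to roughly {max} items before erroring");
}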
Amending BF.INSERT command
Adding VALIDATESCALETO as an optional arg of BF.INSERT. This argument allows the user to specify the capacity they expect their filter to scale up to. If the capacity they think it will scale to isn't possible given the specifications provided, then we will throw an error. The new BF.INSERT command will look like:
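A rough sketch of the amended syntax follows; the other options shown are the existing BF.INSERT arguments, and their exact set and order may differ from the final implementation:

BF.INSERT key [ERROR fp_rate] [CAPACITY capacity] [EXPANSION expansion] [NONSCALING] [VALIDATESCALETO capacity] [ITEMS item [item ...]]

For example (values made up): BF.INSERT mykey ERROR 0.001 CAPACITY 10000 VALIDATESCALETO 5000000 ITEMS item1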
Due to the nature of this optional argument, specifying NONSCALING together with VALIDATESCALETO isn't allowed, as this new argument is only useful for checking whether a filter can scale to the desired capacity.
There are three new error messages associated with this: one for combining the argument with NONSCALING, and two for validation failures where the requested capacity can't be reached due to fp-rate degradation or the bloom object memory limit.
Amending BF.INFO command
We will add a new return value to the default result from BF.INFO. On a default key, the new output will look like:
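For illustration only, a reply on a scaling filter could look like the following; the pre-existing field names and all values are placeholders, with the new Max scaled capacity entry appended at the end:

127.0.0.1:6379> BF.INFO key
 1) Capacity
 2) (integer) 100000
 3) Size
 4) (integer) 179928
 5) Number of filters
 6) (integer) 1
 7) Number of items inserted
 8) (integer) 0
 9) Expansion rate
10) (integer) 2
11) Max scaled capacity
12) (integer) 26214300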
There is also a new optional arg (MAXSCALEDCAPACITY) which can be specified, so running BF.INFO key MAXSCALEDCAPACITY will return xxxx, where xxxx is the max scaled capacity.
As this information is only useful for scaling filters, this option is hidden for non-scaling filters. This means that if the user specified NONSCALING when creating their filter, the max scaled capacity will not be included in the return, and MAXSCALEDCAPACITY will not be recognized as a valid optional arg to the info command in that case either.
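An illustrative invocation of the standalone arg (the returned value is made up):

127.0.0.1:6379> BF.INFO scaling_key MAXSCALEDCAPACITY
(integer) 26214300

The same call against a key created with NONSCALING would instead be rejected as an invalid arg; the exact error text is defined in the implementation.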
Testing
Integration Tests
Added checks for these three error messages in test_bloom_command.py, as well as happy cases showing that different expansions, fp rates, and capacities are all valid when using this new optional arg. Updated the correctness tests to check for the new field when a scaling filter does a BF.INFO call. For the non-scaling correctness test, we check that the new field does not appear.
Added a more in-depth correctness test as well, where we initially try to create two filters whose VALIDATESCALETO is 1 more than the MAXSCALEDCAPACITY (one fails due to fp_rate degradation, one exceeds the memory limit). We then actually create them with VALIDATESCALETO set to the MAXSCALEDCAPACITY of the two keys. We then do an info call, get the returned MAXSCALEDCAPACITY, and add that many items to both bloom filters. From there we add 1 more item and expect a failure with either an exceeded-memory error or an fp-rate-reaches-0 error. At the very end we compare against a new BF.INFO call to make sure that MAXSCALEDCAPACITY has not changed after adding items and scaling out.
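Roughly, that flow looks like the following; the key names and the placeholder values in angle brackets are not the actual test parameters:

BF.INSERT keyA ERROR <fp_a> CAPACITY <cap> VALIDATESCALETO <max_a + 1>   -> error (fp rate degradation)
BF.INSERT keyB ERROR <fp_b> CAPACITY <cap> VALIDATESCALETO <max_b + 1>   -> error (memory limit)
BF.INSERT keyA ERROR <fp_a> CAPACITY <cap> VALIDATESCALETO <max_a>       -> OK
BF.INSERT keyB ERROR <fp_b> CAPACITY <cap> VALIDATESCALETO <max_b>       -> OK
BF.INFO keyA MAXSCALEDCAPACITY                                           -> <max_a>
(add <max_a> / <max_b> items to the respective filters, then one more each)
BF.ADD keyA one_more_item                                                -> error (memory limit or fp rate reaches 0)
BF.INFO keyA MAXSCALEDCAPACITY                                           -> still <max_a>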
Unit tests:
Added unit tests for the calculate_max_scaled_capacity method with 5 different variations. Each variation does a happy-path check where we do not receive an error from the command, and we verify that we get the correct capacity it reached once it went past the desired validate_scale_to. Next we do an error case for each: four of them are expected to fail due to the memory limit and one is expected to fail on fp_rate.
Also expanded test_seed to double check that the random seed is different each time.
Appendix
A second option was also suggested in the beginning: add a new command that would output the capacity that can be reached before an error is thrown from scaling out, given the bloom object memory usage limit (by default 128MB), fp rate, tightening ratio, and expansion.
Option 2: Create a new command BF.GETMAXCAPACITY
This would check what the max capacity of a bloom filter is given a capacity, error rate and expansion. This command will look like:
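A sketch of how that command might have looked; since it was not implemented, the argument order here is an assumption:

BF.GETMAXCAPACITY capacity fp_rate [EXPANSION expansion]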
This command would then output the capacity that could be reached before having an error thrown from scaling out; it would only compute the value and would not actually create a filter from the specified arguments.