Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Method to count entries starting with a prefix #38

Open
ccollie opened this issue Sep 17, 2024 · 5 comments
Open

[Feature Request] Method to count entries starting with a prefix #38

ccollie opened this issue Sep 17, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@ccollie
Copy link

ccollie commented Sep 17, 2024

I'd like to be able to get a count of entries starting with a given prefix. Currently its possible iterating through a prefix iterator and counting elements, however we can optimize this by visiting nodes and summing children.

@Gab-Menezes
Copy link
Collaborator

I might be wrong but just summing the number o children from each node would give the wrong result, imagine a case where we have N4 (Leaf, N4 (Leaf, Leaf)), summing the number o children of each node would result in 4, but it's clear that the number of children is 3.
Adding a new field to header would increase memory consumption, since the header already uses 22 bytes with no padding between fields. So adding a new u32 would mean a 18.18% increase in memory consumption of the header, and also would make insertions/deletions harder, since this value is recursively accumulated up to the root.
Also given how the new prefix iterator works, we basically don't iterate over the tree it self, we just find the min and max leafs with the prefix and do a range iterator over it, which is basically a linked list iteration.

Also iterating over the tree it self would probably be slower than over the leafs with the next pointer, since we need to keep a vec as a stack.

@declanvk
Copy link
Owner

Whoops, I meant to post a comment here a day ago and lost it.


I agree with @Gab-Menezes that the most likely implementation of this would be to add a new field to the header of internal nodes to track the number of leaf node descendants. This would imply some changes to the insert and delete operations, so that the counter would be incremented or decremented on a successful operation.

On the pros side, then we would have the ability to cheaply implement ExactSizeIterator for a larger number of iterators, like for fuzzy_*, prefix_*, or range_*. I think this would be beneficial for cases where:

  1. We want to know the number of entries in a given iterator (this feature request)
  2. May improve performance when collecting these iterators into a new collection (for collections that support pre-allocating with specific amount of capacity, like Vec::with_capacity)

@declanvk declanvk added the enhancement New feature or request label Sep 20, 2024
@ccollie
Copy link
Author

ccollie commented Sep 20, 2024

The other performance related reason is that it allows me to decide on the most efficient way to iterate the collection. In my case, I run matching functions (including regexes) against items matching a prefix.

Above a certain count, it's worth it to switch to optimized predicates, but those require a compilation step which can then be amortized over the length of longer iterators.

@Gab-Menezes
Copy link
Collaborator

IDK how I feel about this, this will definitely make insertions/deletions way slower, since right now we just find the writing point and apply the changes. With this we somehow need to find a way to keep track of all the parents while inserting/deleting (basically creating a stack, or doing recursion and both have performance implications) to increment this number if the insertion was successful.

@ccollie
Copy link
Author

ccollie commented Sep 20, 2024

If this makes general case insertions/deletions measurably slower, I'd be Ok without it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants