Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RawTable::vacuum to clean up DELETED entries #255

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cuviper
Copy link
Member

@cuviper cuviper commented Apr 5, 2021

This cleans the table to ensure the maximum usable capacity in its
current allocation, rehashing items that need to move around
previously-deleted entries.

This cleans the table to ensure the maximum usable capacity in its
current allocation, rehashing items that need to move around
previously-deleted entries.
@cuviper
Copy link
Member Author

cuviper commented Apr 5, 2021

This was inspired by indexmap-rs/indexmap#183, and the name by a database VACUUM, like PostgreSQL or SQLite. If there's interest, I could also add methods to HashMap and HashSet.

@Amanieu
Copy link
Member

Amanieu commented Apr 5, 2021

I have doubts about how useful this will be in practice. In almost all cases reserve_rehash will do the right thing and call rehash_in_place instead of growing the table. The heuristic in reserve_rehash will grow the table if it is more than half full, which avoids situations where you end up calling rehash_in_place on every insert if the table is at capacity and you are constantly removing & adding one item.

@cuviper
Copy link
Member Author

cuviper commented Apr 5, 2021

I'm trying to provide an option for the concern in indexmap-rs/indexmap#183 (comment):

OK, so I guess that means I shouldn't use IndexMap in a scenario where I don't want to trigger memory allocations after the initial creation of the map?

In that scenario, I suppose they might be working more than half full, where reserve_rehash would want to grow. I'm not proposing to automatically vacuum, because the growth heuristic is probably better in general, but it may still be useful in memory-constrained environments.

@Amanieu
Copy link
Member

Amanieu commented Apr 5, 2021

You could just reserve 2x the needed capacity, which ensures that reserve_rehash will always use the rehash_in_place path.

Or just don't do anything: in the worst case you get a single reallocation, after which the rehash_in_place path is always used.

@tesselode
Copy link

To provide a bit more context, the memory constrained environment I'm working in is an audio library. Generally people working with audio don't want to allocate memory on the audio thread, because if the operating system takes a long time to allocate memory, it could result in audio stuttering, which is very unpleasant. So once I create a hash map, I never want to allocate memory for it again.

It may just be that I need a more specialized data structure.

@Amanieu
Copy link
Member

Amanieu commented Apr 6, 2021

As I've said before, hashbrown will automatically clean up deleted entries when it runs out of capacity so it won't call out to the allocator to grow the table unless you actually need it.

@bors
Copy link
Contributor

bors commented Aug 24, 2023

☔ The latest upstream changes (presumably #458) made this pull request unmergeable. Please resolve the merge conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants