Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IndexMap sometimes reports a lower capacity after removing elements #183

Closed
tesselode opened this issue Mar 29, 2021 · 9 comments · Fixed by #266
Closed

IndexMap sometimes reports a lower capacity after removing elements #183

tesselode opened this issue Mar 29, 2021 · 9 comments · Fixed by #266

Comments

@tesselode
Copy link

tesselode commented Mar 29, 2021

I made a small test script that adds and removes elements from an IndexMap:

use indexmap::IndexMap;

fn main() {
	let mut map = IndexMap::with_capacity(10);
	for i in 0..1000 {
		map.insert(i, i);
		println!("{} / {}", map.len(), map.capacity());
		if i % 5 == 0 {
			map.swap_remove_index(i / 2);
			map.swap_remove_index(i / 3);
			map.swap_remove_index(i / 4);
		}
	}
}

Sometimes, when the length of the map (left) decreases, the capacity (right) decreases as well.

410 / 431
411 / 432
412 / 432
413 / 432
411 / 431
412 / 431
413 / 431
414 / 432
415 / 432
414 / 430
415 / 430
416 / 430
417 / 430
418 / 431
416 / 429
417 / 429
418 / 429
419 / 430
420 / 430
419 / 429
420 / 429
421 / 429
422 / 430
423 / 430
421 / 427
422 / 427
423 / 427
424 / 427
425 / 427

From what I understand, the "capacity" is the total number of elements the map can hold in the chunk of memory it has allocated. If you add an item when the map is full, I would expect the capacity to increase as the map allocates more memory. I would never expect the capacity to decrease.

This behavior occurs with both swap_remove_index and shift_remove_index.

@tesselode tesselode changed the title IndexMap sometimes reports a lower capacity after shift_remove-ing elements IndexMap sometimes reports a lower capacity after swap_remove-ing elements Mar 29, 2021
@tesselode tesselode changed the title IndexMap sometimes reports a lower capacity after swap_remove-ing elements IndexMap sometimes reports a lower capacity after removing elements Mar 29, 2021
@cuviper
Copy link
Member

cuviper commented Mar 29, 2021

This is an artifact of hashbrown::raw::RawTable, and you can reproduce it with std::collections::HashMap too, just removing by key instead of index. Capacity is reported as items + growth_left, but the latter is only sometimes updated on removal. That is, a removed entry may be marked EMPTY and increase growth_left, but it sometimes needs a DELETED tombstone and does not change growth_left, which is when you see the reported capacity decrease.

IndexMap has two relevant capacities, the RawTable and the item Vec, and we report the lesser of those.

@tesselode
Copy link
Author

tesselode commented Mar 29, 2021

Why does DELETED not change growth_left? Should I take this to mean that the capacity of the map actually decreases, i.e. if I create a map with a capacity of 10, later on an allocation might be required to add items even if the new length does not exceed 10?

@cuviper
Copy link
Member

cuviper commented Mar 29, 2021

Why does DELETED not change growth_left?

I think it's because those DELETED entries are considered part of the load factor of the map, affecting how far you might have to search for entries from their ideal hash location. Hashbrown doesn't shift other items back when removing.

Should I take this to mean that the capacity of the map actually decreases, i.e. if I create a map with a capacity of 10, later on an allocation might be required to add items even if the new length does not exceed 10?

Yes, that's true. If a new insertion lands on a DELETED entry, it can reclaim that without resizing, but otherwise it has to reconcile the load factor. One further trick is that when there are a lot of DELETED entries, it may just rehash everything in-place instead of actually reallocating. IndexMap already stores the computed hashes, so rehashing the raw table is relatively cheap.

We could potentially add some heuristics to indexmap to scrub the table more aggressively, and/or add some API to let the user trigger that themselves.

@tesselode
Copy link
Author

OK, so I guess that means I shouldn't use IndexMap in a scenario where I don't want to trigger memory allocations after the initial creation of the map?

@cuviper
Copy link
Member

cuviper commented Mar 29, 2021

Currently, the only guarantee about avoiding allocation is if you dynamically check the capacity().

With hashbrown 0.11 (#181) we could add a method that internally uses RawTable::try_insert_no_grow, and on failure we could rehash to scrub DELETED entries and try again. Maybe we could also figure that out now using the reported capacity() and what we know of the expected load factor of the raw buckets().

@tesselode
Copy link
Author

In the meantime, maybe it would be good to add something to the documentation about this? I'm sure I'm not the only one confused about why the capacity can decrease over time.

@bluss
Copy link
Member

bluss commented Apr 5, 2021

Adding docs seems like the sensible action to resolve this issue, for the rest we refer to hashbrown.

@cuviper
Copy link
Member

cuviper commented Apr 5, 2021

Hashbrown's capacity says this:

This number is a lower bound; the HashMap<K, V> might be able to hold more, but is guaranteed to be able to hold at least this many.

We could say something like that, and perhaps also note that removal may cause that to decrease.

@cuviper
Copy link
Member

cuviper commented Apr 5, 2021

@tesselode if you're interested, maybe you could comment on rust-lang/hashbrown#255 why you're concerned about this, and whether you think such a manual vacuum operation would work for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants