Implement two methods, clear_under() and clear_over(), that accept a priority as an argument.
This would essentially truncate the data structure, marking all items over/under the given priority as trash/nonexistent.
This could be an O(log(n)) operation: it would take at most O(log(n)) time to locate the upper or lower bound of this priority in the sorted structure, and O(1) to mark the rest of the dataset as trash (e.g. an iter would then return a slice from the minimum element up to this new bound).
This would be a great convenience, as it would let this data structure greatly speed up pruning. Currently pruning is O(n) at best, since you can only remove one element at a time. (It is possible to prune the entire set in O(1) using clear, but that is beside the point.)
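The proposed semantics can be sketched with the standard library's BinaryHeap rather than this crate's internals (the clear_over name and the rebuild-by-filtering approach are assumptions for illustration, not the crate's API):

```rust
use std::collections::BinaryHeap;

// Hypothetical sketch: drop every item whose priority is strictly greater
// than `bound`. Rebuilding by filtering visits every element, so as written
// this is O(n), not O(log n); the O(log n) claim assumes the structure could
// simply mark a range as dead instead of rebuilding.
fn clear_over(heap: BinaryHeap<i32>, bound: i32) -> BinaryHeap<i32> {
    heap.into_iter().filter(|&p| p <= bound).collect()
}

fn main() {
    let heap: BinaryHeap<i32> = [1, 5, 3, 9, 7].into_iter().collect();
    let pruned = clear_over(heap, 5);
    assert_eq!(pruned.into_sorted_vec(), vec![1, 3, 5]);
}
```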
Even though I am not sure the complexity you found is correct, I agree that the operation would probably be faster if done internally.
I am not sure about the O(log(N)) complexity: even if finding the cut point is O(log(N)), the removed elements must also be deleted from the data structure's internal map and correctly dropped, which is O(N) in the worst case. In the case of clear_over(), there is then an additional operation to restore the heap property, which is at least O(log(N)), but I am not sure yet.
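A minimal sketch of this point, again using only the standard library (the clear_under name mirrors the proposal; none of this is the crate's actual code): even with the cut point known, every surviving element must be re-collected, and BinaryHeap::from then runs a full O(n) heapify to restore the heap invariant.

```rust
use std::collections::BinaryHeap;

// Illustration of the O(N) cost: pruning below `bound` still has to walk
// the whole backing storage, and BinaryHeap::from performs an O(n) heapify
// on what is left to re-establish the heap property.
fn clear_under(heap: BinaryHeap<i32>, bound: i32) -> BinaryHeap<i32> {
    let kept: Vec<i32> = heap
        .into_vec()
        .into_iter()
        .filter(|&p| p >= bound)
        .collect();
    BinaryHeap::from(kept)
}

fn main() {
    let heap: BinaryHeap<i32> = [2, 8, 4, 6].into_iter().collect();
    let pruned = clear_under(heap, 5);
    assert_eq!(pruned.into_sorted_vec(), vec![6, 8]);
}
```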
This would be very helpful for my use case. I'm looking to keep a list of the top N files on disk by file size, but I want to keep memory consumption reasonable, so I only want to keep N items in the pq at any one time.
I did not implement this API because I was not able to find an efficient way to prune the heap. If you know a good algorithm to do that, I am open to suggestions.
For your purpose though, you could use the DoublePriorityQueue, check its len() after each insertion, and if it is greater than N, call pop_min().
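The same bounded-size pattern can be sketched with only the standard library, using a min-heap (BinaryHeap over Reverse) of capacity N; push_bounded is a hypothetical helper name, standing in for the suggested push-then-pop_min on a DoublePriorityQueue:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Keep only the N largest sizes: the heap's root is the smallest kept
// value, so whenever the heap exceeds N items, popping evicts it.
fn push_bounded(heap: &mut BinaryHeap<Reverse<u64>>, size: u64, n: usize) {
    heap.push(Reverse(size));
    if heap.len() > n {
        heap.pop(); // drops the smallest of the kept sizes
    }
}

fn main() {
    let mut top3 = BinaryHeap::new();
    for size in [40u64, 10, 99, 7, 55, 23] {
        push_bounded(&mut top3, size, 3);
    }
    let mut kept: Vec<u64> = top3.into_iter().map(|Reverse(s)| s).collect();
    kept.sort();
    assert_eq!(kept, vec![40, 55, 99]);
}
```

Memory stays bounded at N entries regardless of how many files are scanned, which is the property asked for above.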