Implement two methods, clear_under() and clear_over(), that accept a priority as an argument.
This would essentially truncate the data structure, marking all items over/under the given priority as trash/nonexistent.
This could be an O(log(n)) operation: it would take at most O(log(n)) time to locate the upper or lower bound of this priority in the sorted structure, and O(1) to mark the rest of the dataset as trash (e.g. an iter would then return a slice from the minimum element up to this new bound).
This would be a great convenience, as it would let this data structure greatly speed up pruning. Currently pruning is O(n) at best, since you can only remove one element at a time. (It is possible to prune the entire set in O(1) using clear, but that is beside the point.)
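The proposed semantics can be sketched with the standard library's BinaryHeap rather than this crate's internals (the clear_over name and the rebuild-by-filtering approach are assumptions for illustration, not the crate's API):

```rust
use std::collections::BinaryHeap;

// Hypothetical sketch: drop every item whose priority is strictly greater
// than `bound`. Rebuilding by filtering visits every element, so as written
// this is O(n), not O(log n); the O(log n) claim assumes the structure could
// simply mark a range as dead instead of rebuilding.
fn clear_over(heap: BinaryHeap<i32>, bound: i32) -> BinaryHeap<i32> {
    heap.into_iter().filter(|&p| p <= bound).collect()
}

fn main() {
    let heap: BinaryHeap<i32> = [1, 5, 3, 9, 7].into_iter().collect();
    let pruned = clear_over(heap, 5);
    assert_eq!(pruned.into_sorted_vec(), vec![1, 3, 5]);
}
```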
Even though I am not sure the complexity you found is correct, I agree that the operation would probably be faster if done internally.
I am not sure about the O(log(N)) complexity: even if finding the cut point is O(log(N)), the removed elements must also be deleted from the data structure's internal map and correctly dropped, which is O(N) in the worst case. In the case of clear_over(), there is then an additional operation to restore the heap property, which is at least O(log(N)), but I am not sure yet.
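A minimal sketch of this point, again using only the standard library (the clear_under name mirrors the proposal; none of this is the crate's actual code): even with the cut point known, every surviving element must be re-collected, and BinaryHeap::from then runs a full O(n) heapify to restore the heap invariant.

```rust
use std::collections::BinaryHeap;

// Illustration of the O(N) cost: pruning below `bound` still has to walk
// the whole backing storage, and BinaryHeap::from performs an O(n) heapify
// on what is left to re-establish the heap property.
fn clear_under(heap: BinaryHeap<i32>, bound: i32) -> BinaryHeap<i32> {
    let kept: Vec<i32> = heap
        .into_vec()
        .into_iter()
        .filter(|&p| p >= bound)
        .collect();
    BinaryHeap::from(kept)
}

fn main() {
    let heap: BinaryHeap<i32> = [2, 8, 4, 6].into_iter().collect();
    let pruned = clear_under(heap, 5);
    assert_eq!(pruned.into_sorted_vec(), vec![6, 8]);
}
```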
This would be very helpful for my use case. I'm looking to keep a list of the top N files on disk by file size, but I want to keep memory consumption reasonable, so I only want to keep N items in the pq at any one time.
I did not implement this API because I was not able to find an efficient way to prune the heap. If you know a good algorithm to do that, I am open to suggestions.
For your purpose though, you could use the DoublePriorityQueue, check its len() after each insertion, and if it is greater than N, call pop_min().
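The same bounded-size pattern can be sketched with only the standard library, using a min-heap (BinaryHeap over Reverse) of capacity N; push_bounded is a hypothetical helper name, standing in for the suggested push-then-pop_min on a DoublePriorityQueue:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Keep only the N largest sizes: the heap's root is the smallest kept
// value, so whenever the heap exceeds N items, popping evicts it.
fn push_bounded(heap: &mut BinaryHeap<Reverse<u64>>, size: u64, n: usize) {
    heap.push(Reverse(size));
    if heap.len() > n {
        heap.pop(); // drops the smallest of the kept sizes
    }
}

fn main() {
    let mut top3 = BinaryHeap::new();
    for size in [40u64, 10, 99, 7, 55, 23] {
        push_bounded(&mut top3, size, 3);
    }
    let mut kept: Vec<u64> = top3.into_iter().map(|Reverse(s)| s).collect();
    kept.sort();
    assert_eq!(kept, vec![40, 55, 99]);
}
```

Memory stays bounded at N entries regardless of how many files are scanned, which is the property asked for above.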