While skimming over the codebase, I saw that we keep re-selecting the node with the highest priority on every iteration, while we could instead build a max priority queue up front and pop from it. It would be both cleaner and simpler, and more efficient (an O(log n) pop instead of an O(n) scan for the max). I can write a PR today, wdyt @bashirmindee?
What do you mean by that? We are obliged to construct a new queue every time we complete the execution of a root node, because new runnable_xns_ids might be created.
My idea was not to maintain only one queue, but rather to have a pre-computed queue on the graph and derive updated queues by only traversing it (a kind of filtered queue based on node availability). Whenever a new node becomes available, we extend the current max queue with the new children (Python's heapq gives a min-heap natively, which works as a max queue by negating priorities) and we can still pop from it. This avoids recomputing the max every time, since we've already done that work at the beginning; we just pop the max node we want to run from the updated queue. I think this would be most beneficial for a very flat DAG (one where many nodes can run at the same time). This actually emerged from the fact that we search for the max node among all runnable nodes even though we can only run MAX_CONCURRENCY at a time, so that search can already be a bottleneck in itself.
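To make the idea concrete, here is a minimal sketch of the heap-based selection; the names (`get_new_runnable`, `execute`, string node ids) are hypothetical placeholders and not tawazi's actual API:

```python
import heapq
from typing import Callable, Dict


def run_by_priority(
    initial_runnable: Dict[str, int],                    # node id -> priority (hypothetical)
    get_new_runnable: Callable[[str], Dict[str, int]],   # nodes unlocked by finishing a node (hypothetical)
    execute: Callable[[str], None],
) -> None:
    # heapq is a min-heap, so priorities are negated to pop the max first.
    heap = [(-prio, xn_id) for xn_id, prio in initial_runnable.items()]
    heapq.heapify(heap)

    while heap:
        # O(log n) pop instead of an O(n) max() scan over all runnable nodes.
        _, xn_id = heapq.heappop(heap)
        execute(xn_id)
        # Extend the existing heap with newly runnable children instead of
        # rebuilding the queue from scratch after every iteration.
        for child_id, prio in get_new_runnable(xn_id).items():
            heapq.heappush(heap, (-prio, child_id))
```

The point of the sketch is that the heap is built once and then only extended, so the cost per scheduling step stays logarithmic in the number of runnable nodes.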
I think this won't be beneficial and will be pretty complicated, TBH. Remember that the newly created root nodes are dynamic after each iteration of the scheduler; hence the number of combinations of queues that would need to be constructed is huge and will cause an OOM directly in some cases.
Nope, my idea was to compute a queue once at the start and extract the sub-queue each time without recomputing. I'll do a simple POC and update you on it; maybe it'll be clearer.
Referenced code: tawazi/tawazi/_dag/helpers.py, line 318 at c0f0cc2