
[ENHANCEMENT] Max Priority Queue in graph scheduling. #226

Open
Mattcrmx opened this issue Nov 19, 2024 · 4 comments

@Mattcrmx (Collaborator)

While skimming the codebase, I noticed that we keep re-selecting the node with the highest priority, when we could instead build a max priority queue up front and pop from it. That would be both cleaner and simpler, and more efficient (popping from a heap is O(log n) versus an O(n) max scan each time). I can write a PR today, wdyt @bashirmindee?

highest_priority_id = max(runnable_xns_ids, key=lambda id_: graph.compound_priority[id_])
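Concretely, something along these lines (a sketch only, reusing the names from the snippet above; heapq is a min-heap, so priorities are pushed negated to get max-first behavior):

```python
import heapq

# Build the max priority queue once over the runnable nodes.
heap = [(-graph.compound_priority[id_], id_) for id_ in runnable_xns_ids]
heapq.heapify(heap)

# Popping the highest-priority node is O(log n), with no O(n) max scan.
_, highest_priority_id = heapq.heappop(heap)
```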

Mattcrmx added the enhancement label on Nov 19, 2024
@bashirmindee (Collaborator)

What do you mean by that? We have to construct a new queue every time a root node finishes executing, because new runnable_xns_ids might be created.

@Mattcrmx (Collaborator, Author)

My idea was not to maintain only one queue, but rather to have a pre-computed queue stored in the graph and to derive the updated queues by only traversing it (a kind of filtered queue based on node availability). Whenever a node becomes available, we extend the current max queue with its children (in Python, heapq gives us a heap natively; a max queue just means pushing negated priorities) and we can still pop from it. This avoids recomputing the max every time, since the ordering work is already done at the beginning: we just pop the max node we want to run from the updated queue; see the sketch below.

I think this would be most beneficial for a very flat DAG (one where a lot of nodes can run at the same time). This actually emerged from the fact that we search for the max node among all runnable nodes even though we can only run MAX_CONCURRENCY of them at a time, so that scan can already be a bottleneck in itself.
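A rough sketch of that approach, with hypothetical names (RunnableQueue and the extend hook are illustrative, not part of the existing scheduler):

```python
import heapq

class RunnableQueue:
    """Hypothetical max priority queue over runnable node ids.

    heapq is a min-heap, so priorities are stored negated; newly runnable
    nodes are pushed into the same heap instead of rebuilding it.
    """

    def __init__(self, compound_priority, runnable_ids):
        self._priority = compound_priority
        self._heap = [(-self._priority[id_], id_) for id_ in runnable_ids]
        heapq.heapify(self._heap)

    def extend(self, new_runnable_ids):
        # Called when a node finishes and its children become runnable.
        for id_ in new_runnable_ids:
            heapq.heappush(self._heap, (-self._priority[id_], id_))

    def pop(self):
        # Returns the runnable node with the highest compound priority.
        return heapq.heappop(self._heap)[1]

    def __bool__(self):
        return bool(self._heap)
```

The scheduler would then call queue.extend(...) with the newly runnable nodes after each completion and queue.pop() instead of the max() scan.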

@bashirmindee (Collaborator)

I think this won't be beneficial, and it's pretty complicated TBH. Remember that the newly created root nodes are dynamic after each iteration of the scheduler, so the number of combinations of queues that would need to be constructed is huge and would lead straight to an OOM in some cases.

@Mattcrmx (Collaborator, Author)

Nope, my idea was to compute one queue at the start and extract the sub-queue from it each time without recomputing anything. I'll do a simple POC along the lines below and update you on it; maybe it'll be clearer.
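Roughly this (sketch only, hypothetical names; compound_priority is assumed to map node id to priority): the full priority order is computed once, and each iteration's queue is just a filter of that order against the currently runnable set, so its head is always the max node.

```python
# Compute the global priority order once, up front.
precomputed_order = sorted(
    graph.compound_priority,
    key=graph.compound_priority.__getitem__,
    reverse=True,
)

def runnable_sub_queue(runnable_xns_ids):
    """Extract the sub-queue of runnable nodes from the pre-computed order.

    No max() and no re-sorting: the pre-computed order is preserved, so the
    first element is the runnable node with the highest compound priority.
    """
    runnable = set(runnable_xns_ids)
    return [id_ for id_ in precomputed_order if id_ in runnable]
```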
