
[ENHANCEMENT] Max Priority Queue in graph scheduling. #226

Open
Mattcrmx opened this issue Nov 19, 2024 · 4 comments

@Mattcrmx (Collaborator)

While skimming the codebase, I noticed that we keep re-selecting the node with the highest priority, when we could instead build a max priority queue up front and pop from it. That would be both cleaner and simpler, and more efficient (popping from a heap is O(log n) versus an O(n) max scan each time). I can write a PR today, wdyt @bashirmindee?

highest_priority_id = max(runnable_xns_ids, key=lambda id_: graph.compound_priority[id_])
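Concretely, something along these lines (a sketch only, reusing the names from the snippet above; heapq is a min-heap, so priorities are pushed negated to get max-first behavior):

```python
import heapq

# Build the max priority queue once over the runnable nodes.
heap = [(-graph.compound_priority[id_], id_) for id_ in runnable_xns_ids]
heapq.heapify(heap)

# Popping the highest-priority node is O(log n), with no O(n) max scan.
_, highest_priority_id = heapq.heappop(heap)
```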

Mattcrmx added the enhancement label on Nov 19, 2024
@bashirmindee (Collaborator)

What do you mean by that? We have to construct a new queue every time a root node finishes executing, because new runnable_xns_ids might be created.

@Mattcrmx (Collaborator, Author)

My idea was not to maintain only one queue, but rather to have a pre-computed queue stored in the graph and to derive the updated queues by only traversing it (a kind of filtered queue based on node availability). Whenever a node becomes available, we extend the current max queue with its children (in Python, heapq gives us a heap natively; a max queue just means pushing negated priorities) and we can still pop from it. This avoids recomputing the max every time, since the ordering work is already done at the beginning: we just pop the max node we want to run from the updated queue; see the sketch below.

I think this would be most beneficial for a very flat DAG (one where a lot of nodes can run at the same time). This actually emerged from the fact that we search for the max node among all runnable nodes even though we can only run MAX_CONCURRENCY of them at a time, so that scan can already be a bottleneck in itself.
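A rough sketch of that approach, with hypothetical names (RunnableQueue and the extend hook are illustrative, not part of the existing scheduler):

```python
import heapq

class RunnableQueue:
    """Hypothetical max priority queue over runnable node ids.

    heapq is a min-heap, so priorities are stored negated; newly runnable
    nodes are pushed into the same heap instead of rebuilding it.
    """

    def __init__(self, compound_priority, runnable_ids):
        self._priority = compound_priority
        self._heap = [(-self._priority[id_], id_) for id_ in runnable_ids]
        heapq.heapify(self._heap)

    def extend(self, new_runnable_ids):
        # Called when a node finishes and its children become runnable.
        for id_ in new_runnable_ids:
            heapq.heappush(self._heap, (-self._priority[id_], id_))

    def pop(self):
        # Returns the runnable node with the highest compound priority.
        return heapq.heappop(self._heap)[1]

    def __bool__(self):
        return bool(self._heap)
```

The scheduler would then call queue.extend(...) with the newly runnable nodes after each completion and queue.pop() instead of the max() scan.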

@bashirmindee (Collaborator)

I think this won't be beneficial, and it's pretty complicated TBH. Remember that the newly created root nodes are dynamic after each iteration of the scheduler, so the number of combinations of queues that would need to be constructed is huge and would lead straight to an OOM in some cases.

@Mattcrmx (Collaborator, Author)

Nope, my idea was to compute one queue at the start and extract the sub-queue from it each time without recomputing anything. I'll do a simple POC along the lines below and update you on it; maybe it'll be clearer.
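Roughly this (sketch only, hypothetical names; compound_priority is assumed to map node id to priority): the full priority order is computed once, and each iteration's queue is just a filter of that order against the currently runnable set, so its head is always the max node.

```python
# Compute the global priority order once, up front.
precomputed_order = sorted(
    graph.compound_priority,
    key=graph.compound_priority.__getitem__,
    reverse=True,
)

def runnable_sub_queue(runnable_xns_ids):
    """Extract the sub-queue of runnable nodes from the pre-computed order.

    No max() and no re-sorting: the pre-computed order is preserved, so the
    first element is the runnable node with the highest compound priority.
    """
    runnable = set(runnable_xns_ids)
    return [id_ for id_ in precomputed_order if id_ in runnable]
```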
