Design question: what is comm->planner.tmpCollWorkQueue used for? #1592

Open
YconquestY opened this issue Jan 30, 2025 · 0 comments

Comments

@YconquestY

Hello, I have a question about the design of NCCL.

In groupLaunch, the key functions are called in this order: ncclPrepareTasks $\rightarrow \cdots \rightarrow$ ncclPrepareTasks $\rightarrow$ ncclTasksRegAndEnqueue $\rightarrow \cdots \rightarrow$ ncclTasksRegAndEnqueue $\rightarrow$ doLaunches. That is, ncclPrepareTasks is called repeatedly, then ncclTasksRegAndEnqueue is called repeatedly, and finally doLaunches is called once.
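The call order above can be modeled as three separate passes. The C sketch below is a simplified model, not NCCL code: `struct trace`, `prepare_tasks`, `reg_and_enqueue`, `do_launches` and `group_launch_model` are hypothetical stand-ins for ncclPrepareTasks, ncclTasksRegAndEnqueue and doLaunches. It assumes each repeated call handles one communicator in the group and that every prepare call finishes before the first register/enqueue call starts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the call order in groupLaunch; not NCCL code. */
#define MAX_TRACE 64

enum phase { PREPARE, REG_AND_ENQUEUE, LAUNCH };

/* Records which phase ran, and for which communicator, in call order. */
struct trace {
  enum phase steps[MAX_TRACE];
  int comm_ids[MAX_TRACE]; /* -1 marks the single launch step */
  size_t len;
};

static void record(struct trace *t, enum phase p, int comm_id) {
  if (t->len < MAX_TRACE) {
    t->steps[t->len] = p;
    t->comm_ids[t->len] = comm_id;
    t->len++;
  }
}

/* Stand-ins for ncclPrepareTasks, ncclTasksRegAndEnqueue and doLaunches. */
static void prepare_tasks(struct trace *t, int comm_id) { record(t, PREPARE, comm_id); }
static void reg_and_enqueue(struct trace *t, int comm_id) { record(t, REG_AND_ENQUEUE, comm_id); }
static void do_launches(struct trace *t) { record(t, LAUNCH, -1); }

/* Assumption: one prepare call and one reg/enqueue call per communicator. */
void group_launch_model(struct trace *t, int ncomms) {
  assert(ncomms >= 0 && 2 * ncomms + 1 <= MAX_TRACE);
  memset(t, 0, sizeof *t);
  for (int c = 0; c < ncomms; c++) prepare_tasks(t, c);   /* pass 1: prepare every comm */
  for (int c = 0; c < ncomms; c++) reg_and_enqueue(t, c); /* pass 2: starts only after pass 1 */
  do_launches(t);                                         /* pass 3: launch once */
}
```

With three communicators, the trace holds three prepare steps (comms 0, 1, 2), then three register/enqueue steps in the same order, then a single launch step.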

ncclPrepareTasks is called repeatedly, and on each call NVLS-related tasks are pushed onto a work queue, tmpCollWorkQueue.

Later, ncclTasksRegAndEnqueue is also called in a loop. It dequeues items from tmpCollWorkQueue and pushes each popped item onto collWorkQueue.
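The staging pattern described here (stage NVLS work in a temporary queue, then drain it into the final queue) can be sketched as follows. This is a minimal model under stated assumptions, not NCCL's implementation: `struct work`, `struct queue`, `queue_push`, `queue_pop`, `prepare_phase` and `reg_and_enqueue_phase` are hypothetical, the FIFO is a small intrusive list written only for this sketch, and how non-NVLS tasks are handled is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item; the real NCCL structs differ. */
struct work {
  int id;
  bool is_nvls;
  struct work *next; /* intrusive link, so queueing never allocates */
};

/* Minimal intrusive FIFO written for this sketch. */
struct queue {
  struct work *head;
  struct work *tail;
};

static void queue_push(struct queue *q, struct work *w) {
  assert(w != NULL);
  w->next = NULL;
  if (q->tail) q->tail->next = w; else q->head = w;
  q->tail = w;
}

static struct work *queue_pop(struct queue *q) {
  struct work *w = q->head;
  if (w) {
    q->head = w->next;
    if (!q->head) q->tail = NULL; /* queue became empty */
    w->next = NULL;
  }
  return w;
}

/* Phase 1 (stand-in for ncclPrepareTasks): stage only NVLS work in tmp. */
void prepare_phase(struct work *tasks, size_t n, struct queue *tmp) {
  for (size_t i = 0; i < n; i++)
    if (tasks[i].is_nvls) queue_push(tmp, &tasks[i]);
}

/* Phase 2 (stand-in for ncclTasksRegAndEnqueue): drain tmp into coll in
   FIFO order and return how many items were moved. */
size_t reg_and_enqueue_phase(struct queue *tmp, struct queue *coll) {
  size_t moved = 0;
  struct work *w;
  while ((w = queue_pop(tmp)) != NULL) {
    /* Placeholder: any per-item step (e.g. the registration the name
       RegAndEnqueue suggests) would happen here, before the push. */
    queue_push(coll, w);
    moved++;
  }
  return moved;
}
```

For example, staging tasks 0 (NVLS), 1 (non-NVLS), 2 and 3 (NVLS) and then draining leaves the final queue holding 0, 2, 3 in that order, with the temporary queue empty.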

I do not understand the purpose of tmpCollWorkQueue. It seems to serve as a software cache for NVLS work. Since NVLS work ends up in collWorkQueue anyway, why not let ncclTasksRegAndEnqueue push it there directly?
