Hello, I have a question on the design philosophy of NCCL.
The call sequence of the key functions in `groupLaunch` is: `ncclPrepareTasks` $\rightarrow$ `ncclPrepareTasks` $\rightarrow \cdots \rightarrow$ `ncclPrepareTasks` $\rightarrow$ `ncclTasksRegAndEnqueue` $\rightarrow$ `ncclTasksRegAndEnqueue` $\rightarrow \cdots \rightarrow$ `ncclTasksRegAndEnqueue` $\rightarrow$ `doLaunches`.
`ncclPrepareTasks` is called repeatedly, and each call puts NVLS-related tasks into a work queue, `tmpCollWorkQueue`.
Later, `ncclTasksRegAndEnqueue` is also called in a loop. It dequeues entries from `tmpCollWorkQueue` and pushes whatever it pops onto `collWorkQueue`.
I do not understand the purpose of `tmpCollWorkQueue`. It seems to serve as a software cache for NVLS work. Since NVLS work is pushed to `collWorkQueue` anyway, why not let `ncclTasksRegAndEnqueue` do the whole job?
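To make the question concrete, here is a minimal C++ sketch of the two-phase pattern as I understand it. It is not NCCL's code: `Work`, `Planner`, `prepareTasks`, and `regAndEnqueue` are hypothetical stand-ins, and two details are my own assumptions, namely that work not needing staging goes straight to `collWorkQueue`, and that the second phase performs some registration step (guessed only from the function's name).

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a unit of collective work (not NCCL's type).
struct Work {
  std::string name;
  bool needsStaging;  // assumption: true for NVLS-related work
  bool registered;    // assumption: set by the second phase
};

// Hypothetical stand-in holding the two queues named in the question.
struct Planner {
  std::deque<Work> tmpCollWorkQueue;  // staged work, filled in phase 1
  std::deque<Work> collWorkQueue;     // work ready to be launched
};

// Phase 1, called once per batch of tasks: staged work is parked in
// tmpCollWorkQueue. Sending everything else directly to collWorkQueue
// is an assumption made for this sketch.
void prepareTasks(Planner& p, const std::vector<Work>& tasks) {
  for (const Work& w : tasks) {
    if (w.needsStaging) {
      p.tmpCollWorkQueue.push_back(w);
    } else {
      p.collWorkQueue.push_back(w);
    }
  }
}

// Phase 2: drain tmpCollWorkQueue in FIFO order and move each entry to
// collWorkQueue. Setting `registered` is a placeholder for whatever
// registration the real function performs.
void regAndEnqueue(Planner& p) {
  while (!p.tmpCollWorkQueue.empty()) {
    Work w = p.tmpCollWorkQueue.front();
    p.tmpCollWorkQueue.pop_front();
    w.registered = true;
    p.collWorkQueue.push_back(w);
  }
}
```

In this sketch the staged work ends up in `collWorkQueue` either way, which is exactly why the intermediate queue looks redundant to me.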