[Enhancement] Some idling after executing the last layer in expert worker #45

shawlleyw · 2024-12-26T03:08:47Z

We need to figure out the cause and fix it. One possible reason is that the output tensor is copied from GPU to CPU and then sent to the sampler via ZMQ.

hogura99 · 2024-12-26T03:53:13Z

In which cases?

step_attn=2, DP=1, step_exp=1, EP=2?

shawlleyw · 2024-12-26T04:58:12Z

Almost every case, as far as I remember.

hogura99 · 2024-12-27T04:33:09Z

I will use the context recorder in #50 to check the details.

hogura99 · 2024-12-30T06:48:50Z

The ZMQ send takes a long time, which also delays other kernels such as NCCLRecv and gather_tokens_cuda. It is most likely caused by the device-to-host memcpy.

hogura99 · 2025-01-02T10:30:00Z

Overlapping the tensor copy with expert execution is now done. However, performance has not improved much, because the same problem also exists in attn, as shown below.

Will be fixed later.
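For readers following along, here is a minimal sketch of the general overlap idea discussed above. It is not the code used in this repository. It assumes the copy-and-send step can be handed to a background thread, and it simulates both the expert computation and the device-to-host copy plus ZMQ send with `time.sleep`. The names `copy_and_send` and `run_layers` are hypothetical.

```python
import queue
import threading
import time


def copy_and_send(result):
    """Hypothetical stand-in for the device-to-host copy plus ZMQ send."""
    time.sleep(0.05)  # simulated transfer latency (assumption)
    return result


def run_layers(num_steps, overlap):
    """Run simulated expert steps and return (sent_items, compute_loop_seconds)."""
    sent = []
    outbox = queue.Queue()

    def sender():
        # Drain the queue until the None sentinel arrives.
        while True:
            item = outbox.get()
            if item is None:
                break
            sent.append(copy_and_send(item))

    worker = threading.Thread(target=sender, daemon=True)
    if overlap:
        worker.start()

    start = time.perf_counter()
    for step in range(num_steps):
        time.sleep(0.05)  # simulated expert computation (assumption)
        if overlap:
            outbox.put(step)  # hand off; the compute loop does not wait
        else:
            sent.append(copy_and_send(step))  # compute loop blocks on the send
    compute_seconds = time.perf_counter() - start

    if overlap:
        outbox.put(None)  # tell the sender thread to stop
        worker.join()
    return sent, compute_seconds
```

Calling `run_layers(4, overlap=False)` and `run_layers(4, overlap=True)` delivers the same items in the same order, but in the overlapped case the compute loop no longer pays for the transfer. In a real GPU worker, the copy itself would also need to be asynchronous (for example, into pinned host memory on a separate stream) for the overlap to help.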

hogura99 changed the title ~~Some idling after executing the last layer in expert worker~~ [Enhancement] Some idling after executing the last layer in expert worker Dec 30, 2024
