Make sure that all workers are notified about end of execution loop #730

Merged: 2 commits, Jan 24, 2025

@kdamaszk commented Jan 23, 2025

Currently, a script hangs at exit when TP>1 is combined with multi-step scheduling. The cause is that the driver worker never notifies the other workers that the execution loop has ended.
This PR works around the issue by making sure that all workers are notified at the end of the llm_engine loop.
Another possible workaround would be to extend this check: https://github.com/HabanaAI/vllm-fork/blob/habana_main/vllm/engine/llm_engine.py#L1379 with `or not self.has_unfinished_requests()`.
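
To illustrate the failure mode described above, here is a minimal toy sketch. It is not the vLLM code: `worker_loop`, `run_engine`, `notify_on_exit`, and the `STOP` sentinel are all hypothetical names, and threads with queues stand in for the real multiprocess workers. It only shows why a worker that blocks waiting for the next step never exits unless the driver explicitly tells it that the loop is over.

```python
import queue
import threading

STOP = None  # hypothetical sentinel meaning "the execution loop has ended"


def worker_loop(inbox, log, rank):
    """Toy non-driver worker: blocks until the driver sends a step or STOP."""
    while True:
        msg = inbox.get()  # blocks forever if the driver never sends STOP
        if msg is STOP:
            log.append((rank, "stopped"))
            return
        log.append((rank, msg))


def run_engine(num_workers, steps, notify_on_exit=True, join_timeout=0.5):
    """Toy driver: broadcast `steps` steps, then optionally notify all workers.

    Returns one flag per worker: True if that worker is still blocked
    (the "hang"), False if it exited cleanly.
    """
    inboxes = [queue.Queue() for _ in range(num_workers)]
    log = []
    threads = [
        # daemon=True so a stuck worker does not block interpreter exit here
        threading.Thread(target=worker_loop, args=(q, log, r), daemon=True)
        for r, q in enumerate(inboxes)
    ]
    for t in threads:
        t.start()

    for step in range(steps):
        for q in inboxes:
            q.put(f"step-{step}")

    if notify_on_exit:
        # The idea of the workaround: tell every worker the loop has ended.
        for q in inboxes:
            q.put(STOP)

    for t in threads:
        t.join(timeout=join_timeout)
    return [t.is_alive() for t in threads]


if __name__ == "__main__":
    print("with notification:   ", run_engine(2, 3, notify_on_exit=True))
    print("without notification:", run_engine(2, 3, notify_on_exit=False))
```

In this toy model, omitting the final notification leaves every worker blocked on `inbox.get()`, which corresponds to the hang at the end of the script. The sketch uses daemon threads only so that the demonstration itself can exit.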

@kdamaszk (Author)

@michalkuligowski @madamczykhabana please review

@michalkuligowski merged commit 40745f0 into habana_main Jan 24, 2025
32 checks passed
@michalkuligowski deleted the dev/kdamaszke/close-all-mp-workers branch January 24, 2025 10:07
3 participants