Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vine: count the available resources for a task on a worker #4037

Conversation

JinZhou5042
Copy link
Member

Proposed Changes

Calculate the usable resources for a task on a worker and check it earlier. The avoids resource allocations if the worker doesn't have usable resources for a task at all.

This speeds up my workflows slightly, for about 1.5%

Merge Checklist

The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.

  • make test Run local tests prior to pushing.
  • make format Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python)
  • make lint Run lint on source code prior to pushing.
  • Manual Update: Update the manual to reflect user-visible changes.
  • Type Labels: Select a github label for the type: bugfix, enhancement, etc.
  • Product Labels: Select a github label for the product: TaskVine, Makeflow, etc.
  • PR RTM: Mark your PR as ready to merge.

@JinZhou5042 JinZhou5042 self-assigned this Jan 25, 2025
@JinZhou5042 JinZhou5042 requested a review from btovar January 25, 2025 07:33
Copy link
Member

@btovar btovar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is fragile to memory leaks. There are some returns that do not free the struct rmsummary created. Unless there is a strong reason, I'd advise against it.

@JinZhou5042
Copy link
Member Author

The reason I do this is that even though most allocations work fine in #4035 and tasks run pretty densely, the manager still wastes a lot of time checking tasks when there aren’t any usable cores.

The most common issue in check_worker_against_task is that the worker doesn’t have a free core. But it still goes through the whole resource allocation process, and this keeps happening for a bunch of tasks each time, based on the depth set by q->attempt_schedule_depth.

Task scheduling seems to be an expensive part of the main loop, we could save a lot of time by making this more efficient, and then the manager has more time to do other stuff like output retrieving and temp file replication.

@btovar
Copy link
Member

btovar commented Jan 27, 2025

But this change is not reducing the number of computations? It is just moving code around and creates mem leaks?

What about adding a flag to the worker like has_free_resources. On reaping a task, has_free_resources is set to 1. On scheduling a task to the worker is set to 0 if any resource in use matches the total * overcommitment (unless it starts at 0, like gpus).

@JinZhou5042
Copy link
Member Author

This is a smart way!

@JinZhou5042
Copy link
Member Author

An alternative in #4039

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants