vine: find worker by files #4045

JinZhou5042 · 2025-01-28T20:13:51Z

Proposed Changes

As Connor has been reporting for several months, there is a significant slowdown toward the end of a workflow with more than 100K tasks and terabytes of data.

@dthain and I discussed the inefficiency in the algorithm that iterates through all workers to select the one already holding the largest amount of the task's input data, which can be very time-consuming when each worker holds numerous files.
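To make the cost concrete, here is a minimal C sketch of the linear scan described above. The structs `sketch_worker` and `sketch_task`, the fixed `MAX_FILES` cap, and the helpers `cached_input_bytes` and `worker_fits` are hypothetical simplifications, not TaskVine's real types. `worker_fits` stands in for `check_worker_against_task` and only compares cores, and a linear string search stands in for `hash_table_lookup`.

```c
#include <string.h>

/* Hypothetical, simplified stand-ins for the manager's worker and task
 * records; these are not TaskVine's real structs. */
#define MAX_FILES 8

struct sketch_worker {
	const char *cached[MAX_FILES]; /* names of files already on the worker */
	long cached_size[MAX_FILES];   /* size in bytes of each cached file */
	int ncached;
	int free_cores;
};

struct sketch_task {
	const char *inputs[MAX_FILES]; /* names of the task's input files */
	int ninputs;
	int cores;
};

/* Bytes of the task's inputs already cached on w. The inner linear search
 * stands in for the per-worker hash_table_lookup the manager performs. */
long cached_input_bytes(const struct sketch_worker *w, const struct sketch_task *t)
{
	long total = 0;
	for (int i = 0; i < t->ninputs; i++) {
		for (int j = 0; j < w->ncached; j++) {
			if (strcmp(t->inputs[i], w->cached[j]) == 0) {
				total += w->cached_size[j];
				break;
			}
		}
	}
	return total;
}

/* Hypothetical stand-in for check_worker_against_task: compares cores only. */
int worker_fits(const struct sketch_worker *w, const struct sketch_task *t)
{
	return w->free_cores >= t->cores;
}

/* Baseline scan: run the fit check on every worker and keep the fitting
 * worker with the most cached input bytes. Returns an index, or -1. */
int find_worker_by_files_linear(const struct sketch_worker *ws, int nworkers, const struct sketch_task *t)
{
	int best = -1;
	long best_bytes = -1;
	for (int i = 0; i < nworkers; i++) {
		if (!worker_fits(&ws[i], t))
			continue;
		long bytes = cached_input_bytes(&ws[i], t);
		if (bytes > best_bytes) {
			best_bytes = bytes;
			best = i;
		}
	}
	return best;
}
```

With W workers and N task inputs, this sketch performs W fit checks and on the order of W × N cache lookups for every task it places.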

Although the manager uses hash_table_lookup to locate files, which is roughly O(1) per lookup, it still traverses every worker and calls check_worker_against_task on each one. This could be improved by first sorting the workers by how much of the task's input data they already hold and then calling check_worker_against_task in descending order of likelihood.
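A minimal C sketch of the proposed ordering, assuming each worker's cache can be modeled as a 64-bit mask over file numbers. This is a hypothetical simplification to keep the sketch short; TaskVine keys files by name in a hash table. `cand_worker`, `cand_task`, `cached_bytes`, and `worker_fits` are illustrative names, and `worker_fits` is a resources-only stand-in for `check_worker_against_task`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model: files are numbered 0..63 and each worker's cache is a
 * bitmask, so "which inputs are cached" is a single AND. Not TaskVine's API. */
struct cand_worker {
	uint64_t cached_files; /* bit i set => file i is on this worker */
	int free_cores;
};

struct cand_task {
	uint64_t input_files; /* bit i set => file i is a task input */
	long file_size[64];   /* size in bytes of file i */
	int cores;
};

struct candidate {
	int index;  /* position in the worker array */
	long bytes; /* task input bytes already cached on that worker */
};

/* Sum the sizes of the task inputs present on worker w. */
long cached_bytes(const struct cand_worker *w, const struct cand_task *t)
{
	uint64_t hits = w->cached_files & t->input_files;
	long total = 0;
	for (int i = 0; i < 64; i++)
		if (hits & ((uint64_t)1 << i))
			total += t->file_size[i];
	return total;
}

/* Hypothetical stand-in for check_worker_against_task: compares cores only. */
int worker_fits(const struct cand_worker *w, const struct cand_task *t)
{
	return w->free_cores >= t->cores;
}

/* qsort comparator: larger cached byte counts sort first. */
static int by_bytes_desc(const void *a, const void *b)
{
	long x = ((const struct candidate *)a)->bytes;
	long y = ((const struct candidate *)b)->bytes;
	return (x < y) - (x > y);
}

/* Rank workers by cached input bytes, then run the fit check in that order
 * and stop at the first worker that passes. Returns an index, or -1. */
int find_worker_by_files_sorted(const struct cand_worker *ws, int n, const struct cand_task *t)
{
	struct candidate *c = malloc(sizeof(*c) * (size_t)n);
	if (!c)
		return -1;
	for (int i = 0; i < n; i++) {
		c[i].index = i;
		c[i].bytes = cached_bytes(&ws[i], t);
	}
	qsort(c, (size_t)n, sizeof(*c), by_bytes_desc);

	int chosen = -1;
	for (int i = 0; i < n; i++) {
		if (worker_fits(&ws[c[i].index], t)) {
			chosen = c[i].index;
			break;
		}
	}
	free(c);
	return chosen;
}
```

Computing the cached byte count still touches every worker, and the sort adds O(W log W); the saving in this sketch comes from running the fit check only until the first worker passes rather than on all W workers. Because `qsort` is not stable, workers with equal cached byte counts are tried in an unspecified order.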

As for the dramatic slowdown in the workflow's final phase, my impression was that simply switching to find_worker_by_random solved the problem. However, since find_worker_by_files does not actually iterate over the file table, I need to investigate this further.

Merge Checklist

The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.

make test Run local tests prior to pushing.
make format Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python)
make lint Run lint on source code prior to pushing.
Manual Update: Update the manual to reflect user-visible changes.
Type Labels: Select a github label for the type: bugfix, enhancement, etc.
Product Labels: Select a github label for the product: TaskVine, Makeflow, etc.
PR RTM: Mark your PR as ready to merge.

JinZhou5042 added 4 commits January 28, 2025 11:27

init (cb77134)
enqueue on priority update (fb59284)
lint (0960d8e)
init (9aa8086)

JinZhou5042 self-assigned this Jan 28, 2025
