Proposed Changes
As Connor has complained about for several months, there has been a significant slowdown towards the end of a workflow that has more than 100K tasks and terabytes of data.
@dthain and I discussed the inefficiency in the algorithm that iterates through all workers to select the one with the largest inputs, which could be very time-consuming when each worker holds numerous files.
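To make the direction under discussion concrete, here is a minimal sketch of one way to avoid running the full check on every worker: rank candidates by how many bytes of the task's inputs they already hold, then run the expensive check in that order and stop at the first worker that passes. Everything prefixed with `sketch_` is hypothetical and simplified for illustration. In particular, `sketch_check_worker` only stands in for `check_worker_against_task`, whose real signature and logic are not reproduced here, and the `cached_input_bytes` field is assumed to have been computed beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the manager's worker record. */
struct sketch_worker {
    int id;
    int64_t cached_input_bytes; /* bytes of this task's inputs already on the worker (assumed precomputed) */
    int fits;                   /* stand-in for the outcome of the full resource check */
};

/* Stand-in for check_worker_against_task; the real function differs. */
static int sketch_check_worker(const struct sketch_worker *w)
{
    return w->fits;
}

/* qsort comparator: workers holding more input bytes sort first. */
static int sketch_by_cached_bytes_desc(const void *a, const void *b)
{
    const struct sketch_worker *wa = a;
    const struct sketch_worker *wb = b;
    if (wa->cached_input_bytes < wb->cached_input_bytes) return 1;
    if (wa->cached_input_bytes > wb->cached_input_bytes) return -1;
    return 0;
}

/*
 * Sorts the candidates in place (descending by cached input bytes) and
 * returns the first one that passes the check, or NULL if none does.
 * If checks is non-NULL, it receives the number of checks performed.
 */
static struct sketch_worker *sketch_find_worker_by_files(struct sketch_worker *workers, size_t n, size_t *checks)
{
    assert(workers != NULL || n == 0);
    if (checks) *checks = 0;
    if (n == 0) return NULL;

    qsort(workers, n, sizeof(*workers), sketch_by_cached_bytes_desc);

    for (size_t i = 0; i < n; i++) {
        if (checks) (*checks)++;
        if (sketch_check_worker(&workers[i])) return &workers[i];
    }
    return NULL;
}
```

Note that this only reduces the number of expensive checks. Computing the per-worker input size still requires a pass over the workers, so whether it helps in practice depends on where the time is actually being spent.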
Although the manager uses hash_table_lookup to locate files, which is about O(1), it still traverses every worker and calls check_worker_against_task for each one. This could be improved by first sorting workers by the task's input sizes and then calling check_worker_against_task in descending order of likelihood.
As for the dramatic slowdown towards the workflow's final phases, my impression was that the problem was solved by simply switching to find_worker_by_random, but as find_worker_by_files is not actually iterating the file table, I need to investigate it further...
Merge Checklist
The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.
make test: Run local tests prior to pushing.
make format: Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python).
make lint: Run lint on source code prior to pushing.