-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update / add task execution hooks #1269
base: main
Are you sure you want to change the base?
Update / add task execution hooks #1269
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to 18e6874 in 50 seconds
More details
- Looked at
544
lines of code in8
files - Skipped
0
files when reviewing. - Skipped posting
4
drafted comments based on config settings.
1. hamilton/execution/state.py:387
- Draft comment:
Consider documenting thepost_task_expand
hook in the docstring of theupdate_task_state
method to maintain clarity and consistency. - Reason this comment was not posted:
Confidence changes required:50%
The PR introduces a new hookpost_task_expand
inExecutionState.update_task_state
. This hook is called after the task expansion process. The hook is correctly implemented and follows the pattern of other hooks. However, the hook is not documented in the docstring of theupdate_task_state
method, which could lead to confusion for future developers. It is important to document all hooks in the method's docstring to maintain clarity and consistency.
2. hamilton/driver.py:245
- Draft comment:
Consider documenting thepost_task_group
hook in the docstring of theexecute
method to maintain clarity and consistency. - Reason this comment was not posted:
Confidence changes required:50%
The PR introduces a new hookpost_task_group
in theTaskBasedGraphExecutor.execute
method. This hook is called after the task grouping process. The hook is correctly implemented and follows the pattern of other hooks. However, the hook is not documented in the docstring of theexecute
method, which could lead to confusion for future developers. It is important to document all hooks in the method's docstring to maintain clarity and consistency.
3. hamilton/execution/executors.py:141
- Draft comment:
Avoid logging sensitive data such as task IDs or node names if they contain sensitive information. - Reason this comment was not posted:
Comment was not on a valid diff hunk.
4. hamilton/execution/state.py:387
- Draft comment:
Consider adding documentation for thepost_task_expand
hook in thedocs/
directory to help users understand its usage and parameters. - Reason this comment was not posted:
Confidence changes required:50%
Thepost_task_expand
hook is added in multiple places, includinghamilton/execution/state.py
. This hook is responsible for handling task expansion, and it is important to ensure that the parameters passed to it are well-documented and clear. This is a good candidate for documentation in thedocs/
directory to help users understand how to use this hook effectively.
Workflow ID: wflow_kKz7HCZoN9F1md9Q
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Ah, I see why the docs check failed - I forgot to add references for |
This looks great! Would love to see the implementation of that rich-based lifecycle adapter with this -- E.G. to show how the parameters are used. Only concern is that I want to make sure we're leaving the implementation open for the future, and if there's not an immediate need for data objects (E.G. |
Sure, no problem, I can add the rich based lifecycle adapter here! Ah, good point, I didn't think about the data objects in that context, I can update the |
Yep! I think that would work well. Sorry for the delay, slipped my mind a bit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, forgot to commit the changes!
"post_task_expand", | ||
run_id=completed_task.run_id, | ||
task_id=completed_task.task_id, | ||
parameters=parameterization_values, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, there's a slight design issue here, curious as to your thoughts. The fact that we create a dict with all the parameterization values is not part of the contract -- the idea is we could go to having a generator where they're not all decided for now. Not married to this (it's a bit baked in that it's a list now), but curious what you're planning on using this value for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conversely, it could change to storing the whole set of values as a list regardless of whether it's a generator or not, only if this hook exists -- that could be an implementation detail we could handle later (e.g. add something that materializes it) -- this would allow us to release this now and not change the contract.
@abc.abstractmethod | ||
def post_task_expand(self, *, run_id: str, task_id: str, parameters: Dict[str, Any]): | ||
"""Hook that is called immediately after a task is expanded into separate task. Note that this is only useful | ||
in dynamic execution, although we reserve the right to add this back into the standard hamilton execution pattern. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will be added back in -- the expansion piece is a specific dynamic execution notion.
|
||
@override | ||
@final | ||
def post_task_group(self, *, run_id: str, tasks: List[TaskSpec]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we should expose the task spec here now -- what data in it do we need? can narrow the contract to give us more flexibility...
Attempts to address #1196. Apologies for the long delay, sometimes life really gets in the way 🙄
Currently, for task-based DAGs,
TaskExecutionHook
will only fire before and after a task is executed. The hooks have no knowledge of the overall task landscape, including:Changes
This PR proposes the following changes to the task-based lifecycle adapters:
spawning_task_id
and nodepurpose
to the public methods and internal hooks associated withTaskExecutionHook
post_task_group
that runs after the tasks are groupedpost_task_expand
that runs after the expander task is parameterizedTaskGroupingHook
that exposes public methods forpost_task_group
andpost_task_expand
How I tested this
Checks for the existence and correctness of the
spawning_task_id
andpurpose
parameters were added to relevant (existing) task-based lifecycle tests. Additionally, new test cases were created for bothpost_task_group
andpost_task_expand
.Notes
In #1196 we mentioned adding a rich-based lifecycle adapter (similar to one used to demonstrate the above code). I thought splitting that particular adapter PR from the underlying events might be cleaner – feel free to wave me off, I am still happy to contribute those progress bars!
Checklist