Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update / add task execution hooks #1269

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

cswartzvi
Copy link
Contributor

Attempts to address #1196. Apologies for the long delay, sometimes life really gets in the way 🙄

Currently, for task-based DAGs, TaskExecutionHook will only fire before and after a task is executed. The hooks have no knowledge of the overall task landscape, including:

  • Overall groups in the graph
  • Details about the expander task parameterization
  • Type of current task (expander, collector, etc.)
  • Spawning task ID (if available)

Changes

This PR proposes the following changes to the task-based lifecycle adapters:

  • Add parameters for spawning_task_id and node purpose to the public methods and internal hooks associated with TaskExecutionHook
  • Add a new internal hook post_task_group that runs after the tasks are grouped
  • Adda new internal hook post_task_expand that runs after the expander task is parameterized
  • Create a new lifecycle adapter TaskGroupingHook that exposes public methods for post_task_group and post_task_expand

How I tested this

Checks for the existence and correctness of the spawning_task_id and purpose parameters were added to relevant (existing) task-based lifecycle tests. Additionally, new test cases were created for both post_task_group and post_task_expand.

Notes

In #1196 we mentioned adding a rich-based lifecycle adapter (similar to one used to demonstrate the above code). I thought splitting that particular adapter PR from the underlying events might be cleaner – feel free to wave me off, I am still happy to contribute those progress bars!

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to 18e6874 in 50 seconds

More details
  • Looked at 544 lines of code in 8 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. hamilton/execution/state.py:387
  • Draft comment:
    Consider documenting the post_task_expand hook in the docstring of the update_task_state method to maintain clarity and consistency.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The PR introduces a new hook post_task_expand in ExecutionState.update_task_state. This hook is called after the task expansion process. The hook is correctly implemented and follows the pattern of other hooks. However, the hook is not documented in the docstring of the update_task_state method, which could lead to confusion for future developers. It is important to document all hooks in the method's docstring to maintain clarity and consistency.
2. hamilton/driver.py:245
  • Draft comment:
    Consider documenting the post_task_group hook in the docstring of the execute method to maintain clarity and consistency.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The PR introduces a new hook post_task_group in the TaskBasedGraphExecutor.execute method. This hook is called after the task grouping process. The hook is correctly implemented and follows the pattern of other hooks. However, the hook is not documented in the docstring of the execute method, which could lead to confusion for future developers. It is important to document all hooks in the method's docstring to maintain clarity and consistency.
3. hamilton/execution/executors.py:141
  • Draft comment:
    Avoid logging sensitive data such as task IDs or node names if they contain sensitive information.
  • Reason this comment was not posted:
    Comment was not on a valid diff hunk.
4. hamilton/execution/state.py:387
  • Draft comment:
    Consider adding documentation for the post_task_expand hook in the docs/ directory to help users understand its usage and parameters.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The post_task_expand hook is added in multiple places, including hamilton/execution/state.py. This hook is responsible for handling task expansion, and it is important to ensure that the parameters passed to it are well-documented and clear. This is a good candidate for documentation in the docs/ directory to help users understand how to use this hook effectively.

Workflow ID: wflow_kKz7HCZoN9F1md9Q


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@cswartzvi cswartzvi changed the title Update task execution hook Update / add task execution hooks Jan 7, 2025
@cswartzvi
Copy link
Contributor Author

Ah, I see why the docs check failed - I forgot to add references for TaskExecutionHook and TaskGroupingHook 🤦🏻. I will update this shortly.

@elijahbenizzy
Copy link
Collaborator

Attempts to address #1196. Apologies for the long delay, sometimes life really gets in the way 🙄

Currently, for task-based DAGs, TaskExecutionHook will only fire before and after a task is executed. The hooks have no knowledge of the overall task landscape, including:

  • Overall groups in the graph
  • Details about the expander task parameterization
  • Type of current task (expander, collector, etc.)
  • Spawning task ID (if available)

Changes

This PR proposes the following changes to the task-based lifecycle adapters:

  • Add parameters for spawning_task_id and node purpose to the public methods and internal hooks associated with TaskExecutionHook
  • Add a new internal hook post_task_group that runs after the tasks are grouped
  • Adda new internal hook post_task_expand that runs after the expander task is parameterized
  • Create a new lifecycle adapter TaskGroupingHook that exposes public methods for post_task_group and post_task_expand

How I tested this

Checks for the existence and correctness of the spawning_task_id and purpose parameters were added to relevant (existing) task-based lifecycle tests. Additionally, new test cases were created for both post_task_group and post_task_expand.

Notes

In #1196 we mentioned adding a rich-based lifecycle adapter (similar to one used to demonstrate the above code). I thought splitting that particular adapter PR from the underlying events might be cleaner – feel free to wave me off, I am still happy to contribute those progress bars!

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

This looks great! Would love to see the implementation of that rich-based lifecycle adapter with this -- E.G. to show how the parameters are used. Only concern is that I want to make sure we're leaving the implementation open for the future, and if there's not an immediate need for data objects (E.G. TaskImplementation), then it's cleaner, IMO, to expose less (just the features we need).

@cswartzvi
Copy link
Contributor Author

Sure, no problem, I can add the rich based lifecycle adapter here!

Ah, good point, I didn't think about the data objects in that context, I can update the tasks: List[TaskSpec] parameter in post_task_group to be something like task_ids: List[str]. How does that work?

@zilto zilto mentioned this pull request Jan 18, 2025
7 tasks
@elijahbenizzy
Copy link
Collaborator

Sure, no problem, I can add the rich based lifecycle adapter here!

Ah, good point, I didn't think about the data objects in that context, I can update the tasks: List[TaskSpec] parameter in post_task_group to be something like task_ids: List[str]. How does that work?

Yep! I think that would work well. Sorry for the delay, slipped my mind a bit.

Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, forgot to commit the changes!

"post_task_expand",
run_id=completed_task.run_id,
task_id=completed_task.task_id,
parameters=parameterization_values,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, there's a slight design issue here, curious as to your thoughts. The fact that we create a dict with all the parameterization values is not part of the contract -- the idea is we could go to having a generator where they're not all decided for now. Not married to this (it's a bit baked in that it's a list now), but curious what you're planning on using this value for?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conversely, it could change to storing the whole set of values as a list regardless of whether it's a generator or not, only if this hook exists -- that could be an implementation detail we could handle later (e.g. add something that materializes it) -- this would allow us to release this now and not change the contract.

@abc.abstractmethod
def post_task_expand(self, *, run_id: str, task_id: str, parameters: Dict[str, Any]):
"""Hook that is called immediately after a task is expanded into separate task. Note that this is only useful
in dynamic execution, although we reserve the right to add this back into the standard hamilton execution pattern.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will be added back in -- the expansion piece is a specific dynamic execution notion.


@override
@final
def post_task_group(self, *, run_id: str, tasks: List[TaskSpec]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should expose the task spec here now -- what data in it do we need? can narrow the contract to give us more flexibility...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants