Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delegation fixes #6165

Open
wants to merge 19 commits into
base: main
Choose a base branch
from
Open

Delegation fixes #6165

wants to merge 19 commits into from

Conversation

enyst
Copy link
Collaborator

@enyst enyst commented Jan 9, 2025

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below
    Fix agent delegation; use events for communication between parent and delegates.
    Fix the lockup when the model returned a message.

Give a summary of what the PR does, explaining any non-trivial design decisions

Delegation was broken after we made the agent loop rely exclusively on a controller-as-observer logic. This PR proposes to fix it in a simple way: by forwarding to the delegate

  • refactor the current logic (of unsubscribing parent when delegate starts and vice versa): now ONLY the parent is subscribed and stays subscribed, and it forwards to the delegate when it has one
  • should_step on both MessageActions from 'user' and 'agent', except when waiting for user input is explicitly set
  • should_step on DelegateAction too, it will create a MessageAction to kickstart the delegate
  • refactor ending conditions
  • added integration tests for DelegatorAgent

Also:

  • the delegate starts with a MessageAction
  • and ends with a FinishAction

The code is ready for review - or this logic of delegation.
(please ignore the print() stuff, will clean up later)


Link of any specific issues this addresses
Fix #6162


To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0f00ea6-nikolaik   --name openhands-app-0f00ea6   docker.all-hands.dev/all-hands-ai/openhands:0f00ea6

@enyst enyst marked this pull request as draft January 9, 2025 07:30

This comment was marked as outdated.

@enyst
Copy link
Collaborator Author

enyst commented Jan 9, 2025

@openhands-agent Read the diff of this PR carefully. Understand what it tries to achieve. Then, we have two things to do:

  1. The diff has added debug prints. We need them! And we need to enhance them:
  • all events have an event.id, add it to the print() statements after the class name, like '({event.id})'
  1. The unit tests in test_agent_controller.py are outdated and failing. Update them to the new behavior. Understand the difference in the context of the changes of this PR and fix them.

Important:
You don't need to test the rest. Just this test file.

@All-Hands-AI All-Hands-AI deleted a comment from openhands-agent Jan 9, 2025
@All-Hands-AI All-Hands-AI deleted a comment from openhands-agent Jan 9, 2025
@All-Hands-AI All-Hands-AI deleted a comment from github-actions bot Jan 9, 2025
Copy link
Contributor

github-actions bot commented Jan 9, 2025

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@enyst enyst marked this pull request as ready for review January 9, 2025 16:47
@enyst enyst requested a review from rbren January 9, 2025 17:01
@enyst enyst requested a review from li-boxuan January 9, 2025 17:10

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

Copy link
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Copy link
Contributor

Trigger by: Pull Request (integration-test label on PR #6165)
Commit: 01c459a
Integration Tests Report (Haiku)
Haiku LLM Test Results:

uncomment me


Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:

uncomment me


Integration Tests Report Delegator (Haiku)
Success rate: 50.00% (1/2)

Total cost: USD 0.00

instance_id success reason cost error_message
t02_add_bash_hello True 0 nan
t01_fix_simple_typo False File not fixed: This is a silly text. 0 RuntimeError: Agent reached maximum iteration in headless mode. Current iteration: 30, max iteration: 30
Really?
No more text!
Enjoy!

Integration Tests Report Delegator (DeepSeek)
Success rate: 100.00% (2/2)

Total cost: USD 0.00

instance_id success reason cost error_message
t01_fix_simple_typo True 0
t02_add_bash_hello True 0

Download testing outputs (includes both Haiku and DeepSeek results): Download

Copy link
Collaborator

@li-boxuan li-boxuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this fix!

@@ -165,7 +169,11 @@ async def close(self) -> None:
)

# unsubscribe from the event stream
self.event_stream.unsubscribe(EventStreamSubscriber.AGENT_CONTROLLER, self.id)
# only the root parent controller subscribes to the event stream
if not self.is_delegate:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: does this design work for multi-layer delegation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I thought about keeping that, but I don't know how useful it really is in the practice around us, and I'd leave it for a follow-up. I think it did work before, but I don't think we ever tested for it? Do you find it useful?

A use case that keeps appearing around is planner / executor type of workflows (for example, and it's only one example 3770) - and those don't need it, but I would love it if we support them well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a more generalized solution that can handle a tree of multi-level delegation.
We don't use multi-level right now, but it would be annoying if we had to hack up the framework to make it work later and diverge the agent_controller implementation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it did work before, but I don't think we ever tested for it?

Yeah it did work before, and I think we had a test for it; the dummy agent used to delegate to itself then itself and then itself again...

Do you find it useful?

Not really 😁 in my imaginary setting, it wasn't useful because LLMs were not powerful enough to be used for super long-horizon tasks (e.g. found a company and release a product). Could it be? Maybe, maybe not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just babbling: maybe multi-layer delegation setting would be more useful in robotic/industrial engineering areas? Where "agents" really don't care about the past and future, and there's very narrow but different action spaces that they shall follow.

Or maybe this: a very intelligent yet expensive agent that makes decisions, which hands over to good and not-too-expensive agents, which sometimes needs some work to be done by mediocre and cheap agents.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If CodeAct ever decides to delegate to BrowsingAgent again or delegate to a micro agent, then we will need multi layer for anything that uses planner -> CodeAct -> Browsing/Micro

It doesn't need to be that crazy deep but needing to support 2 to 3 layers doesn't seem that unlikely.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are good points! I think it will work fine. This PR changes the behavior at the edges, so that the delegated agent is exactly like a non-delegated agent, no special handling necessary to become a delegate or to delegate, they use the event stream.

Like if you want to use CodeAct, you send a delegate action asking for codeact, and you don't have to modify the agent for it to work. If an agent knows about the delegate tool, it can use it and become a parent, if not, it's just a kid.😅

I think that makes it easier to add depth, not harder. I kept it simple to see and test the flow, but it should be just making use of the level counter. (famous last words? 🤣) Will look!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Regression in AgentController broke AgentDelegationAction
4 participants