
[Bug]: SWE-Bench inference - Failed to establish a new connection: [Errno 111] Connection refused #4260

Open
jatinganhotra opened this issue Oct 8, 2024 · 3 comments
Labels: bug

@jatinganhotra

Is there an existing issue for the same bug?

Describe the bug

Hi team,

When I try to run inference for SWE-Bench Lite with more than 1 worker, I get the following error. Inference runs fine with only 1 worker, which is the default value.

./evaluation/swe_bench/scripts/run_infer.sh MODEL_CONFIG with the default CodeActAgent
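
For reference, the worker count can also be passed explicitly. The positional-argument order below (model config, commit, agent, eval limit, max iterations, workers, dataset, split) is my assumption from the evaluation README, so please verify it against the header of run_infer.sh; a hypothetical invocation with 8 workers would look like:

./evaluation/swe_bench/scripts/run_infer.sh MODEL_CONFIG HEAD CodeActAgent 300 30 8 princeton-nlp/SWE-bench_Lite test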

I'm getting the following error

Instance django__django-10914 - 2024-10-07 15:21:14,902 - ERROR - Error during action execution: HTTPConnectionPool(host='localhost', port=34090): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdfffde2390>: Failed to establish a new connection: [Errno 111] Connection refused'))
Instance astropy__astropy-12907 - 2024-10-07 15:21:19,293 - ERROR - Error during action execution: HTTPConnectionPool(host='localhost', port=30607): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdfffb425d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Instance astropy__astropy-14365 - 2024-10-07 15:21:24,839 - ERROR - Error during action execution: HTTPConnectionPool(host='localhost', port=32191): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdfffda1610>: Failed to establish a new connection: [Errno 111] Connection refused'))
Instance astropy__astropy-7746 - 2024-10-07 15:21:25,875 - ERROR - Error during action execution: HTTPConnectionPool(host='localhost', port=37017): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdfffd07510>: Failed to establish a new connection: [Errno 111] Conn

Stack trace:

----------[The above error occurred. Retrying... (attempt 3 of 5)]----------

Instance django__django-11001 - 2024-10-07 15:16:53,257 - WARNING - Action, ErrorObservation loop detected
Instance django__django-11001 - 2024-10-07 15:16:53,259 - ERROR - Error during action execution: HTTPConnectionPool(host='localhost', port=38197): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdfb94fcd10>: Failed to establish a new connection: [Errno 111] Connection refused'))
Instance django__django-11001 - 2024-10-07 15:16:53,261 - ERROR - Error during action execution: HTTPConnectionPool(host='localhost', port=38197): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdfffcdc610>: Failed to establish a new connection: [Errno 111] Connection refused'))
Instance django__django-11001 - 2024-10-07 15:16:53,261 - ERROR - ----------
Error in instance [django__django-11001]: 'ErrorObservation' object has no attribute 'exit_code'. Stacktrace:
Traceback (most recent call last):
  File "/data/workspace/jatinganhotra/OpenDevin/evaluation/utils/shared.py", line 268, in _process_instance_wrapper
    result = process_instance_func(instance, metadata, use_mp)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/workspace/jatinganhotra/OpenDevin/evaluation/swe_bench/run_infer.py", line 367, in process_instance
    return_val = complete_runtime(runtime, instance)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/workspace/jatinganhotra/OpenDevin/evaluation/swe_bench/run_infer.py", line 287, in complete_runtime
    assert obs.exit_code == 0
           ^^^^^^^^^^^^^
AttributeError: 'ErrorObservation' object has no attribute 'exit_code'

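For context, the assertion fails because the runtime hands back an ErrorObservation, which has no exit_code attribute, once the connection is refused. Below is a minimal sketch of the kind of guard that would surface the underlying error instead of the secondary AttributeError; the two classes are simplified stand-ins for illustration only, not the actual openhands types, and this is not a proposed patch:

from dataclasses import dataclass

# Simplified stand-ins for the observation types named in the traceback
# (hypothetical; the real classes live in the openhands package).
@dataclass
class CmdOutputObservation:
    exit_code: int
    content: str = ''

@dataclass
class ErrorObservation:
    content: str = ''

def assert_succeeded(obs):
    # Check for ErrorObservation before touching exit_code, instead of the
    # bare `assert obs.exit_code == 0` at run_infer.py:287.
    if isinstance(obs, ErrorObservation):
        raise RuntimeError(f'runtime action failed: {obs.content}')
    assert obs.exit_code == 0, f'command exited with {obs.exit_code}'

assert_succeeded(CmdOutputObservation(exit_code=0))  # passes
try:
    assert_succeeded(ErrorObservation(content='Connection refused'))
except RuntimeError as err:
    print(err)  # reports the connection failure instead of an AttributeError

With a guard like that, the log would show the connection failure directly rather than the AttributeError above.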

STDOUT logs from the beginning of the run:

Number of workers not specified, use default 16
Commit hash not specified, use current git commit
Agent not specified, use default CodeActAgent
MAX_ITER not specified, use default 30
USE_INSTANCE_IMAGE not specified, use default true
DATASET not specified, use default princeton-nlp/SWE-bench_Lite
SPLIT not specified, use default test
USE_INSTANCE_IMAGE: true
AGENT: CodeActAgent
AGENT_VERSION: v1.9
MODEL_CONFIG: eval_vllm_vela_mistral_large_2
DATASET: princeton-nlp/SWE-bench_Lite
SPLIT: test
USE_HINT_TEXT: false
EVAL_NOTE: v1.9-no-hint
14:48:49 - openhands:INFO: run_infer.py:93 - Using docker image prefix: docker.io/xingyaoww/
14:48:56 - openhands:INFO: run_infer.py:441 - Loaded dataset princeton-nlp/SWE-bench_Lite with split test
14:48:56 - openhands:INFO: utils.py:258 - Loading llm config from eval_vllm_vela_mistral_large_2
14:48:56 - openhands:INFO: shared.py:165 - Using evaluation output directory: evaluation/evaluation_outputs/outputs/swe-bench-lite/CodeActAgent/mistral-large-instruct-2407_maxiter_30_N_v1.9-no-hint
14:48:56 - openhands:INFO: shared.py:181 - Metadata: {"agent_class": "CodeActAgent", "llm_config": {"model": "openai/mistral-large-instruct-2407", "api_key": "******", "base_url": "BASE_URL", "api_version": null, "embedding_model": "local", "embedding_base_url": null, "embedding_deployment_name": null, "aws_access_key_id": null, "aws_secret_access_key": null, "aws_region_name": null, "openrouter_site_url": "https://docs.all-hands.dev/", "openrouter_app_name": "OpenHands", "num_retries": 8, "retry_multiplier": 2, "retry_min_wait": 15, "retry_max_wait": 120, "timeout": null, "max_message_chars": 10000, "temperature": 0.0, "top_p": 1.0, "custom_llm_provider": null, "max_input_tokens": null, "max_output_tokens": null, "input_cost_per_token": null, "output_cost_per_token": null, "ollama_base_url": null, "drop_params": true, "disable_vision": null, "caching_prompt": true, "log_completions": false}, "max_iterations": 30, "eval_output_dir": "evaluation/evaluation_outputs/outputs/swe-bench-lite/CodeActAgent/mistral-large-instruct-2407_maxiter_30_N_v1.9-no-hint", "start_time": "2024-10-07 14:48:56", "git_commit": "dd228c07e05b6908bc1d15dde8f8025284a9ef47", "dataset": "swe-bench-lite", "data_split": null, "details": {}}
14:48:56 - openhands:INFO: shared.py:199 - Writing evaluation output to evaluation/evaluation_outputs/outputs/swe-bench-lite/CodeActAgent/mistral-large-instruct-2407_maxiter_30_N_v1.9-no-hint/output.jsonl
14:48:56 - openhands:INFO: shared.py:232 - Finished instances: 0, Remaining instances: 300

Current OpenHands version

Commit - dd228c07e05b6908bc1d15dde8f8025284a9ef47

Installation and Configuration

> ./evaluation/swe_bench/scripts/run_infer.sh MODEL_CONFIG
Number of workers not specified, use default 16
Commit hash not specified, use current git commit
Agent not specified, use default CodeActAgent
MAX_ITER not specified, use default 30
USE_INSTANCE_IMAGE not specified, use default true
DATASET not specified, use default princeton-nlp/SWE-bench_Lite
SPLIT not specified, use default test
USE_INSTANCE_IMAGE: true
AGENT: CodeActAgent
AGENT_VERSION: v1.9
MODEL_CONFIG: MODEL_CONFIG
DATASET: princeton-nlp/SWE-bench_Lite
SPLIT: test
USE_HINT_TEXT: false
EVAL_NOTE: v1.9-no-hint


### Model and Agent

_No response_

### Operating System

_No response_

### Reproduction Steps

_No response_

### Logs, Errors, Screenshots, and Additional Context

_No response_
jatinganhotra added the bug label Oct 8, 2024
@xingyaoww
Contributor

Yes - I think that's somewhat expected behavior - Docker acts weirdly when you try to run multiple images at once.

You can consider joining our eval channel #remote-runtime-limited-beta to get access to our new infra for running evals in parallel: https://www.all-hands.dev/blog/evaluation-of-llms-as-coding-agents-on-swe-bench-at-30x-speed

@mamoodi
Collaborator

mamoodi commented Oct 8, 2024

@xingyaoww just to clarify, when you say this is expected behavior, do you mean this will likely not be fixed?
In the README: https://github.com/All-Hands-AI/OpenHands/tree/main/evaluation/swe_bench
It specifically allows you to set the number of workers.

@xingyaoww
Contributor

Yeah, I think so - maybe we should make this clearer in the README there.
