Describe the issue
I ran the evaluation script with --predictions_path gold on all 500 tasks in SWE-bench_Verified, and 14 of them fail.
I'm using the current main branch of swebench: c63a113
This is the exact command:
python -m swebench.harness.run_evaluation --predictions_path gold --max_workers 25 --run_id validate-gold-verified --dataset_name princeton-nlp/SWE-bench_Verified --cache_level instance
These are the unresolved task IDs:
"unresolved_ids": [ "astropy__astropy-7166", "astropy__astropy-7336", "astropy__astropy-7606", "astropy__astropy-7671", "astropy__astropy-8707", "astropy__astropy-8872", "django__django-10097", "matplotlib__matplotlib-20488", "psf__requests-2317", "pylint-dev__pylint-6528", "pylint-dev__pylint-7080", "pylint-dev__pylint-7277", "sphinx-doc__sphinx-10323", "sphinx-doc__sphinx-10435" ],
When I ran it a second time, I got 15 unresolved tasks. The only new one is psf__requests-1766:
"unresolved_ids": [ "astropy__astropy-7166", "astropy__astropy-7336", "astropy__astropy-7606", "astropy__astropy-7671", "astropy__astropy-8707", "astropy__astropy-8872", "django__django-10097", "matplotlib__matplotlib-20488", "psf__requests-1766", "psf__requests-2317", "pylint-dev__pylint-6528", "pylint-dev__pylint-7080", "pylint-dev__pylint-7277", "sphinx-doc__sphinx-10323", "sphinx-doc__sphinx-10435" ],
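To separate instances that fail consistently from ones that fail only intermittently, the two unresolved lists can be compared as sets. This is a minimal sketch; it assumes each run's report is a JSON object with an "unresolved_ids" list, as in the excerpts above. The helper names load_unresolved and compare_runs are hypothetical and are not part of swebench.

```python
import json
from pathlib import Path


def load_unresolved(report_path):
    """Return the set of unresolved instance IDs from one run's report.

    Assumption: the report is a JSON object with an "unresolved_ids" list,
    matching the excerpts quoted in this issue.
    """
    with Path(report_path).open() as fh:
        return set(json.load(fh)["unresolved_ids"])


def compare_runs(first, second):
    """Split unresolved IDs into those unresolved in both runs and in only one."""
    first, second = set(first), set(second)
    consistent = sorted(first & second)  # unresolved in both runs
    flaky = sorted(first ^ second)       # unresolved in exactly one run
    return consistent, flaky


# Unresolved IDs copied from the two runs reported above.
RUN_1 = [
    "astropy__astropy-7166", "astropy__astropy-7336", "astropy__astropy-7606",
    "astropy__astropy-7671", "astropy__astropy-8707", "astropy__astropy-8872",
    "django__django-10097", "matplotlib__matplotlib-20488", "psf__requests-2317",
    "pylint-dev__pylint-6528", "pylint-dev__pylint-7080", "pylint-dev__pylint-7277",
    "sphinx-doc__sphinx-10323", "sphinx-doc__sphinx-10435",
]
RUN_2 = [
    "astropy__astropy-7166", "astropy__astropy-7336", "astropy__astropy-7606",
    "astropy__astropy-7671", "astropy__astropy-8707", "astropy__astropy-8872",
    "django__django-10097", "matplotlib__matplotlib-20488", "psf__requests-1766",
    "psf__requests-2317", "pylint-dev__pylint-6528", "pylint-dev__pylint-7080",
    "pylint-dev__pylint-7277", "sphinx-doc__sphinx-10323", "sphinx-doc__sphinx-10435",
]

consistent, flaky = compare_runs(RUN_1, RUN_2)
print(f"Unresolved in both runs ({len(consistent)}):")
for instance_id in consistent:
    print("  " + instance_id)
print("Unresolved in only one run:", flaky)
```

With the two lists above, this reports the same 14 IDs as unresolved in both runs and psf__requests-1766 as the only instance whose result differs between them. For real report files, load_unresolved(path) can feed compare_runs directly.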
Suggest an improvement to documentation
No response
This might be related to #225, #167, #246, #267, and #274