Paul Gauthier 2024-06-03 11:16:34 -07:00
parent 0d06364db6
commit b184ab9977
2 changed files with 2 additions and 2 deletions

@@ -418,7 +418,7 @@ This is contrast to a pass@N result for N>1, where N attempts are made
and all N solutions are evaluated by the acceptance tests.
If *any* of the N solution pass, that counts as a pass@N success.
-Below are the references for the pass@1 unhinted SWE-Bench results
+Below are the references for the other pass@1 unhinted SWE-Bench results
displayed in the graph at the beginning of this article.
- [20.3% Amazon Q Developer Agent (v20240430-dev)](https://www.swebench.com)

@@ -242,7 +242,7 @@ This is contrast to a pass@N result for N>1, where N attempts are made
and all N solutions are evaluated by the acceptance tests.
If *any* of the N solution pass, that counts as a pass@N success.
-Below are the references for the pass@1 unhinted SWE-Bench results
+Below are the references for the other pass@1 unhinted SWE-Bench results
displayed in the graph at the beginning of this article.
- [13.9% Devin, benchmarked on 570 instances.](https://www.cognition.ai/post/swe-bench-technical-report)
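
Note on the context lines in both hunks: they describe pass@N as making N attempts per problem and counting a success if any attempt passes the acceptance tests, with pass@1 being the single-attempt case. A minimal Python sketch of that scoring rule follows. The function names and the sample data are hypothetical, made up for illustration; they are not taken from the benchmark harness or from this repository.

```python
from typing import Sequence


def passes_at_n(attempt_results: Sequence[bool], n: int) -> bool:
    """Return True if any of the first n attempts passed the acceptance tests."""
    return any(attempt_results[:n])


def pass_rate_at_n(per_problem_attempts: Sequence[Sequence[bool]], n: int) -> float:
    """Fraction of problems counted as solved under pass@n."""
    solved = sum(passes_at_n(attempts, n) for attempts in per_problem_attempts)
    return solved / len(per_problem_attempts)


# Hypothetical outcomes: 4 problems, 3 attempts each (True = tests passed).
results = [
    [True, False, False],
    [False, True, False],
    [False, False, False],
    [False, False, True],
]

print(pass_rate_at_n(results, 1))  # only the first attempt counts
print(pass_rate_at_n(results, 3))  # any of the three attempts counts
```

With this made-up data, only one problem is solved on the first attempt while three are solved within three attempts, which is why pass@1 and pass@N figures are not directly comparable.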