diff --git a/_posts/2024-05-22-swe-bench-lite.md b/_posts/2024-05-22-swe-bench-lite.md
index de9a52619..d346dc9d1 100644
--- a/_posts/2024-05-22-swe-bench-lite.md
+++ b/_posts/2024-05-22-swe-bench-lite.md
@@ -418,7 +418,7 @@ This is contrast to a pass@N result for N>1, where N attempts are made
 and all N solutions are evaluated by the acceptance tests.
 If *any* of the N solution pass, that counts as a pass@N success.
 
-Below are the references for the pass@1 unhinted SWE-Bench results
+Below are the references for the other pass@1 unhinted SWE-Bench results
 displayed in the graph at the beginning of this article.
 
 - [20.3% Amazon Q Developer Agent (v20240430-dev)](https://www.swebench.com)
diff --git a/_posts/2024-06-02-main-swe-bench.md b/_posts/2024-06-02-main-swe-bench.md
index 0d2e05d55..4a9970fca 100644
--- a/_posts/2024-06-02-main-swe-bench.md
+++ b/_posts/2024-06-02-main-swe-bench.md
@@ -242,7 +242,7 @@ This is contrast to a pass@N result for N>1, where N attempts are made
 and all N solutions are evaluated by the acceptance tests.
 If *any* of the N solution pass, that counts as a pass@N success.
 
-Below are the references for the pass@1 unhinted SWE-Bench results
+Below are the references for the other pass@1 unhinted SWE-Bench results
 displayed in the graph at the beginning of this article.
 
 - [13.9% Devin, benchmarked on 570 instances.](https://www.cognition.ai/post/swe-bench-technical-report)