From b184ab99774a4e7de51c5edfc16da69865468314 Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Mon, 3 Jun 2024 11:16:34 -0700
Subject: [PATCH] copy

---
 _posts/2024-05-22-swe-bench-lite.md | 2 +-
 _posts/2024-06-02-main-swe-bench.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/_posts/2024-05-22-swe-bench-lite.md b/_posts/2024-05-22-swe-bench-lite.md
index de9a52619..d346dc9d1 100644
--- a/_posts/2024-05-22-swe-bench-lite.md
+++ b/_posts/2024-05-22-swe-bench-lite.md
@@ -418,7 +418,7 @@ This is contrast to a pass@N result for N>1, where N attempts are made
 and all N solutions are evaluated by the acceptance tests.
 If *any* of the N solution pass, that counts as a pass@N success.
 
-Below are the references for the pass@1 unhinted SWE-Bench results
+Below are the references for the other pass@1 unhinted SWE-Bench results
 displayed in the graph at the beginning of this article.
 
 - [20.3% Amazon Q Developer Agent (v20240430-dev)](https://www.swebench.com)
diff --git a/_posts/2024-06-02-main-swe-bench.md b/_posts/2024-06-02-main-swe-bench.md
index 0d2e05d55..4a9970fca 100644
--- a/_posts/2024-06-02-main-swe-bench.md
+++ b/_posts/2024-06-02-main-swe-bench.md
@@ -242,7 +242,7 @@ This is contrast to a pass@N result for N>1, where N attempts are made
 and all N solutions are evaluated by the acceptance tests.
 If *any* of the N solution pass, that counts as a pass@N success.
 
-Below are the references for the pass@1 unhinted SWE-Bench results
+Below are the references for the other pass@1 unhinted SWE-Bench results
 displayed in the graph at the beginning of this article.
 
 - [13.9% Devin, benchmarked on 570 instances.](https://www.cognition.ai/post/swe-bench-technical-report)