From 0d06364db61ce658d0e0be7e7f605c02716380d1 Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Mon, 3 Jun 2024 11:14:17 -0700
Subject: [PATCH] copy

---
 _posts/2024-05-22-swe-bench-lite.md | 4 ++--
 _posts/2024-06-02-main-swe-bench.md | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/_posts/2024-05-22-swe-bench-lite.md b/_posts/2024-05-22-swe-bench-lite.md
index b87d57819..de9a52619 100644
--- a/_posts/2024-05-22-swe-bench-lite.md
+++ b/_posts/2024-05-22-swe-bench-lite.md
@@ -15,7 +15,7 @@ from Amazon Q Developer Agent.
 
 [![SWE Bench Lite results](/assets/swe_bench_lite.svg)](https://aider.chat/assets/swe_bench_lite.svg)
 
-**To be clear, all of aider's results reported here are pass@1 results,
+**All of aider's results reported here are pass@1 results,
 obtained without using the SWE Bench `hints_text`.**
 All results in the above chart are unhinted pass@1 results.
 Please see the [references](#references)
@@ -407,7 +407,7 @@ making it faster, easier, and more reliable to run the acceptance tests.
 
 ## References
 
-To be clear, all of aider's results reported here are pass@1 results,
+All of aider's results reported here are pass@1 results,
 obtained without using the SWE Bench `hints_text`.
 
 The "aider agent" internally makes multiple "attempts" at solving the problem,
diff --git a/_posts/2024-06-02-main-swe-bench.md b/_posts/2024-06-02-main-swe-bench.md
index 0d72da5ac..0d2e05d55 100644
--- a/_posts/2024-06-02-main-swe-bench.md
+++ b/_posts/2024-06-02-main-swe-bench.md
@@ -20,7 +20,7 @@ This result on the main SWE Bench builds on
 
 [![SWE Bench results](/assets/swe_bench.svg)](https://aider.chat/assets/swe_bench.svg)
 
-**To be clear, all of aider's results reported here are pass@1 results,
+**All of aider's results reported here are pass@1 results,
 obtained without using the SWE Bench `hints_text`.**
 Aider was benchmarked on the same
 [570 randomly selected SWE Bench problems](https://github.com/CognitionAI/devin-swebench-results/tree/main/output_diffs)
@@ -231,7 +231,7 @@ making it faster, easier, and more reliable to run the acceptance tests.
 
 ## References
 
-To be clear, all of aider's results reported here are pass@1 results,
+All of aider's results reported here are pass@1 results,
 obtained without using the SWE Bench `hints_text`.
 
 The "aider agent" internally makes multiple "attempts" at solving the problem,