From 50426e62da0f798ed19fb112bd7a7cf5fe943a25 Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Sat, 25 May 2024 12:47:47 -0700
Subject: [PATCH] copy

---
 _posts/2024-05-22-swe-bench-lite.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/_posts/2024-05-22-swe-bench-lite.md b/_posts/2024-05-22-swe-bench-lite.md
index e7e05815f..b53bfafbc 100644
--- a/_posts/2024-05-22-swe-bench-lite.md
+++ b/_posts/2024-05-22-swe-bench-lite.md
@@ -13,7 +13,7 @@ achieving a state-of-the-art result.
 The current top leaderboard entry is 20.3%
 from Amazon Q Developer Agent.
 The best result reported elsewhere seems to be
-[25% from OpenDevin](https://x.com/gneubig/status/1791498953709752405)
+[25% from OpenDevin](https://x.com/gneubig/status/1791498953709752405).
 
 [![SWE Bench Lite results](/assets/swe_bench_lite.svg)](https://aider.chat/assets/swe_bench_lite.svg)
 
@@ -399,9 +399,9 @@ making it faster, easier, and more reliable to run the acceptance tests.
 Below are the references for the SWE-Bench Lite results
 displayed in the graph at the top of this page.
 
-- 25.0% OpenDevin https://x.com/gneubig/status/1791498953709752405
-- 22.3% AutoCodeRover https://github.com/nus-apr/auto-code-rover
-- 20.3% Amazon Q Developer Agent (v20240430-dev) https://www.swebench.com
-- 18.0% SWE-Agent + GPT-4 https://www.swebench.com
-- 11.7% SWE-Agent + Opus https://www.swebench.com
+- [25.0% OpenDevin](https://x.com/gneubig/status/1791498953709752405)
+- [22.3% AutoCodeRover](https://github.com/nus-apr/auto-code-rover)
+- [20.3% Amazon Q Developer Agent (v20240430-dev)](https://www.swebench.com)
+- [18.0% SWE-Agent + GPT-4](https://www.swebench.com)
+- [11.7% SWE-Agent + Opus](https://www.swebench.com)
 