Paul Gauthier 2024-05-25 12:47:47 -07:00
parent 3c970e0fb7
commit 50426e62da


@@ -13,7 +13,7 @@ achieving a state-of-the-art result.
 The current top leaderboard entry is 20.3%
 from Amazon Q Developer Agent.
 The best result reported elsewhere seems to be
-[25% from OpenDevin](https://x.com/gneubig/status/1791498953709752405)
+[25% from OpenDevin](https://x.com/gneubig/status/1791498953709752405).
 [![SWE Bench Lite results](/assets/swe_bench_lite.svg)](https://aider.chat/assets/swe_bench_lite.svg)
@@ -399,9 +399,9 @@ making it faster, easier, and more reliable to run the acceptance tests.
 Below are the references for the SWE-Bench Lite results
 displayed in the graph at the top of this page.
-- 25.0% OpenDevin https://x.com/gneubig/status/1791498953709752405
-- 22.3% AutoCodeRover https://github.com/nus-apr/auto-code-rover
-- 20.3% Amazon Q Developer Agent (v20240430-dev) https://www.swebench.com
-- 18.0% SWE-Agent + GPT-4 https://www.swebench.com
-- 11.7% SWE-Agent + Opus https://www.swebench.com
+- [25.0% OpenDevin](https://x.com/gneubig/status/1791498953709752405)
+- [22.3% AutoCodeRover](https://github.com/nus-apr/auto-code-rover)
+- [20.3% Amazon Q Developer Agent (v20240430-dev)](https://www.swebench.com)
+- [18.0% SWE-Agent + GPT-4](https://www.swebench.com)
+- [11.7% SWE-Agent + Opus](https://www.swebench.com)