Updated graph to use pass@1 unhinted results from other agents

Paul Gauthier 2024-05-30 15:29:33 -07:00
parent 07731e30dc
commit 6966936316
5 changed files with 435 additions and 213 deletions


@@ -12,11 +12,17 @@ on the
 achieving a state-of-the-art result.
 The current top leaderboard entry is 20.3%
 from Amazon Q Developer Agent.
-The best result reported elsewhere seems to be
-[25% from OpenDevin](https://x.com/gneubig/status/1791498953709752405).
 [![SWE Bench Lite results](/assets/swe_bench_lite.svg)](https://aider.chat/assets/swe_bench_lite.svg)
+Please see the [references](#references)
+for details on the data presented in this chart.
+It was updated 5/30/24 to reflect apples-to-apples comparisons,
+using pass@1 results from AutoCodeRover
+and results from OpenDevin that don't use hints.
+The [official SWE Bench Lite leaderboard](https://www.swebench.com)
+only accepts pass@1 results that do not use hints.
 ## Interactive, not agentic
 Aider achieved this result mainly through its existing features that focus on static code analysis, reliable LLM code editing, and pragmatic UX for AI pair programming.
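This hunk separates OpenDevin results that use hints from those that do not. In the SWE-bench task data, `hints_text` is a field separate from the `problem_statement`, so an unhinted run is simply one whose prompt never includes it. A minimal sketch, assuming each task instance is a plain dict with those two keys; `build_task_prompt` is a hypothetical helper, not code from aider, OpenDevin, or SWE-bench:

```python
from typing import Mapping


def build_task_prompt(instance: Mapping[str, str], use_hints: bool = False) -> str:
    """Hypothetical helper: assemble the text an agent sees for one task.

    Assumes `instance` has a "problem_statement" key and may have a
    "hints_text" key, mirroring the SWE-bench dataset field names.
    """
    parts = [instance["problem_statement"].strip()]
    hints = instance.get("hints_text", "").strip()
    # Leaderboard-style (unhinted) runs never append hints_text.
    if use_hints and hints:
        parts.append("Hints:\n" + hints)
    return "\n\n".join(parts)


# Made-up instance for illustration only.
example = {
    "problem_statement": "Calling foo() with an empty list raises IndexError.",
    "hints_text": "The bug is probably in foo's first-element lookup.",
}
print(build_task_prompt(example))                  # problem statement only
print(build_task_prompt(example, use_hints=True))  # problem statement plus hints
```

Keeping the hint decision in one flag makes it easy to confirm that every reported run used `use_hints=False`, which is the condition the leaderboard requires.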
@@ -397,14 +403,33 @@ making it faster, easier, and more reliable to run the acceptance tests.
 ## References
 Below are the references for the SWE-Bench Lite results
-displayed in the graph at the top of this page.
+displayed in the graph at the beginning of this article.
-- [25.0% OpenDevin](https://x.com/gneubig/status/1791498953709752405)
-- [19.0% AutoCodeRover](https://github.com/swe-bench/experiments/pull/11)
 - [20.3% Amazon Q Developer Agent (v20240430-dev)](https://www.swebench.com)
+- [19.0% AutoCodeRover](https://github.com/swe-bench/experiments/pull/11)
 - [18.0% SWE-Agent + GPT-4](https://www.swebench.com)
+- [16.7% OpenDevin](https://github.com/OpenDevin/OpenDevin/issues/2149)
 - [11.7% SWE-Agent + Opus](https://www.swebench.com)
-Note: Graph updated on 5/30/24 to accurately reflect AutoCodeRover's pass@1 results.
-The previous graph contained their pass@3 result, which is not comparable
-to the aider results being reported here.
+Note, the graph was updated on 5/30/24 as follows.
+The graph now contains AutoCodeRover's pass@1 results.
+Previously it was reporting the pass@3 results, which are
+not comparable
+to the pass@1 aider results being reported here.
+The [AutoCodeRover GitHub page](https://github.com/nus-apr/auto-code-rover)
+features the pass@3 results
+without being clearly labeled.
+The graph now contains the best OpenDevin results obtained without using
+the `hints_text` to provide hints to the agent.
+The previous graph contained their hinted result,
+which is not comparable
+to the unhinted aider results being reported here.
+OpenDevin's [hinted result was reported](https://x.com/gneubig/status/1791498953709752405)
+without noting that hints were used.
+The [official SWE Bench Lite leaderboard](https://www.swebench.com)
+only accepts pass@1 results that do not use `hints_text`.
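The note above turns on the difference between pass@1 and pass@3. A minimal sketch, assuming each benchmark run is summarized as the set of instance IDs it resolved and that pass@k means "resolved by at least one of k independent attempts" (the common usage; the exact protocol behind any specific reported number is not shown here). The instance IDs and run results below are made up for illustration:

```python
from typing import Sequence, Set


def pass_at_1(run: Set[str], total: int) -> float:
    """Percent of instances resolved by a single run (one attempt each)."""
    return 100.0 * len(run) / total


def pass_at_k(runs: Sequence[Set[str]], total: int) -> float:
    """Percent of instances resolved by at least one of k independent runs."""
    resolved_by_any = set().union(*runs)
    return 100.0 * len(resolved_by_any) / total


# Made-up results for a 10-instance benchmark, three independent runs.
runs = [
    {"a", "b"},
    {"a", "c"},
    {"b", "d"},
]
total = 10

print(pass_at_1(runs[0], total))  # 20.0
print(pass_at_k(runs, total))     # 40.0: the union is {a, b, c, d}
```

Because the union of several runs can only grow, pass@3 is never lower than the pass@1 of any of its runs, which is why plotting it next to single-attempt results overstates the comparison.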

Binary file not shown (36 KiB before, 37 KiB after).


(Regenerated SVG chart. The flattened side-by-side path diff is omitted; its visible changes are:
- metadata timestamp changes from 2024-05-30T09:44:47 to 2024-05-30T15:26:12
- the "Open Devin" bar moves from the fifth x-axis position to the second and its value label changes from 25.0% to 16.7%; "SWE-Agent + GPT-4", "AutoCodeRover" and "Amazon Q Developer Agent" each shift one position to the right
- the y-axis is rescaled, e.g. the 25% gridline moves from y≈87.0 to y≈71.5
- the two Aider bars change opacity from 0.6 to 0.9, and their 25.0% and 26.3% labels are redrawn in #eeeeee with DejaVuSans-Bold glyphs
- all bar value labels grow from scale 0.14 to 0.16
- auto-generated marker and clip-path ids are renamed
The SVG file size changes from 43 KiB to 47 KiB.)


@@ -1,7 +1,7 @@
 26.3% Aider|GPT-4o|& Opus
 25.0% Aider|GPT-4o
-25.0% Open|Devin
 20.3% Amazon Q|Developer|Agent
 19.0% AutoCode|Rover
 18.0% SWE-|Agent|+ GPT-4
+16.7% Open|Devin
 11.7% SWE-|Agent|+ Opus
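Each line of this data file pairs a pass rate with a bar label, and in the chart every `|` becomes a line break in the x-axis tick label (for example, "Open" and "Devin" are drawn on separate lines). A hedged sketch of reading that format, assuming one `PCT% Label` entry per line; `parse_swe_bench_lite` is a hypothetical name, not the loader used by aider's plotting script:

```python
from typing import List, Tuple


def parse_swe_bench_lite(text: str) -> List[Tuple[str, float]]:
    """Hypothetical parser for lines like '26.3% Aider|GPT-4o|& Opus'.

    Returns (label, pass_rate) pairs, with each '|' turned into a newline
    so the label wraps onto several lines under its bar.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        pct, label = line.split(" ", 1)  # first token is the percentage
        rows.append((label.replace("|", "\n"), float(pct.rstrip("%"))))
    return rows


data = """\
26.3% Aider|GPT-4o|& Opus
25.0% Aider|GPT-4o
16.7% Open|Devin
"""
for label, rate in parse_swe_bench_lite(data):
    print(repr(label), rate)
```

Splitting only on the first space keeps labels such as "Amazon Q|Developer|Agent" intact, since the label itself may contain spaces.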


@@ -22,7 +22,13 @@ def plot_swe_bench_lite(data_file):
     plt.rcParams["hatch.color"] = "#444444"
     font_color = "#555"
-    rc("font", **{"family": "sans-serif", "sans-serif": ["Helvetica"], "size": 10})
+    font_params = {
+        "family": "sans-serif",
+        "sans-serif": ["Helvetica"],
+        "size": 10,
+        "weight": "bold",
+    }
+    rc("font", **font_params)
     plt.rcParams["text.color"] = font_color
     fig, ax = plt.subplots(figsize=(10, 5))
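The next hunk of this script changes how each bar's value label is styled: every label now hangs 1.25 units below the top of its bar (`va="top"`), Aider's labels switch to light `#eee` text in "Helvetica Bold", and Aider's bars become more opaque (alpha 0.9 instead of 0.6). As a sketch, that per-bar decision can be isolated as a pure function; `label_style` is a hypothetical name, and the constants are copied from the new side of the diff:

```python
from typing import Any, Dict


def label_style(model: str, height: float) -> Dict[str, Any]:
    """Hypothetical helper mirroring the new label logic in the diff.

    All labels sit just inside the top of their bar; Aider's bars get
    light bold text so it stays readable on the darker, more opaque bar.
    """
    is_aider = "Aider" in model
    return {
        "y": height - 1.25,  # same offset for every bar now
        "va": "top",         # text hangs down from y, inside the bar
        "color": "#eee" if is_aider else "#555",
        "fontfamily": "Helvetica Bold" if is_aider else "Helvetica",
        "bar_alpha": 0.9 if is_aider else 0.3,
    }


print(label_style("Aider\nGPT-4o", 25.0))
print(label_style("Open\nDevin", 16.7))
```

In the plotting loop, `y`, `va`, `color` and `fontfamily` would be passed to `ax.text`, and `bar_alpha` to `ax.bar` as its `alpha`.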
@@ -34,28 +40,31 @@ def plot_swe_bench_lite(data_file):
     colors = ["#17965A" if "Aider" in model else "#b3d1e6" for model in models]
     bars = []
     for model, pass_rate, color in zip(models, pass_rates, colors):
-        alpha = 0.6 if "Aider" in model else 0.3
+        alpha = 0.9 if "Aider" in model else 0.3
         bar = ax.bar(model, pass_rate, color=color, alpha=alpha, zorder=3)
         bars.append(bar[0])
     for model, bar in zip(models, bars):
         yval = bar.get_height()
-        y = yval + 0.75 if "Aider" in model else yval - 1.25
-        va = "bottom" if "Aider" in model else "top"
+        y = yval - 1.25
+        va = "top"
+        color = "#eee" if "Aider" in model else "#555"
+        fontfamily = "Helvetica Bold" if "Aider" in model else "Helvetica"
         ax.text(
             bar.get_x() + bar.get_width() / 2,
             y,
             f"{yval}%",
             ha="center",
             va=va,
-            fontsize=14,
+            fontsize=16,
+            color=color,
+            fontfamily=fontfamily,
         )
     # ax.set_xlabel("Models", fontsize=18)
     ax.set_ylabel("Instances resolved (%)", fontsize=18, color=font_color)
     ax.set_title("SWE Bench Lite", fontsize=20)
-    ax.set_ylim(0, 29.9)
+    # ax.set_ylim(0, 29.9)
     plt.xticks(
         fontsize=16,
         color=font_color,