Paul Gauthier 2024-05-23 07:45:54 -07:00
parent d9594815b0
commit 15c228097b
4 changed files with 128 additions and 111 deletions


@ -13,7 +13,7 @@ on the
achieving a state-of-the-art result. achieving a state-of-the-art result.
The current top leaderboard entry is 20.3% The current top leaderboard entry is 20.3%
from Amazon Q Developer Agent. from Amazon Q Developer Agent.
The best result reported elsewhere online seems to be The best result reported elsewhere seems to be
[22.3% from AutoCodeRover](https://github.com/nus-apr/auto-code-rover). [22.3% from AutoCodeRover](https://github.com/nus-apr/auto-code-rover).
[![SWE Bench Lite results](/assets/swe_bench_lite.svg)](https://aider.chat/assets/swe_bench_lite.svg) [![SWE Bench Lite results](/assets/swe_bench_lite.svg)](https://aider.chat/assets/swe_bench_lite.svg)
@ -94,26 +94,29 @@ that used aider with both GPT-4o & Opus.
The benchmark harness alternated between running aider with GPT-4o and Opus. The benchmark harness alternated between running aider with GPT-4o and Opus.
The harness proceeded in a fixed order, always starting with GPT-4o and The harness proceeded in a fixed order, always starting with GPT-4o and
then alternating with Opus until a plausible solution was found. then alternating with Opus until a plausible solution was found for each
problem.
The table below breaks down the 79 solutions that were ultimately The table below breaks down the 79 solutions that were ultimately
verified as correctly resolving their issue. verified as correctly resolving their issue.
Some noteworthy observations: Some noteworthy observations:
- Aider with GPT-4o on the first attempt immediately found 69% of all plausible solutions which accounted for 77% of the correctly resulted problems. - Just the first attempt of Aider with GPT-4o resolved 20.3% of the problems, which ties the Amazon Q Developer Agent currently atop the official leaderboard.
- Aider with GPT-4o on the first attempt immediately found 69% of all plausible solutions which accounted for 77% of the correctly resolved problems.
- ~75% of all plausible and ~90% of all resolved solutions were found after one attempt from aider with GPT-4o and Opus. - ~75% of all plausible and ~90% of all resolved solutions were found after one attempt from aider with GPT-4o and Opus.
- A long tail of solutions continued to be found by both models including one resolved solution on the final, sixth attempt of that problem. - A long tail of solutions continued to be found by both models including one correctly resolved solution on the final, sixth attempt of that problem.
| Attempt | Agent |Number<br>plausible<br>solutions|Percent of<br>plausible<br>solutions| Number<br/>correctly<br>resolved | Percent<br>of correctly<br>resolved | | Attempt | Agent |Number<br>plausible<br>solutions|Percent&nbsp;of<br>plausible<br>solutions| Number<br/>correctly<br>resolved | Percent&nbsp;of<br>correctly<br>resolved | Percent of<br>SWE Bench Lite&nbsp;Resolved |
|:--------:|------------|---------:|---------:|----:|---:| |:--------:|------------|---------:|---------:|----:|---:|--:|
| 1 | Aider with GPT-4o | 208 | 69.3% | 61 | 77.2% | | 1 | Aider with GPT-4o | 208 | 69.3% | 61 | 77.2% | 20.3% |
| 2 | Aider with Opus | 49 | 16.3% | 10 | 12.7% | | 2 | Aider with Opus | 49 | 16.3% | 10 | 12.7% | 3.3% |
| 3 | Aider with GPT-4o | 20 | 6.7% | 3 | 3.8% | | 3 | Aider with GPT-4o | 20 | 6.7% | 3 | 3.8% | 1.0% |
| 4 | Aider with Opus | 9 | 3.0% | 2 | 2.5% | | 4 | Aider with Opus | 9 | 3.0% | 2 | 2.5% | 0.7% |
| 5 | Aider with GPT-4o | 11 | 3.7% | 2 | 2.5% | | 5 | Aider with GPT-4o | 11 | 3.7% | 2 | 2.5% | 0.7% |
| 6 | Aider with Opus | 3 | 1.0% | 1 | 1.3% | | 6 | Aider with Opus | 3 | 1.0% | 1 | 1.3% | 0.3% |
| **Total** | | **300** | **100%** | **79** | **100%** | | **Total** | | **300** | **100%** | **79** | **100%** | **26.3%** |
If we break down correct solutions purely by model, If we break down correct solutions purely by model,
we can see that aider with GPT-4o outperforms Opus. we can see that aider with GPT-4o outperforms Opus.
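The percentage columns in the table above follow directly from the per-attempt counts. As a minimal sketch (not part of this commit), the arithmetic can be reproduced as below. The names `ATTEMPTS`, `TOTAL_INSTANCES`, `pct`, and `breakdown` are hypothetical, and the figure of 300 benchmark instances is an assumption inferred from the table's totals (79 resolved is reported as 26.3%).

```python
# Counts copied from the table: (attempt, agent, plausible, correctly resolved).
ATTEMPTS = [
    (1, "Aider with GPT-4o", 208, 61),
    (2, "Aider with Opus", 49, 10),
    (3, "Aider with GPT-4o", 20, 3),
    (4, "Aider with Opus", 9, 2),
    (5, "Aider with GPT-4o", 11, 2),
    (6, "Aider with Opus", 3, 1),
]

# Assumption: SWE Bench Lite has 300 instances, consistent with 79 -> 26.3%.
TOTAL_INSTANCES = 300


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def breakdown(attempts, total_instances):
    """Compute the three percentage columns for each attempt row."""
    total_plausible = sum(row[2] for row in attempts)
    total_resolved = sum(row[3] for row in attempts)
    rows = []
    for attempt, agent, plausible, resolved in attempts:
        rows.append(
            {
                "attempt": attempt,
                "agent": agent,
                "pct_of_plausible": pct(plausible, total_plausible),
                "pct_of_resolved": pct(resolved, total_resolved),
                "pct_of_benchmark": pct(resolved, total_instances),
            }
        )
    return rows


if __name__ == "__main__":
    for row in breakdown(ATTEMPTS, TOTAL_INSTANCES):
        print(row)
```

The new "Percent of SWE Bench Lite Resolved" column divides each row's resolved count by the size of the whole benchmark, whereas the older "Percent of correctly resolved" column divides by the 79 resolved problems only.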

Binary file not shown. Image size: 36 KiB before, 37 KiB after.


@ -6,7 +6,7 @@
<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:cc="http://creativecommons.org/ns#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:cc="http://creativecommons.org/ns#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<cc:Work> <cc:Work>
<dc:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/> <dc:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
<dc:date>2024-05-22T20:23:36.416838</dc:date> <dc:date>2024-05-23T07:38:15.931243</dc:date>
<dc:format>image/svg+xml</dc:format> <dc:format>image/svg+xml</dc:format>
<dc:creator> <dc:creator>
<cc:Agent> <cc:Agent>
@ -41,12 +41,12 @@ z
<g id="xtick_1"> <g id="xtick_1">
<g id="line2d_1"> <g id="line2d_1">
<defs> <defs>
<path id="m1c7d4f1d06" d="M 0 0 <path id="m13d95e4709" d="M 0 0
L 0 3.5 L 0 3.5
" style="stroke: #000000; stroke-width: 0.8"/> " style="stroke: #000000; stroke-width: 0.8"/>
</defs> </defs>
<g> <g>
<use xlink:href="#m1c7d4f1d06" x="130.142981" y="273.70025" style="stroke: #000000; stroke-width: 0.8"/> <use xlink:href="#m13d95e4709" x="130.142981" y="273.70025" style="stroke: #000000; stroke-width: 0.8"/>
</g> </g>
</g> </g>
<g id="text_1"> <g id="text_1">
@ -453,7 +453,7 @@ z
<g id="xtick_2"> <g id="xtick_2">
<g id="line2d_2"> <g id="line2d_2">
<g> <g>
<use xlink:href="#m1c7d4f1d06" x="213.207821" y="273.70025" style="stroke: #000000; stroke-width: 0.8"/> <use xlink:href="#m13d95e4709" x="213.207821" y="273.70025" style="stroke: #000000; stroke-width: 0.8"/>
</g> </g>
</g> </g>
<g id="text_2"> <g id="text_2">
@ -479,7 +479,7 @@ z
<g id="xtick_3"> <g id="xtick_3">
<g id="line2d_3"> <g id="line2d_3">
<g> <g>
<use xlink:href="#m1c7d4f1d06" x="296.27266" y="273.70025" style="stroke: #000000; stroke-width: 0.8"/> <use xlink:href="#m13d95e4709" x="296.27266" y="273.70025" style="stroke: #000000; stroke-width: 0.8"/>
</g> </g>
</g> </g>
<g id="text_3"> <g id="text_3">
@ -601,7 +601,7 @@ z
<g id="xtick_4"> <g id="xtick_4">
<g id="line2d_4"> <g id="line2d_4">
<g> <g>
<use xlink:href="#m1c7d4f1d06" x="379.3375" y="273.70025" style="stroke: #000000; stroke-width: 0.8"/> <use xlink:href="#m13d95e4709" x="379.3375" y="273.70025" style="stroke: #000000; stroke-width: 0.8"/>
</g> </g>
</g> </g>
<g id="text_4"> <g id="text_4">
@ -674,7 +674,7 @@ z
<g id="xtick_5"> <g id="xtick_5">
<g id="line2d_5"> <g id="line2d_5">
<g> <g>
<use xlink:href="#m1c7d4f1d06" x="462.40234" y="273.70025" style="stroke: #000000; stroke-width: 0.8"/> <use xlink:href="#m13d95e4709" x="462.40234" y="273.70025" style="stroke: #000000; stroke-width: 0.8"/>
</g> </g>
</g> </g>
<g id="text_5"> <g id="text_5">
@ -886,7 +886,7 @@ z
<g id="xtick_6"> <g id="xtick_6">
<g id="line2d_6"> <g id="line2d_6">
<g> <g>
<use xlink:href="#m1c7d4f1d06" x="545.467179" y="273.70025" style="stroke: #000000; stroke-width: 0.8"/> <use xlink:href="#m13d95e4709" x="545.467179" y="273.70025" style="stroke: #000000; stroke-width: 0.8"/>
</g> </g>
</g> </g>
<g id="text_6"> <g id="text_6">
@ -1007,7 +1007,7 @@ z
<g id="xtick_7"> <g id="xtick_7">
<g id="line2d_7"> <g id="line2d_7">
<g> <g>
<use xlink:href="#m1c7d4f1d06" x="628.532019" y="273.70025" style="stroke: #000000; stroke-width: 0.8"/> <use xlink:href="#m13d95e4709" x="628.532019" y="273.70025" style="stroke: #000000; stroke-width: 0.8"/>
</g> </g>
</g> </g>
<g id="text_7"> <g id="text_7">
@ -1043,21 +1043,21 @@ z
<g id="line2d_8"> <g id="line2d_8">
<path d="M 68.675 273.70025 <path d="M 68.675 273.70025
L 690 273.70025 L 690 273.70025
" clip-path="url(#p4afbc1300d)" style="fill: none; stroke: #b0b0b0; stroke-width: 0.2; stroke-linecap: square"/> " clip-path="url(#p535a156c8f)" style="fill: none; stroke: #b0b0b0; stroke-width: 0.2; stroke-linecap: square"/>
</g> </g>
<g id="line2d_9"> <g id="line2d_9">
<defs> <defs>
<path id="mb9d6d72965" d="M 0 0 <path id="mb0b2eca59c" d="M 0 0
L -3.5 0 L -3.5 0
" style="stroke: #000000; stroke-width: 0.8"/> " style="stroke: #000000; stroke-width: 0.8"/>
</defs> </defs>
<g> <g>
<use xlink:href="#mb9d6d72965" x="68.675" y="273.70025" style="stroke: #000000; stroke-width: 0.8"/> <use xlink:href="#mb0b2eca59c" x="68.675" y="273.70025" style="stroke: #000000; stroke-width: 0.8"/>
</g> </g>
</g> </g>
<g id="text_8"> <g id="text_8">
<!-- 0 --> <!-- 0 -->
<g transform="translate(56.114062 277.286969) scale(0.1 -0.1)"> <g transform="translate(56.114063 277.286969) scale(0.1 -0.1)">
<defs> <defs>
<path id="Helvetica-30" d="M 1731 4475 <path id="Helvetica-30" d="M 1731 4475
Q 2600 4475 2988 3759 Q 2600 4475 2988 3759
@ -1089,16 +1089,16 @@ z
<g id="line2d_10"> <g id="line2d_10">
<path d="M 68.675 235.200207 <path d="M 68.675 235.200207
L 690 235.200207 L 690 235.200207
" clip-path="url(#p4afbc1300d)" style="fill: none; stroke: #b0b0b0; stroke-width: 0.2; stroke-linecap: square"/> " clip-path="url(#p535a156c8f)" style="fill: none; stroke: #b0b0b0; stroke-width: 0.2; stroke-linecap: square"/>
</g> </g>
<g id="line2d_11"> <g id="line2d_11">
<g> <g>
<use xlink:href="#mb9d6d72965" x="68.675" y="235.200207" style="stroke: #000000; stroke-width: 0.8"/> <use xlink:href="#mb0b2eca59c" x="68.675" y="235.200207" style="stroke: #000000; stroke-width: 0.8"/>
</g> </g>
</g> </g>
<g id="text_9"> <g id="text_9">
<!-- 5 --> <!-- 5 -->
<g transform="translate(56.114062 238.786926) scale(0.1 -0.1)"> <g transform="translate(56.114063 238.786926) scale(0.1 -0.1)">
<defs> <defs>
<path id="Helvetica-35" d="M 791 1141 <path id="Helvetica-35" d="M 791 1141
Q 847 659 1238 475 Q 847 659 1238 475
@ -1135,11 +1135,11 @@ z
<g id="line2d_12"> <g id="line2d_12">
<path d="M 68.675 196.700164 <path d="M 68.675 196.700164
L 690 196.700164 L 690 196.700164
" clip-path="url(#p4afbc1300d)" style="fill: none; stroke: #b0b0b0; stroke-width: 0.2; stroke-linecap: square"/> " clip-path="url(#p535a156c8f)" style="fill: none; stroke: #b0b0b0; stroke-width: 0.2; stroke-linecap: square"/>
</g> </g>
<g id="line2d_13"> <g id="line2d_13">
<g> <g>
<use xlink:href="#mb9d6d72965" x="68.675" y="196.700164" style="stroke: #000000; stroke-width: 0.8"/> <use xlink:href="#mb0b2eca59c" x="68.675" y="196.700164" style="stroke: #000000; stroke-width: 0.8"/>
</g> </g>
</g> </g>
<g id="text_10"> <g id="text_10">
@ -1167,11 +1167,11 @@ z
<g id="line2d_14"> <g id="line2d_14">
<path d="M 68.675 158.200121 <path d="M 68.675 158.200121
L 690 158.200121 L 690 158.200121
" clip-path="url(#p4afbc1300d)" style="fill: none; stroke: #b0b0b0; stroke-width: 0.2; stroke-linecap: square"/> " clip-path="url(#p535a156c8f)" style="fill: none; stroke: #b0b0b0; stroke-width: 0.2; stroke-linecap: square"/>
</g> </g>
<g id="line2d_15"> <g id="line2d_15">
<g> <g>
<use xlink:href="#mb9d6d72965" x="68.675" y="158.200121" style="stroke: #000000; stroke-width: 0.8"/> <use xlink:href="#mb0b2eca59c" x="68.675" y="158.200121" style="stroke: #000000; stroke-width: 0.8"/>
</g> </g>
</g> </g>
<g id="text_11"> <g id="text_11">
@ -1186,11 +1186,11 @@ L 690 158.200121
<g id="line2d_16"> <g id="line2d_16">
<path d="M 68.675 119.700078 <path d="M 68.675 119.700078
L 690 119.700078 L 690 119.700078
" clip-path="url(#p4afbc1300d)" style="fill: none; stroke: #b0b0b0; stroke-width: 0.2; stroke-linecap: square"/> " clip-path="url(#p535a156c8f)" style="fill: none; stroke: #b0b0b0; stroke-width: 0.2; stroke-linecap: square"/>
</g> </g>
<g id="line2d_17"> <g id="line2d_17">
<g> <g>
<use xlink:href="#mb9d6d72965" x="68.675" y="119.700078" style="stroke: #000000; stroke-width: 0.8"/> <use xlink:href="#mb0b2eca59c" x="68.675" y="119.700078" style="stroke: #000000; stroke-width: 0.8"/>
</g> </g>
</g> </g>
<g id="text_12"> <g id="text_12">
@ -1232,11 +1232,11 @@ z
<g id="line2d_18"> <g id="line2d_18">
<path d="M 68.675 81.200034 <path d="M 68.675 81.200034
L 690 81.200034 L 690 81.200034
" clip-path="url(#p4afbc1300d)" style="fill: none; stroke: #b0b0b0; stroke-width: 0.2; stroke-linecap: square"/> " clip-path="url(#p535a156c8f)" style="fill: none; stroke: #b0b0b0; stroke-width: 0.2; stroke-linecap: square"/>
</g> </g>
<g id="line2d_19"> <g id="line2d_19">
<g> <g>
<use xlink:href="#mb9d6d72965" x="68.675" y="81.200034" style="stroke: #000000; stroke-width: 0.8"/> <use xlink:href="#mb0b2eca59c" x="68.675" y="81.200034" style="stroke: #000000; stroke-width: 0.8"/>
</g> </g>
</g> </g>
<g id="text_13"> <g id="text_13">
@ -1248,9 +1248,40 @@ L 690 81.200034
</g> </g>
</g> </g>
<g id="text_14"> <g id="text_14">
<!-- Pass rate (%) --> <!-- Instances resolved (%) -->
<g style="fill: #555555" transform="translate(42.80125 216.562) rotate(-90) scale(0.18 -0.18)"> <g style="fill: #555555" transform="translate(42.80125 253.582937) rotate(-90) scale(0.18 -0.18)">
<defs> <defs>
<path id="Helvetica-49" d="M 628 4591
L 1256 4591
L 1256 0
L 628 0
L 628 4591
z
" transform="scale(0.015625)"/>
<path id="Helvetica-63" d="M 1703 3444
Q 2269 3444 2623 3169
Q 2978 2894 3050 2222
L 2503 2222
Q 2453 2531 2275 2736
Q 2097 2941 1703 2941
Q 1166 2941 934 2416
Q 784 2075 784 1575
Q 784 1072 996 728
Q 1209 384 1666 384
Q 2016 384 2220 598
Q 2425 813 2503 1184
L 3050 1184
Q 2956 519 2581 211
Q 2206 -97 1622 -97
Q 966 -97 575 383
Q 184 863 184 1581
Q 184 2463 612 2953
Q 1041 3444 1703 3444
z
M 1616 3428
L 1616 3428
z
" transform="scale(0.015625)"/>
<path id="Helvetica-28" d="M 1894 4666 <path id="Helvetica-28" d="M 1894 4666
Q 1403 3713 1256 3263 Q 1403 3713 1256 3263
Q 1034 2578 1034 1681 Q 1034 2578 1034 1681
@ -1329,19 +1360,28 @@ L 222 -1306
z z
" transform="scale(0.015625)"/> " transform="scale(0.015625)"/>
</defs> </defs>
<use xlink:href="#Helvetica-50"/> <use xlink:href="#Helvetica-49"/>
<use xlink:href="#Helvetica-61" x="66.699219"/> <use xlink:href="#Helvetica-6e" x="27.783203"/>
<use xlink:href="#Helvetica-73" x="122.314453"/> <use xlink:href="#Helvetica-73" x="83.398438"/>
<use xlink:href="#Helvetica-73" x="172.314453"/> <use xlink:href="#Helvetica-74" x="133.398438"/>
<use xlink:href="#Helvetica-20" x="222.314453"/> <use xlink:href="#Helvetica-61" x="161.181641"/>
<use xlink:href="#Helvetica-72" x="250.097656"/> <use xlink:href="#Helvetica-6e" x="216.796875"/>
<use xlink:href="#Helvetica-61" x="283.398438"/> <use xlink:href="#Helvetica-63" x="272.412109"/>
<use xlink:href="#Helvetica-74" x="339.013672"/> <use xlink:href="#Helvetica-65" x="322.412109"/>
<use xlink:href="#Helvetica-65" x="366.796875"/> <use xlink:href="#Helvetica-73" x="378.027344"/>
<use xlink:href="#Helvetica-20" x="422.412109"/> <use xlink:href="#Helvetica-20" x="428.027344"/>
<use xlink:href="#Helvetica-28" x="450.195312"/> <use xlink:href="#Helvetica-72" x="455.810547"/>
<use xlink:href="#Helvetica-25" x="483.496094"/> <use xlink:href="#Helvetica-65" x="489.111328"/>
<use xlink:href="#Helvetica-29" x="572.412109"/> <use xlink:href="#Helvetica-73" x="544.726562"/>
<use xlink:href="#Helvetica-6f" x="594.726562"/>
<use xlink:href="#Helvetica-6c" x="650.341797"/>
<use xlink:href="#Helvetica-76" x="672.558594"/>
<use xlink:href="#Helvetica-65" x="722.558594"/>
<use xlink:href="#Helvetica-64" x="778.173828"/>
<use xlink:href="#Helvetica-20" x="833.789062"/>
<use xlink:href="#Helvetica-28" x="861.572266"/>
<use xlink:href="#Helvetica-25" x="894.873047"/>
<use xlink:href="#Helvetica-29" x="983.789062"/>
</g> </g>
</g> </g>
</g> </g>
@ -1368,10 +1408,10 @@ L 690 50.4
<g id="patch_7"> <g id="patch_7">
<path d="M 96.917045 273.70025 <path d="M 96.917045 273.70025
L 163.368917 273.70025 L 163.368917 273.70025
L 163.368917 70.420022 L 163.368917 71.190023
L 96.917045 70.420022 L 96.917045 71.190023
z z
" clip-path="url(#p4afbc1300d)" style="fill: #b3e6a8; opacity: 0.75"/> " clip-path="url(#p535a156c8f)" style="fill: #b3e6a8; opacity: 0.75"/>
</g> </g>
<g id="patch_8"> <g id="patch_8">
<path d="M 179.981885 273.70025 <path d="M 179.981885 273.70025
@ -1379,7 +1419,7 @@ L 246.433757 273.70025
L 246.433757 81.200034 L 246.433757 81.200034
L 179.981885 81.200034 L 179.981885 81.200034
z z
" clip-path="url(#p4afbc1300d)" style="fill: #b3e6a8; opacity: 0.75"/> " clip-path="url(#p535a156c8f)" style="fill: #b3e6a8; opacity: 0.75"/>
</g> </g>
<g id="patch_9"> <g id="patch_9">
<path d="M 263.046725 273.70025 <path d="M 263.046725 273.70025
@ -1387,7 +1427,7 @@ L 329.498596 273.70025
L 329.498596 101.990058 L 329.498596 101.990058
L 263.046725 101.990058 L 263.046725 101.990058
z z
" clip-path="url(#p4afbc1300d)" style="fill: #b3d1e6; opacity: 0.75"/> " clip-path="url(#p535a156c8f)" style="fill: #b3d1e6; opacity: 0.75"/>
</g> </g>
<g id="patch_10"> <g id="patch_10">
<path d="M 346.111564 273.70025 <path d="M 346.111564 273.70025
@ -1395,7 +1435,7 @@ L 412.563436 273.70025
L 412.563436 112.000069 L 412.563436 112.000069
L 346.111564 112.000069 L 346.111564 112.000069
z z
" clip-path="url(#p4afbc1300d)" style="fill: #b3d1e6; opacity: 0.75"/> " clip-path="url(#p535a156c8f)" style="fill: #b3d1e6; opacity: 0.75"/>
</g> </g>
<g id="patch_11"> <g id="patch_11">
<path d="M 429.176404 273.70025 <path d="M 429.176404 273.70025
@ -1403,7 +1443,7 @@ L 495.628275 273.70025
L 495.628275 117.390075 L 495.628275 117.390075
L 429.176404 117.390075 L 429.176404 117.390075
z z
" clip-path="url(#p4afbc1300d)" style="fill: #b3d1e6; opacity: 0.75"/> " clip-path="url(#p535a156c8f)" style="fill: #b3d1e6; opacity: 0.75"/>
</g> </g>
<g id="patch_12"> <g id="patch_12">
<path d="M 512.241243 273.70025 <path d="M 512.241243 273.70025
@ -1411,7 +1451,7 @@ L 578.693115 273.70025
L 578.693115 135.100095 L 578.693115 135.100095
L 512.241243 135.100095 L 512.241243 135.100095
z z
" clip-path="url(#p4afbc1300d)" style="fill: #b3d1e6; opacity: 0.75"/> " clip-path="url(#p535a156c8f)" style="fill: #b3d1e6; opacity: 0.75"/>
</g> </g>
<g id="patch_13"> <g id="patch_13">
<path d="M 595.306083 273.70025 <path d="M 595.306083 273.70025
@ -1419,11 +1459,11 @@ L 661.757955 273.70025
L 661.757955 183.610149 L 661.757955 183.610149
L 595.306083 183.610149 L 595.306083 183.610149
z z
" clip-path="url(#p4afbc1300d)" style="fill: #b3d1e6; opacity: 0.75"/> " clip-path="url(#p535a156c8f)" style="fill: #b3d1e6; opacity: 0.75"/>
</g> </g>
<g id="text_15"> <g id="text_15">
<!-- 26.4% --> <!-- 26.3% -->
<g style="fill: #555555" transform="translate(110.295794 92.012848) scale(0.14 -0.14)"> <g style="fill: #555555" transform="translate(110.295794 92.782849) scale(0.14 -0.14)">
<defs> <defs>
<path id="Helvetica-36" d="M 1872 4494 <path id="Helvetica-36" d="M 1872 4494
Q 2622 4494 2917 4105 Q 2622 4494 2917 4105
@ -1462,28 +1502,6 @@ L 547 0
L 547 681 L 547 681
z z
" transform="scale(0.015625)"/> " transform="scale(0.015625)"/>
</defs>
<use xlink:href="#Helvetica-32"/>
<use xlink:href="#Helvetica-36" x="55.615234"/>
<use xlink:href="#Helvetica-2e" x="111.230469"/>
<use xlink:href="#Helvetica-34" x="139.013672"/>
<use xlink:href="#Helvetica-25" x="194.628906"/>
</g>
</g>
<g id="text_16">
<!-- 25.0% -->
<g style="fill: #555555" transform="translate(193.360633 102.79286) scale(0.14 -0.14)">
<use xlink:href="#Helvetica-32"/>
<use xlink:href="#Helvetica-35" x="55.615234"/>
<use xlink:href="#Helvetica-2e" x="111.230469"/>
<use xlink:href="#Helvetica-30" x="139.013672"/>
<use xlink:href="#Helvetica-25" x="194.628906"/>
</g>
</g>
<g id="text_17">
<!-- 22.3% -->
<g style="fill: #555555" transform="translate(276.425473 123.582883) scale(0.14 -0.14)">
<defs>
<path id="Helvetica-33" d="M 1663 -122 <path id="Helvetica-33" d="M 1663 -122
Q 869 -122 511 314 Q 869 -122 511 314
Q 153 750 153 1375 Q 153 750 153 1375
@ -1519,6 +1537,26 @@ Q 2438 -122 1663 -122
z z
" transform="scale(0.015625)"/> " transform="scale(0.015625)"/>
</defs> </defs>
<use xlink:href="#Helvetica-32"/>
<use xlink:href="#Helvetica-36" x="55.615234"/>
<use xlink:href="#Helvetica-2e" x="111.230469"/>
<use xlink:href="#Helvetica-33" x="139.013672"/>
<use xlink:href="#Helvetica-25" x="194.628906"/>
</g>
</g>
<g id="text_16">
<!-- 25.0% -->
<g style="fill: #555555" transform="translate(193.360633 102.79286) scale(0.14 -0.14)">
<use xlink:href="#Helvetica-32"/>
<use xlink:href="#Helvetica-35" x="55.615234"/>
<use xlink:href="#Helvetica-2e" x="111.230469"/>
<use xlink:href="#Helvetica-30" x="139.013672"/>
<use xlink:href="#Helvetica-25" x="194.628906"/>
</g>
</g>
<g id="text_17">
<!-- 22.3% -->
<g style="fill: #555555" transform="translate(276.425473 123.582883) scale(0.14 -0.14)">
<use xlink:href="#Helvetica-32"/> <use xlink:href="#Helvetica-32"/>
<use xlink:href="#Helvetica-32" x="55.615234"/> <use xlink:href="#Helvetica-32" x="55.615234"/>
<use xlink:href="#Helvetica-2e" x="111.230469"/> <use xlink:href="#Helvetica-2e" x="111.230469"/>
@ -1658,30 +1696,6 @@ Q 3319 0 2413 0
L 472 0 L 472 0
L 472 4591 L 472 4591
z z
" transform="scale(0.015625)"/>
<path id="Helvetica-63" d="M 1703 3444
Q 2269 3444 2623 3169
Q 2978 2894 3050 2222
L 2503 2222
Q 2453 2531 2275 2736
Q 2097 2941 1703 2941
Q 1166 2941 934 2416
Q 784 2075 784 1575
Q 784 1072 996 728
Q 1209 384 1666 384
Q 2016 384 2220 598
Q 2425 813 2503 1184
L 3050 1184
Q 2956 519 2581 211
Q 2206 -97 1622 -97
Q 966 -97 575 383
Q 184 863 184 1581
Q 184 2463 612 2953
Q 1041 3444 1703 3444
z
M 1616 3428
L 1616 3428
z
" transform="scale(0.015625)"/> " transform="scale(0.015625)"/>
<path id="Helvetica-68" d="M 413 4606 <path id="Helvetica-68" d="M 413 4606
L 975 4606 L 975 4606
@ -1731,7 +1745,7 @@ z
</g> </g>
</g> </g>
<defs> <defs>
<clipPath id="p4afbc1300d"> <clipPath id="p535a156c8f">
<rect x="68.675" y="50.4" width="621.325" height="223.30025"/> <rect x="68.675" y="50.4" width="621.325" height="223.30025"/>
</clipPath> </clipPath>
</defs> </defs>

Rendered image size: 42 KiB before, 43 KiB after.


@ -47,7 +47,7 @@ def plot_swe_bench_lite(data_file):
) )
# ax.set_xlabel("Models", fontsize=18) # ax.set_xlabel("Models", fontsize=18)
ax.set_ylabel("Pass rate (%)", fontsize=18, color=font_color) ax.set_ylabel("Instances resolved (%)", fontsize=18, color=font_color)
ax.set_title("SWE Bench Lite", fontsize=20) ax.set_title("SWE Bench Lite", fontsize=20)
ax.set_ylim(0, 29) ax.set_ylim(0, 29)
plt.xticks( plt.xticks(