mirror of
https://github.com/Aider-AI/aider.git
synced 2025-05-30 17:24:59 +00:00
parent d9594815b0
commit 15c228097b
4 changed files with 128 additions and 111 deletions
@ -13,7 +13,7 @@ on the
 achieving a state-of-the-art result.
 The current top leaderboard entry is 20.3%
 from Amazon Q Developer Agent.
-The best result reported elsewhere online seems to be
+The best result reported elsewhere seems to be
 [22.3% from AutoCodeRover](https://github.com/nus-apr/auto-code-rover).

 [](https://aider.chat/assets/swe_bench_lite.svg)

@ -94,26 +94,29 @@ that used aider with both GPT-4o & Opus.

 The benchmark harness alternated between running aider with GPT-4o and Opus.
 The harness proceeded in a fixed order, always starting with GPT-4o and
-then alternating with Opus until a plausible solution was found.
+then alternating with Opus until a plausible solution was found for each
+problem.

 The table below breaks down the 79 solutions that were ultimately
 verified as correctly resolving their issue.
 Some noteworthy observations:

-- Aider with GPT-4o on the first attempt immediately found 69% of all plausible solutions which accounted for 77% of the correctly resulted problems.
+- Just the first attempt of Aider with GPT-4o resolved 20.3% of the problems, which ties the Amazon Q Developer Agent currently atop the official leaderboard.
+- Aider with GPT-4o on the first attempt immediately found 69% of all plausible solutions which accounted for 77% of the correctly resolved problems.
 - ~75% of all plausible and ~90% of all resolved solutions were found after one attempt from aider with GPT-4o and Opus.
-- A long tail of solutions continued to be found by both models including one resolved solution on the final, sixth attempt of that problem.
+- A long tail of solutions continued to be found by both models including one correctly resolved solution on the final, sixth attempt of that problem.


-| Attempt | Agent |Number<br>plausible<br>solutions|Percent of<br>plausible<br>solutions| Number<br/>correctly<br>resolved | Percent<br>of correctly<br>resolved |
-|:--------:|------------|---------:|---------:|----:|---:|
-| 1 | Aider with GPT-4o | 208 | 69.3% | 61 | 77.2% |
-| 2 | Aider with Opus | 49 | 16.3% | 10 | 12.7% |
-| 3 | Aider with GPT-4o | 20 | 6.7% | 3 | 3.8% |
-| 4 | Aider with Opus | 9 | 3.0% | 2 | 2.5% |
-| 5 | Aider with GPT-4o | 11 | 3.7% | 2 | 2.5% |
-| 6 | Aider with Opus | 3 | 1.0% | 1 | 1.3% |
-| **Total** | | **300** | **100%** | **79** | **100%** |
+| Attempt | Agent |Number<br>plausible<br>solutions|Percent of<br>plausible<br>solutions| Number<br/>correctly<br>resolved | Percent of<br>correctly<br>resolved | Percent of<br>SWE Bench Lite Resolved |
+|:--------:|------------|---------:|---------:|----:|---:|--:|
+| 1 | Aider with GPT-4o | 208 | 69.3% | 61 | 77.2% | 20.3% |
+| 2 | Aider with Opus | 49 | 16.3% | 10 | 12.7% | 3.3% |
+| 3 | Aider with GPT-4o | 20 | 6.7% | 3 | 3.8% | 1.0% |
+| 4 | Aider with Opus | 9 | 3.0% | 2 | 2.5% | 0.7% |
+| 5 | Aider with GPT-4o | 11 | 3.7% | 2 | 2.5% | 0.7% |
+| 6 | Aider with Opus | 3 | 1.0% | 1 | 1.3% | 0.3% |
+| **Total** | | **300** | **100%** | **79** | **100%** | **26.3%** |


 If we break down correct solutions purely by model,
 we can see that aider with GPT-4o outperforms Opus.

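Every percentage column in the revised table can be derived from the raw counts. The Python sketch below recomputes those columns and also produces the per-model split of resolved problems that the closing sentence mentions. The counts are copied from the table. The 300-problem denominator matches the table's plausible-solution total. The helper names (`attempt_agents`, `summarize`, `resolved_by_agent`) are hypothetical and do not come from aider's benchmark harness. `attempt_agents` models only the fixed GPT-4o/Opus alternation that the text describes.

```python
# Minimal sketch: recompute the table's percentage columns from its raw counts.
# Helper names are hypothetical; they are not part of aider's benchmark harness.
from collections import defaultdict

TOTAL_PROBLEMS = 300  # SWE Bench Lite size, equal to the table's plausible total

# (attempt, agent, plausible solutions, correctly resolved), copied from the table
ATTEMPTS = [
    (1, "Aider with GPT-4o", 208, 61),
    (2, "Aider with Opus", 49, 10),
    (3, "Aider with GPT-4o", 20, 3),
    (4, "Aider with Opus", 9, 2),
    (5, "Aider with GPT-4o", 11, 2),
    (6, "Aider with Opus", 3, 1),
]


def attempt_agents(max_attempts=6):
    """Fixed order described in the post: start with GPT-4o, then alternate with Opus."""
    return [
        "Aider with GPT-4o" if i % 2 == 0 else "Aider with Opus"
        for i in range(max_attempts)
    ]


def pct(part, whole):
    """Percentage rounded to one decimal place, as shown in the table."""
    return round(100 * part / whole, 1)


def summarize(rows):
    """Return per-attempt percentage rows plus the plausible and resolved totals."""
    total_plausible = sum(r[2] for r in rows)
    total_resolved = sum(r[3] for r in rows)
    table = [
        (
            attempt,
            agent,
            pct(plausible, total_plausible),  # percent of plausible solutions
            pct(resolved, total_resolved),  # percent of correctly resolved
            pct(resolved, TOTAL_PROBLEMS),  # percent of SWE Bench Lite resolved
        )
        for attempt, agent, plausible, resolved in rows
    ]
    return table, total_plausible, total_resolved


def resolved_by_agent(rows):
    """Sum correctly resolved problems per model across all of its attempts."""
    by_agent = defaultdict(int)
    for _, agent, _, resolved in rows:
        by_agent[agent] += resolved
    return dict(by_agent)


if __name__ == "__main__":
    # The table's Agent column follows the alternating schedule.
    assert attempt_agents() == [agent for _, agent, _, _ in ATTEMPTS]
    table, plausible, resolved = summarize(ATTEMPTS)
    for row in table:
        print(row)
    print("overall:", pct(resolved, TOTAL_PROBLEMS), "% of", TOTAL_PROBLEMS)
    print("resolved by agent:", resolved_by_agent(ATTEMPTS))
```

Summing the resolved column by agent gives 66 problems for the GPT-4o attempts (1, 3, 5) and 13 for the Opus attempts (2, 4, 6). GPT-4o also always ran first, so this split partly reflects the fixed attempt order.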
Binary file not shown (36 KiB before, 37 KiB after).
SVG chart diff for "SWE Bench Lite" (42 KiB before, 43 KiB after; raw SVG path data omitted): the embedded date changes from 2024-05-22T20:23:36.416838 to 2024-05-23T07:38:15.931243, auto-generated element ids change, the y-axis label changes from "Pass rate (%)" to "Instances resolved (%)", and the first bar is slightly shorter with its label changing from 26.4% to 26.3%.
@ -47,7 +47,7 @@ def plot_swe_bench_lite(data_file):
     )

     # ax.set_xlabel("Models", fontsize=18)
-    ax.set_ylabel("Pass rate (%)", fontsize=18, color=font_color)
+    ax.set_ylabel("Instances resolved (%)", fontsize=18, color=font_color)
     ax.set_title("SWE Bench Lite", fontsize=20)
     ax.set_ylim(0, 29)
     plt.xticks(
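The plotting hunk changes only the y-axis label. The rest of `plot_swe_bench_lite` is not shown. The following is a minimal, self-contained matplotlib sketch of a chart with the same axis setup. It is not the script's actual body. The function name `plot_resolved`, the bar labels and values, and the annotation layout are all assumptions. The values are taken from the percentages quoted in the post (26.3%, 22.3%, 20.3%), not from the data file the real script reads. Only the `set_ylabel`, `set_title` and `set_ylim` calls mirror the hunk.

```python
# Minimal sketch of a bar chart with the relabelled y-axis.
# plot_resolved, its labels and its values are illustrative assumptions.
import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt


def plot_resolved(labels, values, font_color="#555555"):
    fig, ax = plt.subplots(figsize=(10, 5))
    bars = ax.bar(labels, values, alpha=0.75)
    # Annotate each bar with its percentage, one decimal place
    for bar, value in zip(bars, values):
        ax.text(
            bar.get_x() + bar.get_width() / 2,
            bar.get_height() + 0.5,
            f"{value:.1f}%",
            ha="center",
            va="bottom",
            fontsize=14,
            color=font_color,
        )
    # Same axis setup as the hunk above
    ax.set_ylabel("Instances resolved (%)", fontsize=18, color=font_color)
    ax.set_title("SWE Bench Lite", fontsize=20)
    ax.set_ylim(0, 29)
    return fig, ax


if __name__ == "__main__":
    fig, ax = plot_resolved(
        ["aider", "AutoCodeRover", "Amazon Q Developer Agent"],
        [26.3, 22.3, 20.3],
    )
    fig.savefig("swe_bench_lite_sketch.svg")
    plt.close(fig)
```

The `Agg` backend makes the sketch usable in headless CI. With a fixed `set_ylim(0, 29)`, the tallest annotation (26.3%) still fits above its bar.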