Mirror of https://github.com/Aider-AI/aider.git (synced 2025-06-13 08:05:01 +00:00)
parent d747a3781d
commit eba845ea51
3 changed files with 152 additions and 20 deletions
@@ -1,5 +1,5 @@
---
title: Benchmark results for OpenAI o1-mini
title: o1-preview is SOTA on the aider leaderboard
excerpt: Preliminary benchmark results for the new OpenAI o1-mini model.
nav_exclude: true
---
@@ -7,7 +7,7 @@ nav_exclude: true
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}

# Benchmark results for OpenAI o1-mini
# OpenAI o1-preview is SOTA on the aider leaderboard

<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>

@@ -20,39 +20,58 @@ nav_exclude: true
%}


## o1-preview

OpenAI o1-preview scored 79.7% on aider's code editing benchmark,
a state of the art result.
It achieved this result with the
["whole" edit format](/docs/leaderboards/#notes-on-the-edit-format),
where the LLM returns a full copy of the source code file with changes.

It is much more practical to use aider's
["diff" edit format](/docs/leaderboards/#notes-on-the-edit-format).
which allows the LLM to return search/replace blocks to
efficiently edit the source code.
This saves significant time and token costs.

Using the diff edit format the o1-preview model had a strong
benchmark score of 75.2%.
This likely places o1-preview between Sonnet and GPT-4o for practical use,
but at significantly higher cost.

## o1-mini

OpenAI o1-mini is priced similarly to GPT-4o and Claude 3.5 Sonnet,
but scored below those models.
It also works best with the whole edit format.

It works best with the
["whole" edit format](/docs/leaderboards/#notes-on-the-edit-format),
where it returns a full copy of the source code file with changes.
Other frontier models like GPT-4o and Sonnet are able to achieve
high benchmark scores using the
["diff" edit format](/docs/leaderboards/#notes-on-the-edit-format),
This allows them to return search/replace blocks to
efficiently edit the source code, saving time and token costs.

## Future work

The o1-preview model had trouble conforming to aider's diff edit format.
The o1-mini model had trouble conforming to both the whole and diff edit formats.
Aider is extremely permissive and tries hard to accept anything close
to the correct formats.

It's possible that o1-mini would get better scores if aider prompted with
more examples or was adapted to parse o1-mini's favorite ways to mangle
the response formats.
Over time it may be possible to better harness o1-mini's capabilities through
different prompting and editing formats.
It is surprising that such strong models had trouble with
the syntactic requirements of simple text output formats.
It seems likely that aider could optimize its prompts and edit formats to
better harness the o1 models.

## Using aider with o1-mini and o1-preview

## Using aider with o1

OpenAI's new o1 models are supported in the development version of aider:

```
# To upgrade to the development version:
aider --install-main-branch
# or...

# Or, to upgrade/install:
python -m pip install --upgrade git+https://github.com/paul-gauthier/aider.git

# To launch aider with an o1 model:
aider --model o1-mini

aider --model o1-preview
```