mirror of
https://github.com/Aider-AI/aider.git
synced 2025-06-05 12:14:59 +00:00
parent 9d6a692054
commit d7bb80468b
3 changed files with 153 additions and 3 deletions
@@ -1,5 +1,5 @@
---
title: r1 tops aider's polyglot leaderboard
title: R1+Sonnet set SOTA on aider's polyglot benchmark
#excerpt: o1 scores the top result on aider's new multi-language, more challenging coding benchmark.
#highlight_image: /assets/o1-polyglot.jpg
draft: false
@@ -9,12 +9,24 @@ nav_exclude: true
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}

# r1 tops aider's polyglot leaderboard
# R1+Sonnet set SOTA on aider's polyglot benchmark
{: .no_toc }

<canvas id="editChart" width="800" height="450" style="margin-top: 20px"></canvas>

Aider supports using a pair of models for coding:

- An Architect model is asked to describe how to solve the coding problem. Thinking/reasoning models often work well in this role.
- An Editor model is given the Architect's solution and asked to produce specific code editing instructions to apply those changes to existing source files.

**R1 as architect with Sonnet as editor has set a new SOTA of 64.0%** on the
[aider polyglot benchmark](/2024/12/21/polyglot.html).
They achieve this at **14X less cost** compared to the previous o1 SOTA result.

Using o1 or R1 as architect with various other editor models didn't produce significantly
better results than using them alone.
This is in contrast to the first wave of thinking models like o1-preview and o1-mini,
which improved when paired with many different editor models.


## Results