This commit is contained in:
Paul Gauthier 2025-01-24 08:22:13 -08:00
parent 9d6a692054
commit d7bb80468b
3 changed files with 153 additions and 3 deletions

@ -1,5 +1,5 @@
---
title: r1 tops aider's polyglot leaderboard
title: R1+Sonnet set SOTA on aider's polyglot benchmark
#excerpt: o1 scores the top result on aider's new multi-language, more challenging coding benchmark.
#highlight_image: /assets/o1-polyglot.jpg
draft: false
@ -9,12 +9,24 @@ nav_exclude: true
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# r1 tops aider's polyglot leaderboard
# R1+Sonnet set SOTA on aider's polyglot benchmark
{: .no_toc }
<canvas id="editChart" width="800" height="450" style="margin-top: 20px"></canvas>
Aider supports using a pair of models for coding:
- An Architect model is asked to describe how to solve the coding problem. Thinking/reasoning models often work well in this role.
- An Editor model is given the Architect's solution and asked to produce specific code editing instructions to apply those changes to existing source files.
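
The two roles above can be sketched as a simple two-stage pipeline. This is only an illustration of the idea, not aider's actual implementation: the `Model` type, the `architect_then_edit` function, the prompt wording, and the stub models are all hypothetical names introduced here.

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a model reply.
Model = Callable[[str], str]


def architect_then_edit(task: str, source: str, architect: Model, editor: Model) -> str:
    """Two-stage flow: the architect describes a solution, the editor turns it into edits."""
    # Stage 1: the architect only describes how to solve the problem.
    plan = architect(
        f"Describe how to solve this coding problem.\n\nTask:\n{task}\n\nSource:\n{source}"
    )
    # Stage 2: the editor receives the architect's solution plus the source
    # and is asked for specific editing instructions.
    return editor(
        "Turn this solution into specific edits to the source file.\n\n"
        f"Solution:\n{plan}\n\nSource:\n{source}"
    )


# Stub models (assumptions for illustration) so the sketch runs without API access.
def stub_architect(prompt: str) -> str:
    return "Rename function `foo` to `bar`."


def stub_editor(prompt: str) -> str:
    # Echo back the solution it was given; a real editor would emit concrete edits.
    solution = prompt.split("Solution:\n")[1].split("\n\nSource:")[0]
    return "EDIT based on: " + solution


result = architect_then_edit("rename foo", "def foo(): pass", stub_architect, stub_editor)
print(result)
```

Keeping the two stages as separate callables means a reasoning model can be swapped into the architect role without changing the editor, which is the kind of pairing the results below compare.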
**R1 as architect with Sonnet as editor has set a new SOTA of 64.0%** on the
[aider polyglot benchmark](/2024/12/21/polyglot.html).
They achieve this at **14X lower cost** than the previous o1 SOTA result.
Using o1 or R1 as architect with various other editor models didn't produce significantly
better results than using them alone.
This is in contrast to the first wave of thinking models like o1-preview and o1-mini,
which improved when paired with many different editor models.
## Results