2025-06-05 12:14:59 +00:00 · 2025-01-24 08:22:13 -08:00 · 2025-01-24 08:22:13 -08:00 · d7bb80468b
commit d7bb80468b
parent 9d6a692054
3 changed files with 153 additions and 3 deletions
--- a/aider/website/_posts/2025-01-23-r1.md
+++ b/aider/website/_posts/2025-01-23-r1.md
@@ -1,5 +1,5 @@
 ---
-title: r1 tops aider's polyglot leaderboard
+title: R1+Sonnet set SOTA on aider's polyglot benchmark
 #excerpt: o1 scores the top result on aider's new multi-language, more challenging coding benchmark.
 #highlight_image: /assets/o1-polyglot.jpg
 draft: false
@@ -9,12 +9,24 @@ nav_exclude: true
 <p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
 {% endif %}

-# r1 tops aider's polyglot leaderboard
+# R1+Sonnet set SOTA on aider's polyglot benchmark
 {: .no_toc }

 <canvas id="editChart" width="800" height="450" style="margin-top: 20px"></canvas>

+Aider supports using a pair of models for coding:

+- An Architect model is asked to describe how to solve the coding problem. Thinking/reasoning models often work well in this role.
+- An Editor model is given the Architect's solution and asked to produce specific code editing instructions to apply those changes to existing source files.
+
+**R1 as architect with Sonnet as editor has set a new SOTA of 64.0%** on the 
+[aider polyglot benchmark](/2024/12/21/polyglot.html).
+They achieve this at **14X less cost** compared to the previous o1 SOTA result.
+
+Using o1 or R1 as architect with various other editor models didn't produce significantly
+better results than using them alone.
+This is in contrast to the first wave of thinking models like o1-preview and o1-mini,
+which improved when paired with many different editor models.


 ## Results