o1-mini diff results

2025-06-10 14:45:00 +00:00 · 2024-09-12 15:38:40 -07:00 · 2024-09-12 15:38:40 -07:00 · c00ac80909
commit c00ac80909
parent 1fbb5079d5
3 changed files with 68 additions and 17 deletions
--- a/aider/website/_posts/2024-09-12-o1.md
+++ b/aider/website/_posts/2024-09-12-o1.md
@@ -10,23 +10,26 @@ nav_exclude: true
 # Benchmark results for OpenAI o1-mini

 OpenAI o1-mini is priced similarly to GPT-4o and Claude 3.5 Sonnet,
-but scored below those models
-when using the "whole" editing format.
-It was close enough to GPT-4o to be within the margin of error.
+but scored below those models.

-The o1-mini model had trouble following the very simple whole editing format.
-It's possible that it would get a better score if aider prompted with
-more examples or was adapted to parse o1-mini's favorite way to mangle
-the response format.
+It works best with the 
+["whole" edit format](/docs/leaderboards/#notes-on-the-edit-format),
+where it returns a full copy of the source code file with changes.
+Other frontier models like GPT-4o and Sonnet are able to achieve
+high benchmark scores using the 
+["diff" edit format](/docs/leaderboards/#notes-on-the-edit-format),
+This allows them to return search/replace blocks to 
+efficiently edit the source code, saving time and token costs.

-Note that o1-mini's "whole" score is compared against GPT-4o and Sonnet 
-"diff" results.
-Using diff is more challenging,
-but allows the model to return search/replace blocks to 
-efficiently edit the source code.
-The whole format requires the o1-mini to return a fresh copy of the entire file,
-increasing costs and latency.
+The o1-mini model had trouble conforming to both the whole and diff edit formats.
+Aider is extremely permissive and tries hard to accept anything close
+to the correct formats.
+It's possible that o1-mini would get better scores if aider prompted with
+more examples or was adapted to parse o1-mini's favorite ways to mangle
+the response formats.

+Over time it may be possible to better harness o1-mini's capabilities through
+different prompting and editing formats.

 ## Using aider with o1-mini and o1-preview