2025-05-31 09:44:59 +00:00 · 2024-01-26 13:50:34 -08:00 · 2024-01-26 13:50:34 -08:00 · 885e33c2b3
commit 885e33c2b3
parent edcf9b146b
1 changed file with 3 additions and 5 deletions
--- a/docs/benchmarks-0125.md
+++ b/docs/benchmarks-0125.md
@@ -13,13 +13,11 @@ aider's existing

 ## Benchmark results

-**These results are currently preliminary, and will be updated as additional benchmark runs complete.**
-
 Overall,
-the new `gpt-4-0125-preview` model does worse on the lazy coding benchmark
-as compared to the November `gpt-4-1106-preview` model:
+the new `gpt-4-0125-preview` model seems lazier
+than the November `gpt-4-1106-preview` model:

-- It performs much worse when using the [unified diffs](https://aider.chat/docs/unified-diffs.html) code editing format.
+- It gets worse benchmark scores when using the [unified diffs](https://aider.chat/docs/unified-diffs.html) code editing format.
 - Using aider's older [SEARCH/REPLACE block](https://github.com/paul-gauthier/aider/blob/9033be74bf74ae70459013e54b2ae6a97c47c2e6/aider/coders/editblock_prompts.py#L75-L80) editing format, the new January model outperforms the older November model. But it still performs worse than both models using unified diffs.

 ## Related reports