Paul Gauthier 2024-01-26 13:50:34 -08:00
parent edcf9b146b
commit 885e33c2b3


@@ -13,13 +13,11 @@ aider's existing
## Benchmark results
**These results are currently preliminary, and will be updated as additional benchmark runs complete.**
Overall,
the new `gpt-4-0125-preview` model does worse on the lazy coding benchmark
as compared to the November `gpt-4-1106-preview` model:
the new `gpt-4-0125-preview` model seems lazier
than the November `gpt-4-1106-preview` model:
- It performs much worse when using the [unified diffs](https://aider.chat/docs/unified-diffs.html) code editing format.
- It gets worse benchmark scores when using the [unified diffs](https://aider.chat/docs/unified-diffs.html) code editing format.
- Using aider's older [SEARCH/REPLACE block](https://github.com/paul-gauthier/aider/blob/9033be74bf74ae70459013e54b2ae6a97c47c2e6/aider/coders/editblock_prompts.py#L75-L80) editing format, the new January model outperforms the older November model. But it still performs worse than both models using unified diffs.
## Related reports