diff --git a/aider/website/_posts/2024-08-26-sonnet-seems-fine.md b/aider/website/_posts/2024-08-26-sonnet-seems-fine.md
index 1fd8d8610..6bf1eb57e 100644
--- a/aider/website/_posts/2024-08-26-sonnet-seems-fine.md
+++ b/aider/website/_posts/2024-08-26-sonnet-seems-fine.md
@@ -11,12 +11,9 @@ highlight_image: /assets/sonnet-seems-fine.jpg
 Recently there has been a lot of speculation that Sonnet has been
 dumbed-down, nerfed or is otherwise performing worse.
-
-Sonnet seems as good as ever, at least when accessed via
-the API.
-As part of developing aider, I benchmark the top LLMs regularly.
-I have not noticed
-any degradation in Claude 3.5 Sonnet's performance at code editing.
+Sonnet seems as good as ever, when performing the
+[aider code editing benchmark](/docs/benchmarks.html#the-benchmark)
+via the API.
 
 Below is a graph showing the performance of Claude 3.5 Sonnet over time.
 It shows every clean, comparable benchmark run performed since Sonnet launched.
 
@@ -28,6 +25,9 @@ degradation.
 There is always some variance in benchmark results, typically +/-
 2% between runs with identical prompts.
 
+It's worth noting that these results would not capture any changes
+made to how Sonnet is presented in Anthropic's web chat UI.
+