mirror of https://github.com/Aider-AI/aider.git (synced 2025-05-30 09:14:59 +00:00)
copy
parent 4b82277ef7
commit 008b1cb5f7
1 changed file with 11 additions and 11 deletions
@@ -11,12 +11,9 @@ highlight_image: /assets/sonnet-seems-fine.jpg
Recently there has been a lot of speculation that Sonnet has been
dumbed-down, nerfed or is otherwise performing worse.

Sonnet seems as good as ever, at least when accessed via
the API.
As part of developing aider, I benchmark the top LLMs regularly.
I have not noticed
any degradation in Claude 3.5 Sonnet's performance at code editing.
Sonnet seems as good as ever, when performing the
[aider code editing benchmark](/docs/benchmarks.html#the-benchmark)
via the API.

Below is a graph showing the performance of Claude 3.5 Sonnet over time.
It shows every clean, comparable benchmark run performed since Sonnet launched.

@@ -28,6 +25,9 @@ degradation.
There is always some variance in benchmark results, typically +/- 2%
between runs with identical prompts.

It's worth noting that these results would not capture any changes
made to how Sonnet is presented in Anthropic's web chat UI.

<div class="chart-container" style="position: relative; height:400px; width:100%">
<canvas id="sonnetPerformanceChart"></canvas>
</div>

@@ -136,10 +136,10 @@ document.addEventListener('DOMContentLoaded', function() {
});
</script>

This graph shows the performance of Claude 3.5 Sonnet on the
> This graph shows the performance of Claude 3.5 Sonnet on the
[Aider's code editing benchmark](/docs/benchmarks.html#the-benchmark)
over time. 'Pass Rate 1' represents the initial success rate, while 'Pass Rate 2' shows the success rate after a second attempt with a chance to fix testing errors.
The
[aider LLM code editing leaderboard](https://aider.chat/docs/leaderboards/)
ranks models based on Pass Rate 2.
> over time. 'Pass Rate 1' represents the initial success rate, while 'Pass Rate 2' shows the success rate after a second attempt with a chance to fix testing errors.
> The
> [aider LLM code editing leaderboard](https://aider.chat/docs/leaderboards/)
> ranks models based on Pass Rate 2.