Mirror of https://github.com/Aider-AI/aider.git, synced 2025-05-31 01:35:00 +00:00
docs: Expand details on Sonnet performance graph
Parent: 7c9cae63cc
Commit: 9ad41e9229
1 file changed, 9 additions and 2 deletions
@@ -19,7 +19,12 @@ dumbed-down, nerfed or otherwise performing worse lately.
 Sonnet seems as good as ever, at least when accessed via
 the API.
 
-Here's a graph showing the performance of Claude 3.5 Sonnet over time:
+Here's a graph showing the performance of Claude 3.5 Sonnet over time.
+It shows every benchmark run performed since Sonnet launched.
+Benchmarks were performed for various reasons, usually
+to evaluate the effects of small changes to aider's system prompts.
+There is always some variance in benchmark results, typically +/- 1-2%
+between runs with identical prompts.
 
 <div class="chart-container" style="position: relative; height:400px; width:100%">
 <canvas id="sonnetPerformanceChart"></canvas>
@@ -87,6 +92,8 @@ document.addEventListener('DOMContentLoaded', function() {
 });
 </script>
 
-This graph shows the performance of Claude 3.5 Sonnet on the aider code editing benchmark over time. 'Pass Rate 1' represents the initial success rate, while 'Pass Rate 2' shows the success rate after a second attempt. As you can see, there's no significant decline in performance, suggesting that Sonnet's capabilities have remained stable since its launch.
+This graph shows the performance of Claude 3.5 Sonnet on the
+[aider code editing benchmark](https://aider.chat/docs/leaderboards/)
+over time. 'Pass Rate 1' represents the initial success rate, while 'Pass Rate 2' shows the success rate after a second attempt with a chance to fix testing errors.
 
 
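The two metrics described in this change, and the caveat about run-to-run variance, can be illustrated with a minimal JavaScript sketch (JavaScript because the page already uses it for its chart). This is not aider's benchmark code: the result shape (`passedAttempt1`, `passedAttempt2`), the function names `passRates` and `exceedsNoise`, and the 2-point default tolerance are hypothetical. It also assumes that Pass Rate 2 counts exercises solved on either attempt, and it reads "+/- 1-2%" as percentage points of pass rate.

```javascript
// Hypothetical sketch: the result shape and function names are illustrative
// assumptions, not aider's actual benchmark output or API.

/**
 * Compute pass rates (in percent) for one benchmark run.
 * Assumes Pass Rate 2 counts exercises that passed on either attempt,
 * so passRate2 is always >= passRate1.
 */
function passRates(results) {
  if (results.length === 0) {
    throw new Error("no benchmark results");
  }
  let firstTry = 0;
  let eitherTry = 0;
  for (const r of results) {
    if (r.passedAttempt1) {
      firstTry += 1;
      eitherTry += 1;
    } else if (r.passedAttempt2) {
      // Solved only on the second attempt, after seeing the test errors.
      eitherTry += 1;
    }
  }
  const n = results.length;
  return {
    passRate1: (100 * firstTry) / n,
    passRate2: (100 * eitherTry) / n,
  };
}

/**
 * Report whether the change between two runs is larger than the assumed
 * run-to-run noise band (in percentage points). This is a simple threshold,
 * not a statistical significance test.
 */
function exceedsNoise(baselineRate, newRate, tolerance = 2) {
  return Math.abs(newRate - baselineRate) > tolerance;
}

// Toy input (not real benchmark data): 2 of 4 pass first try, 1 more on retry.
const run = passRates([
  { passedAttempt1: true, passedAttempt2: false },
  { passedAttempt1: false, passedAttempt2: true },
  { passedAttempt1: false, passedAttempt2: false },
  { passedAttempt1: true, passedAttempt2: false },
]);
console.log(run); // { passRate1: 50, passRate2: 75 }
```

With a threshold like this, a 1.5-point dip between two runs with identical prompts would be treated as noise rather than evidence of a regression. A real analysis of many runs would more likely look at the spread of results over time, as the graph does.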