Paul Gauthier 2024-06-20 09:56:18 -07:00
parent e5e07f9507
commit 559279c781
2 changed files with 25 additions and 2 deletions


@@ -143,4 +143,25 @@
  seconds_per_case: 67.8
  total_cost: 20.4889
- dirname: 2024-06-20-16-39-18--refac-claude-3.5-sonnet-diff
  test_cases: 89
  model: claude-3.5-sonnet (diff)
  edit_format: diff
  commit_hash: e5e07f9
  pass_rate_1: 55.1
  percent_cases_well_formed: 70.8
  error_outputs: 240
  num_malformed_responses: 54
  num_with_malformed_responses: 26
  user_asks: 10
  lazy_comments: 2
  syntax_errors: 0
  indentation_errors: 3
  exhausted_context_windows: 0
  test_timeouts: 0
  command: aider --model openrouter/anthropic/claude-3.5-sonnet
  date: 2024-06-20
  versions: 0.38.1-dev
  seconds_per_case: 51.9
  total_cost: 0.0000
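
The `percent_cases_well_formed` figure in the new entry appears to match the share of test cases that had no malformed responses. The following is a minimal sketch that checks this arithmetic, assuming that relationship holds; the helper name `well_formed_percent` is hypothetical and not taken from the benchmark's own code.

```python
# Values copied from the benchmark entry above.
entry = {
    "test_cases": 89,
    "num_with_malformed_responses": 26,
    "percent_cases_well_formed": 70.8,
}


def well_formed_percent(test_cases: int, num_with_malformed: int) -> float:
    """Percentage of test cases with no malformed responses.

    Assumption: this is how the reported figure is derived. The formula
    is inferred from the numbers, not confirmed by the source.
    """
    return round(100 * (test_cases - num_with_malformed) / test_cases, 1)


computed = well_formed_percent(
    entry["test_cases"], entry["num_with_malformed_responses"]
)
print(computed)
```

Under this assumption, 63 of the 89 cases were well formed, which rounds to the reported value.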


@@ -19,7 +19,9 @@ it works best with models that score well on the benchmarks.
## Claude 3.5 Sonnet takes the top spot
Claude 3.5 Sonnet is now the top ranked model on aider's code editing leaderboard.
-DeepSeek Coder V2 took the #1 spot only 4 days ago.
+DeepSeek Coder V2 only spent 4 days in the top spot.
+The new Sonnet came in 3rd on aider's refactoring leaderboard, behind GPT-4o and Opus.
Sonnet ranked #1 when using the "whole" editing format,
but it also scored very well with