This commit is contained in:
Paul Gauthier 2024-06-20 09:56:18 -07:00
parent e5e07f9507
commit 559279c781
2 changed files with 25 additions and 2 deletions

View file

@ -143,4 +143,25 @@
seconds_per_case: 67.8 seconds_per_case: 67.8
total_cost: 20.4889 total_cost: 20.4889
- dirname: 2024-06-20-16-39-18--refac-claude-3.5-sonnet-diff
test_cases: 89
model: claude-3.5-sonnet (diff)
edit_format: diff
commit_hash: e5e07f9
pass_rate_1: 55.1
percent_cases_well_formed: 70.8
error_outputs: 240
num_malformed_responses: 54
num_with_malformed_responses: 26
user_asks: 10
lazy_comments: 2
syntax_errors: 0
indentation_errors: 3
exhausted_context_windows: 0
test_timeouts: 0
command: aider --model openrouter/anthropic/claude-3.5-sonnet
date: 2024-06-20
versions: 0.38.1-dev
seconds_per_case: 51.9
total_cost: 0.0000

View file

@ -19,7 +19,9 @@ it works best with models that score well on the benchmarks.
## Claude 3.5 Sonnet takes the top spot ## Claude 3.5 Sonnet takes the top spot
Claude 3.5 Sonnet is now the top ranked model on aider's code editing leaderboard. Claude 3.5 Sonnet is now the top ranked model on aider's code editing leaderboard.
DeepSeek Coder V2 took the #1 spot only 4 days ago. DeepSeek Coder V2 only spent 4 days in the top spot.
The new Sonnet came in 3rd on aider's refactoring leaderboard, behind GPT-4o and Opus.
Sonnet ranked #1 when using the "whole" editing format, Sonnet ranked #1 when using the "whole" editing format,
but it also scored very well with but it also scored very well with