copy

2025-05-31 17:55:01 +00:00 · 2024-06-20 09:56:18 -07:00 · 2024-06-20 09:56:18 -07:00 · 559279c781
commit 559279c781
parent e5e07f9507
2 changed files with 25 additions and 2 deletions
--- a/website/_data/refactor_leaderboard.yml
+++ b/website/_data/refactor_leaderboard.yml
@ -143,4 +143,25 @@
  seconds_per_case: 67.8
  total_cost: 20.4889

-  
+
+- dirname: 2024-06-20-16-39-18--refac-claude-3.5-sonnet-diff
+  test_cases: 89
+  model: claude-3.5-sonnet (diff)
+  edit_format: diff
+  commit_hash: e5e07f9
+  pass_rate_1: 55.1
+  percent_cases_well_formed: 70.8
+  error_outputs: 240
+  num_malformed_responses: 54
+  num_with_malformed_responses: 26
+  user_asks: 10
+  lazy_comments: 2
+  syntax_errors: 0
+  indentation_errors: 3
+  exhausted_context_windows: 0
+  test_timeouts: 0
+  command: aider --model openrouter/anthropic/claude-3.5-sonnet
+  date: 2024-06-20
+  versions: 0.38.1-dev
+  seconds_per_case: 51.9
+  total_cost: 0.0000
--- a/website/docs/leaderboards/index.md
+++ b/website/docs/leaderboards/index.md
@ -19,7 +19,9 @@ it works best with models that score well on the benchmarks.
 ## Claude 3.5 Sonnet takes the top spot

 Claude 3.5 Sonnet is now the top ranked model on aider's code editing leaderboard.
-DeepSeek Coder V2 took the #1 spot only 4 days ago.
+DeepSeek Coder V2 only spent 4 days in the top spot.
+
+The new Sonnet came in 3rd on aider's refactoring leaderboard, behind GPT-4o and Opus.

 Sonnet ranked #1 when using the "whole" editing format,
 but it also scored very well with