From 86ea47f79169be5af7071664121bda162ad799d2 Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Sat, 8 Jun 2024 16:43:28 -0700
Subject: [PATCH] added together_ai/qwen/Qwen2-72B-Instruct data

---
 website/_data/edit_leaderboard.yml | 24 +++++++++++++++++++++++-
 website/docs/leaderboards/index.md | 13 -------------
 2 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/website/_data/edit_leaderboard.yml b/website/_data/edit_leaderboard.yml
index 0d90394b3..e9f119313 100644
--- a/website/_data/edit_leaderboard.yml
+++ b/website/_data/edit_leaderboard.yml
@@ -474,4 +474,26 @@
   versions: 0.28.1-dev
   seconds_per_case: 17.6
   total_cost: 1.6205
- 
\ No newline at end of file
+
+- dirname: 2024-06-08-22-37-55--qwen2-72b-instruct-whole
+  test_cases: 133
+  model: Qwen2 72B Instruct
+  edit_format: whole
+  commit_hash: 02c7335-dirty, 1a97498-dirty
+  pass_rate_1: 44.4
+  pass_rate_2: 55.6
+  percent_cases_well_formed: 100.0
+  error_outputs: 3
+  num_malformed_responses: 0
+  num_with_malformed_responses: 0
+  user_asks: 3
+  lazy_comments: 0
+  syntax_errors: 0
+  indentation_errors: 0
+  exhausted_context_windows: 0
+  test_timeouts: 1
+  command: aider --model together_ai/qwen/Qwen2-72B-Instruct
+  date: 2024-06-08
+  versions: 0.37.1-dev
+  seconds_per_case: 14.3
+  total_cost: 0.0000
\ No newline at end of file
diff --git a/website/docs/leaderboards/index.md b/website/docs/leaderboards/index.md
index 097c542f1..8d0f189e7 100644
--- a/website/docs/leaderboards/index.md
+++ b/website/docs/leaderboards/index.md
@@ -15,19 +15,6 @@ The leaderboards below report the results from a number of popular LLMs.
 While [aider can connect to almost any LLM](/docs/llms.html),
 it works best with models that score well on the benchmarks.
 
-## GPT-4o takes the #1 & #2 spots
-
-GPT-4o tops the aider LLM code editing leaderboard at 72.9%, versus 68.4% for Opus. GPT-4o takes second on aider's refactoring leaderboard with 62.9%, versus Opus at 72.3%.
-
-GPT-4o did much better than the 4-turbo models, and seems *much* less lazy.
-
-GPT-4o is also able to use aider's established "diff" edit format that uses
-`SEARCH/REPLACE` blocks.
-This diff format is used by all the other capable models, including Opus and
-the original GPT-4 models
-The GPT-4 Turbo models have all required the "udiff" edit format, due to their
-tendancy to lazy coding.
-
 ## Code editing leaderboard