mirror of https://github.com/Aider-AI/aider.git (synced 2025-06-01 18:25:00 +00:00)
added together_ai/qwen/Qwen2-72B-Instruct data
parent 02c7335aa7
commit 86ea47f791
2 changed files with 23 additions and 14 deletions
@@ -474,4 +474,26 @@
   versions: 0.28.1-dev
   seconds_per_case: 17.6
   total_cost: 1.6205
 
+- dirname: 2024-06-08-22-37-55--qwen2-72b-instruct-whole
+  test_cases: 133
+  model: Qwen2 72B Instruct
+  edit_format: whole
+  commit_hash: 02c7335-dirty, 1a97498-dirty
+  pass_rate_1: 44.4
+  pass_rate_2: 55.6
+  percent_cases_well_formed: 100.0
+  error_outputs: 3
+  num_malformed_responses: 0
+  num_with_malformed_responses: 0
+  user_asks: 3
+  lazy_comments: 0
+  syntax_errors: 0
+  indentation_errors: 0
+  exhausted_context_windows: 0
+  test_timeouts: 1
+  command: aider --model together_ai/qwen/Qwen2-72B-Instruct
+  date: 2024-06-08
+  versions: 0.37.1-dev
+  seconds_per_case: 14.3
+  total_cost: 0.0000
@@ -15,19 +15,6 @@ The leaderboards below report the results from a number of popular LLMs.
 While [aider can connect to almost any LLM](/docs/llms.html),
 it works best with models that score well on the benchmarks.
 
-## GPT-4o takes the #1 & #2 spots
-
-GPT-4o tops the aider LLM code editing leaderboard at 72.9%, versus 68.4% for Opus. GPT-4o takes second on aider's refactoring leaderboard with 62.9%, versus Opus at 72.3%.
-
-GPT-4o did much better than the 4-turbo models, and seems *much* less lazy.
-
-GPT-4o is also able to use aider's established "diff" edit format that uses
-`SEARCH/REPLACE` blocks.
-This diff format is used by all the other capable models, including Opus and
-the original GPT-4 models
-The GPT-4 Turbo models have all required the "udiff" edit format, due to their
-tendancy to lazy coding.
-
 
 ## Code editing leaderboard
 