Mirror of https://github.com/Aider-AI/aider.git, synced 2025-05-31 09:44:59 +00:00
parent 3e715e66d5
commit 3dc3ebe25f
1 changed file with 13 additions and 14 deletions
@@ -3,8 +3,20 @@ highlight_image: /assets/leaderboard.jpg
nav_order: 950
---

# Aider LLM Leaderboards

# Deepseek Coder V2 beats GPT-4o, Opus on Aider Code Editing Leaderboard
Aider works best with LLMs which are good at *editing* code, not just good at writing
code.
To evaluate an LLM's editing skill, aider uses a pair of benchmarks that
assess a model's ability to consistently follow the system prompt
to successfully edit code.

The leaderboards below report the results from a number of popular LLMs.
While [aider can connect to almost any LLM](/docs/llms.html),
it works best with models that score well on the benchmarks.


## Deepseek Coder V2 beats GPT-4o, Opus

The new
[Deepseek Coder V2](https://aider.chat/docs/llms/deepseek.html)
@@ -21,19 +33,6 @@ These output limits are often as low as 4k tokens, even for models
with very large context windows.


## Aider LLM Leaderboards

Aider works best with LLMs which are good at *editing* code, not just good at writing
code.
To evaluate an LLM's editing skill, aider uses a pair of benchmarks that
assess a model's ability to consistently follow the system prompt
to successfully edit code.

The leaderboards below report the results from a number of popular LLMs.
While [aider can connect to almost any LLM](/docs/llms.html),
it works best with models that score well on the benchmarks.


## Code editing leaderboard

[Aider's code editing benchmark](/docs/benchmarks.html#the-benchmark) asks the LLM to edit python source files to complete 133 small coding exercises. This benchmark measures the LLM's coding ability, but also whether it can consistently emit code edits in the format specified in the system prompt.