Paul Gauthier 2024-05-04 16:25:22 -07:00
parent b74edcf350
commit 812a620711
2 changed files with 19 additions and 15 deletions


@@ -3,18 +3,20 @@
Aider works best with LLMs which are good at *editing* code, not just good at writing
code.
Aider works with the LLM to make changes to the existing code in your git repo,
so the LLM needs to be capable of reliably specifying how to edit code.
Aider uses the system prompt to tell the LLM how to specify edits to the existing code
in your local git repo.
Some LLMs are better than others at consistently following these instructions
to successfully edit code.
Aider uses two benchmarks
to measure an LLM's code editing ability:
-- The [code editing benchmark](https://aider.chat/docs/benchmarks.html#the-benchmark) asks the LLM to edit python source files to complete 133 Exercism exercises. This benchmark measures the LLM's ability to emit code edits according to the format aider specifies in the system prompt.
-- The [refactoring benchmark](https://github.com/paul-gauthier/refactor-benchmark) asks the LLM to refactor 89 large methods from large python classes. This is a more challenging benchmark, which tests the model's ability to output long chunks of code without skipping sections. It was developed to provoke and measure GPT-4 Turbo's "lazy coding" habit.
+- The [code editing benchmark](/docs/benchmarks.html#the-benchmark) asks the LLM to edit python source files to complete 133 small coding exercises. This benchmark measures the LLM's coding ability, but also whether it can consistently emit code edits in the format specified in the system prompt.
+- The [refactoring benchmark](https://github.com/paul-gauthier/refactor-benchmark) asks the LLM to refactor 89 large methods from large python classes. This is a more challenging benchmark, which tests the model's ability to output long chunks of code without skipping sections or making mistakes. It was developed to provoke and measure [GPT-4 Turbo's "lazy coding" habit](/2023/12/21/unified-diffs.html).
The leaderboards below report the results from a number of popular LLMs,
to help users select which models to use with aider.
-While [aider can connect to almost any LLM](https://aider.chat/docs/llms.html)
+While [aider can connect to almost any LLM](/docs/llms.html)
it will work best with models that score well on the benchmarks.
## Code editing leaderboard
@@ -162,7 +164,7 @@ Models that use a diff-like format are able to
edit larger files with less cost and without hitting token limits.
Aider is configured to use the best edit format for the popular OpenAI and Anthropic models
-and the [other models recommended on the LLM page](https://aider.chat/docs/llms.html).
+and the [other models recommended on the LLM page](/docs/llms.html).
For lesser known models aider will default to using the "whole" editing format
since it is the easiest format for an LLM to use.


@@ -419,17 +419,19 @@ in case you made a typo or mistake when specifying the model name.
## Editing format
-Aider uses different "edit formats" to collect code edits from different LLMs:
+Aider uses different "edit formats" to collect code edits from different LLMs.
The "whole" format is the easiest for an LLM to use, but it uses a lot of tokens
and may limit how large a file can be edited.
Models which can use one of the diff formats are much more efficient,
using far fewer tokens.
Models that use a diff-like format are able to
edit larger files with less cost and without hitting token limits.
- `whole` is a "whole file" editing format, where the model edits a file by returning a full new copy of the file with any changes included.
- `diff` is a more efficient diff style format, where the model specifies blocks of code to search and replace in order to make changes to files (see the example after this list).
- `diff-fenced` is similar to diff, but fences the entire diff block including the filename.
- `udiff` is the most efficient editing format, where the model returns unified diffs to apply changes to the file.
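
As a rough illustration, a `diff` style edit from the model might look something like the search/replace block below. The filename and the Python snippet are hypothetical, and the exact markers aider expects may differ from this sketch:

```
greeting.py
<<<<<<< SEARCH
def greet(name):
    print("Hello " + name)
=======
def greet(name):
    # hypothetical change: switch to an f-string
    print(f"Hello {name}")
>>>>>>> REPLACE
```

A `udiff` edit would instead express the same change as a unified diff hunk, while a `whole` edit would return the entire updated file.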
-Aider is configured to use the best edit format for the popular OpenAI and Anthropic models
-and the [other models recommended on the LLM page](https://aider.chat/docs/llms.html).
-For lesser known models aider will default to using the "whole" editing format
-since it is the easiest format for an LLM to use.
+Different models work best with different editing formats.
+Aider is configured to use the best edit format for the popular OpenAI and Anthropic models and the other models recommended on this page.
+For lesser known models aider will default to using the "whole" editing format.
If you would like to experiment with the more advanced formats, you can
use these switches: `--edit-format diff` or `--edit-format udiff`.
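
For example, a hypothetical invocation (the filename here is just a placeholder) might look like:

```
aider --edit-format udiff hello.py
```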