Paul Gauthier 2024-05-24 07:07:14 -07:00
parent c591ecd331
commit 37c640bf69
2 changed files with 38 additions and 35 deletions

@@ -7,7 +7,7 @@ draft: true
# Aider scores 26.3% on SWE Bench Lite # Aider scores 26.3% on SWE Bench Lite
Aider scored 26.3% [Aider scored 26.3%](https://github.com/swe-bench/experiments/pull/7)
on the on the
[SWE Bench Lite benchmark](https://www.swebench.com), [SWE Bench Lite benchmark](https://www.swebench.com),
achieving a state-of-the-art result. achieving a state-of-the-art result.
@@ -195,7 +195,7 @@ and it worked well for the SWE Bench problems.
Aider successfully identified the correct file to edit Aider successfully identified the correct file to edit
in 70.3% of the benchmark tasks. in 70.3% of the benchmark tasks.
We can determine which file needed to be edited using the "gold" patch We can determine which file needs to be edited using the "gold" patch
which is associated with each SWE Bench task. which is associated with each SWE Bench task.
This patch was created by a human developer This patch was created by a human developer
to solve the issue, and therefore reveals a file which can to solve the issue, and therefore reveals a file which can
@@ -237,7 +237,7 @@ created a plausible solution.
## Linting and fixing ## Linting and fixing
Another key criterion for a plausible solution is that it passes basic Another key criterion for a plausible solution is that it passes basic
linting, which means that the code is valid and without syntax linting, which means that the code has no syntax
or other fatal errors. or other fatal errors.
[Aider lints code](https://aider.chat/2024/05/22/linting.html) [Aider lints code](https://aider.chat/2024/05/22/linting.html)
after every LLM edit and offers to automatically fix after every LLM edit and offers to automatically fix
@@ -365,15 +365,16 @@ and prioritizing solutions in the following order:
## Computing the benchmark score ## Computing the benchmark score
The benchmark harness produces a candidate solution for each of the 300 The benchmark harness produced a plausible solution for each of the 300
SWE Bench Lite instances and saves it as the `model_patch`. SWE Bench Lite instances and saved it as the `model_patch`.
A separate evaluation script A separate evaluation script was used to
tests each of these solutions with the full test suite test each of these solutions with the full test suite,
including the held out acceptance tests. including the held out acceptance tests.
For this final acceptance testing, any edits that aider made to tests For this final acceptance testing, any edits that aider made to tests
are discarded. are discarded.
This ensures that the full, correct test suite is used for acceptance testing. This ensures that the correct,
unmodified test suite is used for acceptance testing.
The evaluation script compares the test results The evaluation script compares the test results
with results from testing with results from testing
the "gold" patch that was developed by a human to correctly solve the issue. the "gold" patch that was developed by a human to correctly solve the issue.

@@ -18,24 +18,24 @@
Aider works best with code that is part of a git repo. Aider works best with code that is part of a git repo.
Aider is tightly integrated with git, which makes it easy to: Aider is tightly integrated with git, which makes it easy to:
- Use git to undo any GPT changes that you don't like - Use git to undo any aider changes that you don't like
- Go back in the git history to review the changes GPT made to your code - Go back in the git history to review the changes that aider made to your code
- Manage a series of GPT's changes on a git branch - Manage a series of aider's changes on a git branch
Aider specifically uses git in these ways: Aider specifically uses git in these ways:
- It asks to create a git repo if you launch it in a directory without one. - It asks to create a git repo if you launch it in a directory without one.
- Whenever GPT edits a file, aider commits those changes with a descriptive commit message. This makes it easy to undo or review GPT's changes. - Whenever aider edits a file, it commits those changes with a descriptive commit message. This makes it easy to undo or review aider's changes.
- Aider takes special care if GPT tries to edit files that already have uncommitted changes (dirty files). Aider will first commit any preexisting changes with a descriptive commit message. This keeps your edits separate from GPT's edits, and makes sure you never lose your work if GPT makes an inappropriate change. - Aider takes special care before editing files that already have uncommitted changes (dirty files). Aider will first commit any preexisting changes with a descriptive commit message. This keeps your edits separate from aider's edits, and makes sure you never lose your work if aider makes an inappropriate change.
Aider also allows you to use in-chat commands to `/diff` or `/undo` the last change made by GPT. Aider also allows you to use in-chat commands to `/diff` or `/undo` the last change.
To do more complex management of your git history, you can use raw `git` commands, To do more complex management of your git history, you can use raw `git` commands,
either by using `/git` within the chat, or with standard git tools outside of aider. either by using `/git` within the chat, or with standard git tools outside of aider.
While it is not recommended, you can disable aider's use of git in a few ways: While it is not recommended, you can disable aider's use of git in a few ways:
- `--no-auto-commits` will stop aider from git committing each of GPT's changes. - `--no-auto-commits` will stop aider from git committing each of its changes.
- `--no-dirty-commits` will stop aider from committing dirty files before applying GPT's edits. - `--no-dirty-commits` will stop aider from committing dirty files before applying its edits.
- `--no-git` will completely stop aider from using git on your files. You should ensure you are keeping sensible backups of the files you are working with. - `--no-git` will completely stop aider from using git on your files. You should ensure you are keeping sensible backups of the files you are working with.
@@ -151,20 +151,20 @@ coder = Coder.create(client=client, fnames=fnames, io=io)
## What code languages does aider support? ## What code languages does aider support?
Aider supports pretty much all the popular coding languages. Aider supports pretty much all the popular coding languages.
This is partly because GPT-4 is fluent in most mainstream languages, This is partly because top LLMs are fluent in most mainstream languages,
and familiar with popular libraries, packages and frameworks. and familiar with popular libraries, packages and frameworks.
In fact, coding with aider is sometimes the most magical In fact, coding with aider is sometimes the most magical
when you're working in a language that you when you're working in a language that you
are less familiar with. are less familiar with.
GPT often knows the language better than you, the LLM often knows the language better than you,
and can generate all the boilerplate to get to the heart of your and can generate all the boilerplate to get to the heart of your
problem. problem.
GPT will often solve your problem in an elegant way The LLM will often solve your problem in an elegant way
using a library or package that you weren't even aware of. using a library or package that you weren't even aware of.
Aider uses tree-sitter to do code analysis and help Aider uses tree-sitter to do code analysis and help
GPT navigate larger code bases by producing the LLM navigate larger code bases by producing
a [repository map](https://aider.chat/docs/repomap.html). a [repository map](https://aider.chat/docs/repomap.html).
Aider can currently produce repository maps for most mainstream languages, listed below. Aider can currently produce repository maps for most mainstream languages, listed below.
@@ -212,28 +212,30 @@ pipx install aider-chat
## Aider isn't editing my files? ## Aider isn't editing my files?
Sometimes GPT will reply with some code changes that don't get applied to your local files. Sometimes the LLM will reply with some code changes that don't get applied to your local files.
In these cases, aider might say something like "Failed to apply edit to *filename*". In these cases, aider might say something like "Failed to apply edit to *filename*".
This usually happens because GPT is not specifying the edits This usually happens because the LLM is not specifying the edits
to make in the format that aider expects. to make in the format that aider expects.
GPT-3.5 is especially prone to disobeying the system prompt instructions in this manner, but it also happens with GPT-4. GPT-3.5 is especially prone to disobeying the system prompt instructions in this manner, but it also happens with stronger models.
Aider makes every effort to get GPT to conform, and works hard to deal with Aider makes every effort to get the LLM
to conform, and works hard to deal with
replies that are "almost" correctly formatted. replies that are "almost" correctly formatted.
If Aider detects an improperly formatted reply, it gives GPT feedback to try again. If Aider detects an improperly formatted reply, it gives
the LLM feedback to try again.
Also, before each release new versions of aider are Also, before each release new versions of aider are
[benchmarked](https://aider.chat/docs/benchmarks.html). [benchmarked](https://aider.chat/docs/benchmarks.html).
This helps prevent regressions in the code editing This helps prevent regressions in the code editing
performance of GPT that could have been inadvertently performance of an LLM that could have been inadvertently
introduced. introduced.
But sometimes GPT just won't cooperate. But sometimes the LLM just won't cooperate.
In these cases, here are some things you might try: In these cases, here are some things you might try:
- Try the older GPT-4 model `gpt-4-0613` not GPT-4 Turbo by running `aider --model gpt-4-0613`.
- Use `/drop` to remove files from the chat session which aren't needed for the task at hand. This will reduce distractions and may help GPT produce properly formatted edits. - Use `/drop` to remove files from the chat session which aren't needed for the task at hand. This will reduce distractions and may help GPT produce properly formatted edits.
- Use `/clear` to remove the conversation history, again to help GPT focus. - Use `/clear` to remove the conversation history, again to help GPT focus.
- Try a different LLM.
## How can I add ALL the files to the chat? ## How can I add ALL the files to the chat?
@@ -244,16 +246,16 @@ The best approach is to think about which files need to be changed to accomplish
the task you are working on. Just add those files to the chat. the task you are working on. Just add those files to the chat.
Usually when people want to add "all the files" it's because they think it Usually when people want to add "all the files" it's because they think it
will give GPT helpful context about the overall code base. will give the LLM helpful context about the overall code base.
Aider will automatically give GPT a bunch of additional context about Aider will automatically give the LLM a bunch of additional context about
the rest of your git repo. the rest of your git repo.
It does this by analyzing your entire codebase in light of the It does this by analyzing your entire codebase in light of the
current chat to build a compact current chat to build a compact
[repository map](https://aider.chat/2023/10/22/repomap.html). [repository map](https://aider.chat/2023/10/22/repomap.html).
Adding a bunch of files that are mostly irrelevant to the Adding a bunch of files that are mostly irrelevant to the
task at hand will often distract or confuse GPT. task at hand will often distract or confuse the LLM.
GPT will give worse coding results, and sometimes even fail to correctly edit files. The LLM will give worse coding results, and sometimes even fail to correctly edit files.
Adding extra files will also increase the token costs on your OpenAI invoice. Adding extra files will also increase the token costs on your OpenAI invoice.
Again, it's usually best to just add the files to the chat that will need to be modified. Again, it's usually best to just add the files to the chat that will need to be modified.
@@ -265,7 +267,7 @@ If you still wish to add lots of files to the chat, you can:
## Can I specify guidelines or conventions? ## Can I specify guidelines or conventions?
Sometimes you want GPT to be aware of certain coding guidelines, Sometimes you want the LLM to be aware of certain coding guidelines,
like whether to provide type hints, which libraries or packages like whether to provide type hints, which libraries or packages
to prefer, etc. to prefer, etc.
@@ -298,7 +300,7 @@ The wholefile coder is currently used by GPT-3.5 by default. You can manually se
- wholefile_coder.py - wholefile_coder.py
- wholefile_prompts.py - wholefile_prompts.py
The editblock coder is currently used by GPT-4 by default. You can manually select it with `--edit-format diff`. The editblock coder is currently used by GPT-4o by default. You can manually select it with `--edit-format diff`.
- editblock_coder.py - editblock_coder.py
- editblock_prompts.py - editblock_prompts.py
@@ -309,7 +311,7 @@ The universal diff coder is currently used by GPT-4 Turbo by default. You can ma
- udiff_prompts.py - udiff_prompts.py
When experimenting with coder backends, it helps to run aider with `--verbose --no-pretty` so you can see When experimenting with coder backends, it helps to run aider with `--verbose --no-pretty` so you can see
all the raw information being sent to/from GPT in the conversation. all the raw information being sent to/from the LLM in the conversation.
You can also refer to the You can also refer to the
[instructions for installing a development version of aider](https://aider.chat/docs/install.html#install-development-versions-of-aider-optional). [instructions for installing a development version of aider](https://aider.chat/docs/install.html#install-development-versions-of-aider-optional).