![]() # Aider chat conversation: USER: proofread this and make corrections. don't change style, but fix problems. ASSISTANT: _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH Aider scored 26.3% on the [SWE Bench Lite benchmark](https://www.swebench.com), achieving a state of the art result. The current top leaderboard entry is 20.3% from Amazon Q Developer Agent. The best result reported elsewhere online seems to be ======= Aider scored 26.3% on the [SWE Bench Lite benchmark](https://www.swebench.com), achieving a state-of-the-art result. The current top leaderboard entry is 20.3% from Amazon Q Developer Agent. The best result reported elsewhere online seems to be >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH Aider achieved this result mainly through its focus on static code analysis, reliable LLM code editing and pragmatic workflows for interactive pair programming with AI. Aider intentionally has quite limited and narrow "agentic behavior": it doesn't require a highly detailed upfront "spec" from ^C KeyboardInterrupt USER: don't fix formatting like that. linebreaks are fine. it will render properly. ASSISTANT: Understood. I will only fix grammatical and spelling errors without changing the formatting. _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH Aider scored 26.3% on the [SWE Bench Lite benchmark](https://www.swebench.com), achieving a state of the art result. The current top leaderboard entry is 20.3% from Amazon Q Developer Agent. The best result reported elsewhere online seems to be [22.3% from AutoCodeRover](https://github.com/nus-apr/auto-code-rover). ======= Aider scored 26.3% on the [SWE Bench Lite benchmark](https://www.swebench.com), achieving a state-of-the-art result. The current top leaderboard entry is 20.3% from Amazon Q Developer Agent. The best result reported elsewhere online seems to be [22.3% from AutoCodeRover](https://github.com/nus-apr/auto-code-rover). 
>>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH Aider achieved this result mainly through its focus on static code analysis, reliable LLM code editing and pragmatic workflows for interactive pair programming with AI. Aider intentionally has quite limited and narrow "agentic behavior": it doesn't require a highly detailed upfront "spec" from the user, use RAG or vector search, farm out sub-problems to an army of LLMs, allow the LLM to use tools or perform web searches, ======= Aider achieved this result mainly through its focus on static code analysis, reliable LLM code editing, and pragmatic workflows for interactive pair programming with AI. Aider intentionally has quite limited and narrow "agentic behavior": it doesn't require a highly detailed upfront "spec" from the user, use RAG or vector search, farm out sub-problems to an army of LLMs, allow the LLM to use tools, or perform web searches, >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH When a user asks aider for a change, they see the edits performed in real-time and aider may also then offer additional help like fixing lint or test errors. In normal use, the user is in full interactive control. This lets them quickly steer misunderstandings back on course and ======= When a user asks aider for a change, they see the edits performed in real-time, and aider may also then offer additional help like fixing lint or test errors. In normal use, the user is in full interactive control. This lets them quickly steer misunderstandings back on course and >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH For the benchmark, aider was launched in each problem's git repository with the problem statement submitted as the opening chat message from "the user". 
After that aider runs as normal, with the following modifications: ======= For the benchmark, aider was launched in each problem's git repository with the problem statement submitted as the opening chat message from "the user." After that aider runs as normal, with the following modifications: >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH - A simple harness was used to retry the SWE Bench problem if aider produced code which wasn't *plausibly correct*. ======= - A simple harness was used to retry the SWE Bench problem if aider produced code that wasn't *plausibly correct*. >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH - If the solution isn't plausible, the harness launches aider to try again from scratch ======= - If the solution isn't plausible, the harness launches aider to try again from scratch, >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH - If aider doesn't produce code that lints and tests clean, the user might decide to revert the changes and try again. Maybe with a different LLM this time. ======= - If aider doesn't produce code that lints and tests clean, the user might decide to revert the changes and try again, maybe with a different LLM this time. >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH Of course, outside a benchmark setting it's probably ======= Of course, outside a benchmark setting, it's probably >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH Running the entire SWE Bench Lite benchmark using aider with just GPT-4o achieved a score of 25%. This was itself a state of the art result, before being surpassed by the main result being reported here that used aider with both GPT-4o & Opus. ======= Running the entire SWE Bench Lite benchmark using aider with just GPT-4o achieved a score of 25%. 
This was itself a state-of-the-art result, before being surpassed by the main result being reported here that used aider with both GPT-4o & Opus. >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH The table below breaks down the 79 solutions which were ultimately ======= The table below breaks down the 79 solutions that were ultimately >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH - Aider with GPT-4o immediately found 77% of the valid solutions on the first attempt. ======= - Aider with GPT-4o immediately found 77% of the valid solutions on the first attempt. >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH If we breakdown correct solutions purely by model, ======= If we break down correct solutions purely by model, >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH Aider instead uses a [repository map](https://aider.chat/2023/10/22/repomap.html) to help the LLM understand the layout, code structure and content of a git repo. The repo map is created from the code's AST and call graph to provide a compact and powerful summary of the entire code base. The map is constantly tailored to show repo context that is relevant to the current state of the chat conversation. This is done by performing a graph optimization on the code's call graph. ======= Aider instead uses a [repository map](https://aider.chat/2023/10/22/repomap.html) to help the LLM understand the layout, code structure, and content of a git repo. The repo map is created from the code's AST and call graph to provide a compact and powerful summary of the entire code base. The map is constantly tailored to show repo context that is relevant to the current state of the chat conversation. This is done by performing a graph optimization on the code's call graph. 
>>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH When the user asks for a change to their code, the LLM uses the repo map to decide which files to edit. The LLM simply returns a normal text response explaining which files it needs to edit and why. Aider notices when the LLM mentions filenames from the repo, and asks the user if they should be added to the chat. Adding a file to the chat allows the LLM to see the full contents of the file and edit it. ======= When the user asks for a change to their code, the LLM uses the repo map to decide which files to edit. The LLM simply returns a normal text response explaining which files it needs to edit and why. Aider notices when the LLM mentions filenames from the repo, and asks the user if they should be added to the chat. Adding a file to the chat allows the LLM to see the full contents of the file and edit it. >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH Aider has always had a deep focus on ensuring that LLMs can not just write code, but reliably *edit* code. Aider a collection of prompting strategies and code editing backends which have ======= Aider has always had a deep focus on ensuring that LLMs can not just write code, but reliably *edit* code. 
Aider has a collection of prompting strategies and code editing backends which have >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH [Aider lints code]() after every LLM edit, and offers to automatically fix ======= [Aider lints code](https://aider.chat/docs/linting/) after every LLM edit and offers to automatically fix >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH Aider includes basic linters built with tree-sitter that support ======= Aider includes basic linters built with tree-sitter that supports >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH This context increases the ability of the LLM to understand the problem and ======= This context increases the ability of the LLM to understand the problem and >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH A user working on a python project might do that by launching ======= A user working on a Python project might do that by launching >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH The repositories that are used in the SWE Bench problems are large open ======= The repositories that are used in the SWE Bench problems are large, open >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH A repo's test suite can be run in three ways: ======= A repo's test suite can be run in three ways: >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH 2. Run tests after aider has modified the repo. ======= 2. Run tests after aider has modified the repo. >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH 3. Run the final "acceptance tests" to judge if the coding agent has ======= 3. 
Run the final "acceptance tests" to judge if the coding agent has >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH For the benchmark, aider is configured with a test command that will run the tests ======= For the benchmark, aider is configured with a test command that will run the tests >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH When aider runs a test command, it checks for a non-zero exit status. ======= When aider runs a test command, it checks for a non-zero exit status. >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH To be clear, *aider can not run or even see the "acceptance tests"* from the `test_patch` ======= To be clear, *aider cannot run or even see the "acceptance tests"* from the `test_patch` >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH As aider executes, it notes the outcome of the editing, linting and testing steps. When aider completes, it returns their final status as either: succeeded with no errors remaining, or ended without resolving all errors. ======= As aider executes, it notes the outcome of the editing, linting, and testing steps. When aider completes, it returns its final status as either: succeeded with no errors remaining, or ended without resolving all errors. >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH The benchmark harness uses these outcomes to determine if it has a plausible solution to the current SWE Bench task. A plausible solution is one where aider returns saying that it edited the repo with no outstanding edit, lint or test errors. ======= The benchmark harness uses these outcomes to determine if it has a plausible solution to the current SWE Bench task. A plausible solution is one where aider returns saying that it edited the repo with no outstanding edit, lint, or test errors. 
>>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH If the solution is not plausible, another instance of aider is launched again from scratch on the same problem. ======= If the solution is not plausible, another instance of aider is launched again from scratch on the same problem. >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH It's worth noting that repositories may have lint or test errors present before aider even starts to edit them. Whether errors are caused by aider or were pre-existing, there will be instances where, after six tries, no plausible solution is obtained. ======= It's worth noting that repositories may have lint or test errors present before aider even starts to edit them. Whether errors are caused by aider or were pre-existing, there will be instances where, after six tries, no plausible solution is obtained. >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH If all six attempts fail to produce a plausible solution, then the "best" solution available is selected as a the ======= If all six attempts fail to produce a plausible solution, then the "best" solution available is selected as the >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH Which of the non-plausible solutions to use is determined ======= Which of the non-plausible solutions to use is determined >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH The benchmark harness produces one "best" solution for each of the 300 SWE Bench Lite instances, and saves it as a `model_patch`. A separate evaluation script uses the SWE Bench support code to test each of these results with the acceptance tests. ======= The benchmark harness produces one "best" solution for each of the 300 SWE Bench Lite instances and saves it as a `model_patch`. 
A separate evaluation script uses the SWE Bench support code to test each of these results with the acceptance tests. >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH These `test_patch` acceptance tests are only ever run outside of aider and the benchmark harness, and only to compute the number of correctly resolved instances. They are never run, used or even visible during the attempts to solve the problems. ======= These `test_patch` acceptance tests are only ever run outside of aider and the benchmark harness, and only to compute the number of correctly resolved instances. They are never run, used, or even visible during the attempts to solve the problems. >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH Aider correctly resolved 79 out of 300 SWE Bench Lite instances, or 26.3%. ======= Aider correctly resolved 79 out of 300 SWE Bench Lite instances, or 26.3%. >>>>>>> REPLACE </source> _posts/2024-05-22-swe-bench-lite.md <source>markdown <<<<<<< SEARCH Much thanks to the team behind the [SWE Bench](https://www.swebench.com) family of AI coding benchmarks. Also thanks to Albert Örwall who has [dockerized the SWE Bench evaluation scripts](SWE-bench-docker) making it faster, easier and more reliable to run the acceptance tests. ======= Much thanks to the team behind the [SWE Bench](https://www.swebench.com) family of AI coding benchmarks. Also thanks to Albert Örwall who has [dockerized the SWE Bench evaluation scripts](SWE-bench-docker) making it faster, easier, and more reliable to run the acceptance tests. >>>>>>> REPLACE </source> |
aider is AI pair programming in your terminal
Aider is a command line tool that lets you pair program with LLMs, to edit code stored in your local git repository. Aider will directly edit the code in your local source files, and git commit the changes with sensible commit messages. You can start a new project or work with an existing git repo. Aider is unique in that it lets you ask for changes to pre-existing, larger codebases. Aider works well with GPT-4o, Claude 3 Opus, GPT-3.5 and supports connecting to almost any LLM.
- Getting started
- Features
- Usage
- Tutorial videos
- In-chat commands
- Tips
- Installation
- Connecting to LLMs
- LLM Leaderboards
- Voice-to-code
- Example chat transcripts
- FAQ
- Discord
- Blog
Getting started
See the installation instructions for more details, but you can get started quickly like this:
$ pip install aider-chat
# To work with GPT-4o
$ export OPENAI_API_KEY=your-key-goes-here
$ aider
# To work with Claude 3 Opus:
$ export ANTHROPIC_API_KEY=your-key-goes-here
$ aider --opus
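As a rough illustration of the two quick-start paths above, the choice of launch command could be scripted as follows. This is only a sketch: the helper name `aider_command` is hypothetical, and it assumes nothing beyond the two environment variables and the `--opus` switch shown above.

```python
import os


def aider_command(env=None):
    """Pick an aider launch command based on which API key is set.

    Hypothetical helper for illustration; not part of aider itself.
    """
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return ["aider"]  # GPT-4o, as in the quick start
    if env.get("ANTHROPIC_API_KEY"):
        return ["aider", "--opus"]  # Claude 3 Opus, as in the quick start
    raise RuntimeError("Set OPENAI_API_KEY or ANTHROPIC_API_KEY first")
```

The returned list could be handed to `subprocess.run` to start the chat.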
Features
- Chat with aider about your code by running aider <file1> <file2> ... from the command line with a set of source files to discuss and edit together. Aider lets the LLM see and edit the content of those files.
- Aider can write and edit code in most popular languages: python, javascript, typescript, php, html, css, etc.
- Aider works well with GPT-4o, Claude 3 Opus, GPT-3.5 and supports connecting to almost any LLM.
- Request new features, changes, improvements, or bug fixes to your code. Ask for new test cases, updated documentation or code refactors.
- Aider will apply the edits suggested by the LLM directly to your source files.
- Aider will automatically commit each changeset to your local git repo with a descriptive commit message. These frequent, automatic commits provide a safety net. It's easy to undo changes or use standard git workflows to manage longer sequences of changes.
- You can use aider with multiple source files at once, so aider can make coordinated code changes across all of them in a single changeset/commit.
- Aider can give the LLM a map of your entire git repo, which helps it understand and modify large codebases.
- You can also edit files by hand using your editor while chatting with aider. Aider will notice these out-of-band edits and keep up to date with the latest versions of your files. This lets you bounce back and forth between the aider chat and your editor, to collaboratively code with an LLM.
- You can add image files to your chat if you are working with a vision-capable OpenAI model (GPT-4o, GPT-4 Turbo, etc.).
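One generic way a tool can notice out-of-band edits like those described in the feature list is to compare content hashes taken before and after. The sketch below illustrates that general idea only. It is an assumption for illustration, not a description of how aider detects changes, and the names `snapshot` and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(paths):
    """Map each path to a SHA-256 hash of its current contents."""
    return {p: hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def changed_files(before, after):
    """Return the paths whose content hash differs between two snapshots."""
    return sorted(p for p in after if before.get(p) != after[p])
```

Taking a snapshot before each chat turn and comparing it with a fresh one afterwards would reveal which files were edited by hand in the meantime.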
Usage
Run aider with the source code files you want to edit.
These files will be "added to the chat session", so that the LLM can see their
contents and edit them according to your instructions.
aider <file1> <file2> ...
Be selective, and just add the files that the LLM will need to edit. If you add a bunch of unrelated files, the LLM can get overwhelmed and confused (and it costs more tokens). Aider will automatically share snippets from other, related files with the LLM so it can understand the rest of your code base.
You can also just launch aider anywhere in a git repo without naming
files on the command line. It will discover all the files in the
repo. You can then add and remove individual files in the chat
session with the /add and /drop chat commands described below.
If you or the LLM mention any of the repo's filenames in the conversation,
aider will ask if you'd like to add them to the chat.
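The filename-mention behavior can be pictured with a small sketch. This is an illustrative approximation, not aider's actual matching logic: the function name `mentioned_files` and the simple token-matching rule are assumptions.

```python
import re


def mentioned_files(message, repo_files, in_chat=()):
    """Return repo files named in a message that are not already in the chat.

    A file counts as mentioned if its full path or its basename appears
    as a token in the message (a simplifying assumption).
    """
    # Tokens may contain word characters, dots, slashes and dashes;
    # strip a trailing period left over from sentence punctuation.
    tokens = {t.rstrip(".") for t in re.findall(r"[\w./-]+", message)}
    found = []
    for path in repo_files:
        if path in in_chat:
            continue
        basename = path.rsplit("/", 1)[-1]
        if path in tokens or basename in tokens:
            found.append(path)
    return found
```

Each path returned here would be a candidate for a prompt asking whether to add it to the chat.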
Aider also has many other options which can be set with
command line switches, environment variables or via a configuration file.
See aider --help for details.
In-chat commands
Aider supports commands from within the chat, which all start with /. Here are some of the most useful in-chat commands:
- /add <file>: Add matching files to the chat session, including image files.
- /drop <file>: Remove matching files from the chat session.
- /undo: Undo the last git commit if it was done by aider.
- /diff: Display the diff of the last aider commit.
- /run <command>: Run a shell command and optionally add the output to the chat.
- /voice: Speak to aider to request code changes with your voice.
- /help: Show help about all commands.
See the full command docs for more information.
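To make the "commands start with /" convention concrete, here is a minimal sketch of how a chat line might be split into a command and its argument. The function name `parse_chat_input` and the `KNOWN_COMMANDS` set are hypothetical; the set lists only the commands named above, and none of this is taken from aider's source.

```python
# Commands listed in this README; the full set in aider is larger.
KNOWN_COMMANDS = {"add", "drop", "undo", "diff", "run", "voice", "help"}


def parse_chat_input(line):
    """Split a chat line into (command, argument).

    Plain chat messages come back as (None, text).
    """
    text = line.strip()
    if not text.startswith("/"):
        return None, text
    name, _, arg = text[1:].partition(" ")
    return name, arg.strip()


def is_known(command):
    """Check a parsed command name against the commands listed above."""
    return command in KNOWN_COMMANDS
```

For example, `/add foo.py bar.py` parses to the command `add` with the argument `foo.py bar.py`, while an ordinary request is passed through as chat text.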
Tips
- Think about which files need to be edited to make your change and add them to the chat. Aider can help the LLM figure out which files to edit all by itself, but the most efficient approach is to add the needed files to the chat yourself.
- Large changes are best performed as a sequence of thoughtful bite-sized steps, where you plan out the approach and overall design. Walk the LLM through changes like you might with a junior dev. Ask for a refactor to prepare, then ask for the actual change. Spend the time to ask for code quality/structure improvements.
- Use Control-C to safely interrupt the LLM if it isn't providing a useful response. The partial response remains in the conversation, so you can refer to it when you reply to the LLM with more information or direction.
- Use the /run command to run tests, linters, etc. and show the output to the LLM so it can fix any issues.
- Use Meta-ENTER (Esc+ENTER in some environments) to enter multiline chat messages. Or enter { alone on the first line to start a multiline message and } alone on the last line to end it.
- If your code is throwing an error, share the error output with the LLM using /run or by pasting it into the chat. Let the LLM figure out and fix the bug.
- LLMs know about a lot of standard tools and libraries, but may get some of the fine details wrong about APIs and function arguments. You can paste doc snippets into the chat to resolve these issues.
- The LLM can only see the content of the files you specifically "add to the chat". Aider also sends a map of your entire git repo. So the LLM may ask to see additional files if it feels that's needed for your requests.
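The { ... } multiline convention from the tips above can be sketched as a small reader over input lines. This is an illustrative assumption about the behavior described, not aider's implementation, and `read_message` is a hypothetical name.

```python
def read_message(lines):
    """Collect one chat message from an iterable of input lines.

    A lone "{" on the first line starts a multiline message, which runs
    until a line containing only "}". Anything else is a one-line message.
    Returns None when there is no input at all.
    """
    it = iter(lines)
    first = next(it, None)
    if first is None:
        return None
    if first.strip() != "{":
        return first
    body = []
    for line in it:
        if line.strip() == "}":
            break
        body.append(line)
    return "\n".join(body)
```

Given the lines `{`, `line one`, `line two`, `}`, the reader returns the two inner lines joined by a newline.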
Example chat transcripts
The example transcripts page shows how you can chat with aider to write and edit code.
Installation
See the installation instructions.
FAQ
For more information, see the FAQ.
Kind words from users
- The best AI coding assistant so far. -- Matthew Berman
- Hands down, this is the best AI coding assistant tool so far. -- IndyDevDan
- Aider ... has easily quadrupled my coding productivity. -- SOLAR_FIELDS
- It's a cool workflow... Aider's ergonomics are perfect for me. -- qup
- It's really like having your senior developer live right in your Git repo - truly amazing! -- rappster
- What an amazing tool. It's incredible. -- valyagolev
- Aider is such an astounding thing! -- cgrothaus
- It was WAY faster than I would be getting off the ground and making the first few working versions. -- Daniel Feldman
- THANK YOU for Aider! It really feels like a glimpse into the future of coding. -- derwiki
- It's just amazing. It is freeing me to do things I felt were out my comfort zone before. -- Dougie
- This project is stellar. -- funkytaco
- Amazing project, definitely the best AI coding assistant I've used. -- joshuavial
- I absolutely love using Aider ... It makes software development feel so much lighter as an experience. -- principalideal0
- I have been recovering from multiple shoulder surgeries ... and have used aider extensively. It has allowed me to continue productivity. -- codeninja
- I am an aider addict. I'm getting so much more work done, but in less time. -- dandandan
- After wasting $100 on tokens trying to find something better, I'm back to Aider. It blows everything else out of the water hands down, there's no competition whatsoever. -- SystemSculpt
- Best agent for actual dev work in existing codebases. -- Nick Dobos