moved website/ -> aider/website/

Paul Gauthier 2024-07-05 10:01:30 -03:00
parent eb80b32915
commit 22a494bb59
155 changed files with 9 additions and 9 deletions

aider/website/Gemfile Normal file

@@ -0,0 +1,8 @@
source 'https://rubygems.org'
gem 'jekyll'
gem "just-the-docs", "0.8.2"
gem 'jekyll-redirect-from'
gem 'jekyll-sitemap'
gem "webrick"
gem 'github-pages', group: :jekyll_plugins
gem "html-proofer"

aider/website/HISTORY.md Normal file

@@ -0,0 +1,466 @@
---
title: Release history
parent: More info
nav_order: 999
---
<!--[[[cog
# This page is a copy of HISTORY.md, adding the front matter above.
text = open("HISTORY.md").read()
cog.out(text)
]]]-->
# Release history
### Aider v0.42.0
- Performance release:
  - 5X faster launch!
  - Faster auto-complete in large git repos (users report ~100X speedup)!
### Aider v0.41.0
- [Allow Claude 3.5 Sonnet to stream back >4k tokens!](https://aider.chat/2024/07/01/sonnet-not-lazy.html)
  - It is the first model capable of writing such large coherent, useful code edits.
  - Do large refactors or generate multiple files of new code in one go.
- Aider now uses `claude-3-5-sonnet-20240620` by default if `ANTHROPIC_API_KEY` is set in the environment.
- [Enabled image support](https://aider.chat/docs/images-urls.html) for 3.5 Sonnet and for GPT-4o & 3.5 Sonnet via OpenRouter (by @yamitzky).
- Added `--attribute-commit-message` to prefix aider's commit messages with "aider:".
- Fixed regression in quality of one-line commit messages.
- Automatically retry on Anthropic `overloaded_error`.
- Bumped dependency versions.
### Aider v0.40.6
- Fixed `/undo` so it works regardless of `--attribute` settings.
### Aider v0.40.5
- Bump versions to pick up latest litellm to fix streaming issue with Gemini
  - https://github.com/BerriAI/litellm/issues/4408
### Aider v0.40.1
- Improved context awareness of repomap.
- Restored proper `--help` functionality.
### Aider v0.40.0
- Improved prompting to discourage Sonnet from wasting tokens emitting unchanging code (#705).
- Improved error info for token limit errors.
- Options to suppress adding "(aider)" to the [git author and committer names](https://aider.chat/docs/git.html#commit-attribution).
- Use `--model-settings-file` to customize per-model settings, like use of repo-map (by @caseymcc).
- Improved invocation of flake8 linter for python code.
### Aider v0.39.0
- Use `--sonnet` for Claude 3.5 Sonnet, which is the top model on [aider's LLM code editing leaderboard](https://aider.chat/docs/leaderboards/#claude-35-sonnet-takes-the-top-spot).
- All `AIDER_xxx` environment variables can now be set in `.env` (by @jpshack-at-palomar).
- Use `--llm-history-file` to log raw messages sent to the LLM (by @daniel-vainsencher).
- Commit messages are no longer prefixed with "aider:". Instead the git author and committer names have "(aider)" added.
### Aider v0.38.0
- Use `--vim` for [vim keybindings](https://aider.chat/docs/commands.html#vi) in the chat.
- [Add LLM metadata](https://aider.chat/docs/llms/warnings.html#specifying-context-window-size-and-token-costs) via `.aider.models.json` file (by @caseymcc).
- More detailed [error messages on token limit errors](https://aider.chat/docs/troubleshooting/token-limits.html).
- Single line commit messages, without the recent chat messages.
- Ensure `--commit --dry-run` does nothing.
- Have playwright wait for idle network to better scrape js sites.
- Documentation updates, moved into website/ subdir.
- Moved tests/ into aider/tests/.
### Aider v0.37.0
- Repo map is now optimized based on text of chat history as well as files added to chat.
- Improved prompts when no files have been added to chat to solicit LLM file suggestions.
- Aider will notice if you paste a URL into the chat, and offer to scrape it.
- Performance improvements for the repo map, especially in large repos.
- Aider will not offer to add bare filenames like `make` or `run` which may just be words.
- Properly override `GIT_EDITOR` env for commits if it is already set.
- Detect supported audio sample rates for `/voice`.
- Other small bug fixes.
### Aider v0.36.0
- [Aider can now lint your code and fix any errors](https://aider.chat/2024/05/22/linting.html).
  - Aider automatically lints and fixes after every LLM edit.
  - You can manually lint-and-fix files with `/lint` in the chat or `--lint` on the command line.
  - Aider includes built in basic linters for all supported tree-sitter languages.
  - You can also configure aider to use your preferred linter with `--lint-cmd`.
- Aider has additional support for running tests and fixing problems.
  - Configure your testing command with `--test-cmd`.
  - Run tests with `/test` or from the command line with `--test`.
  - Aider will automatically attempt to fix any test failures.
### Aider v0.35.0
- Aider now uses GPT-4o by default.
  - GPT-4o tops the [aider LLM code editing leaderboard](https://aider.chat/docs/leaderboards/) at 72.9%, versus 68.4% for Opus.
  - GPT-4o takes second on [aider's refactoring leaderboard](https://aider.chat/docs/leaderboards/#code-refactoring-leaderboard) with 62.9%, versus Opus at 72.3%.
- Added `--restore-chat-history` to restore prior chat history on launch, so you can continue the last conversation.
- Improved reflection feedback to LLMs using the diff edit format.
- Improved retries on `httpx` errors.
### Aider v0.34.0
- Updated prompting to use more natural phrasing about files, the git repo, etc. Removed reliance on read-write/read-only terminology.
- Refactored prompting to unify some phrasing across edit formats.
- Enhanced the canned assistant responses used in prompts.
- Added explicit model settings for `openrouter/anthropic/claude-3-opus`, `gpt-3.5-turbo`
- Added `--show-prompts` debug switch.
- Bugfix: catch and retry on all litellm exceptions.
### Aider v0.33.0
- Added native support for [Deepseek models](https://aider.chat/docs/llms.html#deepseek) using `DEEPSEEK_API_KEY` and `deepseek/deepseek-chat`, etc rather than as a generic OpenAI compatible API.
### Aider v0.32.0
- [Aider LLM code editing leaderboards](https://aider.chat/docs/leaderboards/) that rank popular models according to their ability to edit code.
  - Leaderboards include GPT-3.5/4 Turbo, Opus, Sonnet, Gemini 1.5 Pro, Llama 3, Deepseek Coder & Command-R+.
- Gemini 1.5 Pro now defaults to a new diff-style edit format (diff-fenced), enabling it to work better with larger code bases.
- Support for Deepseek-V2, via a more flexible config of system messages in the diff edit format.
- Improved retry handling on errors from model APIs.
- Benchmark outputs results in YAML, compatible with leaderboard.
### Aider v0.31.0
- [Aider is now also AI pair programming in your browser!](https://aider.chat/2024/05/02/browser.html) Use the `--browser` switch to launch an experimental browser based version of aider.
- Switch models during the chat with `/model <name>` and search the list of available models with `/models <query>`.
### Aider v0.30.1
- Adding missing `google-generativeai` dependency
### Aider v0.30.0
- Added [Gemini 1.5 Pro](https://aider.chat/docs/llms.html#free-models) as a recommended free model.
- Allow repo map for "whole" edit format.
- Added `--models <MODEL-NAME>` to search the available models.
- Added `--no-show-model-warnings` to silence model warnings.
### Aider v0.29.2
- Improved [model warnings](https://aider.chat/docs/llms.html#model-warnings) for unknown or unfamiliar models
### Aider v0.29.1
- Added better support for groq/llama3-70b-8192
### Aider v0.29.0
- Added support for [directly connecting to Anthropic, Cohere, Gemini and many other LLM providers](https://aider.chat/docs/llms.html).
- Added `--weak-model <model-name>` which allows you to specify which model to use for commit messages and chat history summarization.
- New command line switches for working with popular models:
  - `--4-turbo-vision`
  - `--opus`
  - `--sonnet`
  - `--anthropic-api-key`
- Improved "whole" and "diff" backends to better support [Cohere's free to use Command-R+ model](https://aider.chat/docs/llms.html#cohere).
- Allow `/add` of images from anywhere in the filesystem.
- Fixed crash when operating in a repo in a detached HEAD state.
- Fix: Use the same default model in CLI and python scripting.
### Aider v0.28.0
- Added support for new `gpt-4-turbo-2024-04-09` and `gpt-4-turbo` models.
  - Benchmarked at 61.7% on Exercism benchmark, comparable to `gpt-4-0613` and worse than the `gpt-4-preview-XXXX` models. See [recent Exercism benchmark results](https://aider.chat/2024/03/08/claude-3.html).
  - Benchmarked at 34.1% on the refactoring/laziness benchmark, significantly worse than the `gpt-4-preview-XXXX` models. See [recent refactor benchmark results](https://aider.chat/2024/01/25/benchmarks-0125.html).
  - Aider continues to default to `gpt-4-1106-preview` as it performs best on both benchmarks, and significantly better on the refactoring/laziness benchmark.
### Aider v0.27.0
- Improved repomap support for typescript, by @ryanfreckleton.
- Bugfix: Only /undo the files which were part of the last commit, don't stomp other dirty files
- Bugfix: Show clear error message when OpenAI API key is not set.
- Bugfix: Catch error for obscure languages without tags.scm file.
### Aider v0.26.1
- Fixed bug affecting parsing of git config in some environments.
### Aider v0.26.0
- Use GPT-4 Turbo by default.
- Added `-3` and `-4` switches to use GPT 3.5 or GPT-4 (non-Turbo).
- Bug fix to avoid reflecting local git errors back to GPT.
- Improved logic for opening git repo on launch.
### Aider v0.25.0
- Issue a warning if user adds too much code to the chat.
  - https://aider.chat/docs/faq.html#how-can-i-add-all-the-files-to-the-chat
- Vocally refuse to add files to the chat that match `.aiderignore`
  - Prevents bug where subsequent git commit of those files will fail.
- Added `--openai-organization-id` argument.
- Show the user a FAQ link if edits fail to apply.
- Made past articles part of https://aider.chat/blog/
### Aider v0.24.1
- Fixed bug with cost computations when `--no-stream` in effect
### Aider v0.24.0
- New `/web <url>` command which scrapes the url, turns it into fairly clean markdown and adds it to the chat.
- Updated all OpenAI model names, pricing info
- Default GPT 3.5 model is now `gpt-3.5-turbo-0125`.
- Bugfix to the `!` alias for `/run`.
### Aider v0.23.0
- Added support for `--model gpt-4-0125-preview` and OpenAI's alias `--model gpt-4-turbo-preview`. The `--4turbo` switch remains an alias for `--model gpt-4-1106-preview` at this time.
- New `/test` command that runs a command and adds the output to the chat on non-zero exit status.
- Improved streaming of markdown to the terminal.
- Added `/quit` as alias for `/exit`.
- Added `--skip-check-update` to skip checking for the update on launch.
- Added `--openrouter` as a shortcut for `--openai-api-base https://openrouter.ai/api/v1`
- Fixed bug preventing use of env vars `OPENAI_API_BASE, OPENAI_API_TYPE, OPENAI_API_VERSION, OPENAI_API_DEPLOYMENT_ID`.
### Aider v0.22.0
- Improvements for unified diff editing format.
- Added ! as an alias for /run.
- Autocomplete for /add and /drop now properly quotes filenames with spaces.
- The /undo command asks GPT not to just retry the reverted edit.
### Aider v0.21.1
- Bugfix for unified diff editing format.
- Added --4turbo and --4 aliases for --4-turbo.
### Aider v0.21.0
- Support for python 3.12.
- Improvements to unified diff editing format.
- New `--check-update` arg to check if updates are available and exit with status code.
### Aider v0.20.0
- Add images to the chat to automatically use GPT-4 Vision, by @joshuavial
- Bugfixes:
  - Improved unicode encoding for `/run` command output, by @ctoth
  - Prevent false auto-commits on Windows, by @ctoth
### Aider v0.19.1
- Removed stray debug output.
### Aider v0.19.0
- [Significantly reduced "lazy" coding from GPT-4 Turbo due to new unified diff edit format](https://aider.chat/docs/unified-diffs.html)
  - Score improves from 20% to 61% on new "laziness benchmark".
  - Aider now uses unified diffs by default for `gpt-4-1106-preview`.
- New `--4-turbo` command line switch as a shortcut for `--model gpt-4-1106-preview`.
### Aider v0.18.1
- Upgraded to new openai python client v1.3.7.
### Aider v0.18.0
- Improved prompting for both GPT-4 and GPT-4 Turbo.
  - Far fewer edit errors from GPT-4 Turbo (`gpt-4-1106-preview`).
  - Significantly better benchmark results from the June GPT-4 (`gpt-4-0613`). Performance leaps from 47%/64% up to 51%/71%.
- Fixed bug where in-chat files were marked as both read-only and read-write, sometimes confusing GPT.
- Fixed bug to properly handle repos with submodules.
### Aider v0.17.0
- Support for OpenAI's new 11/06 models:
  - gpt-4-1106-preview with 128k context window
  - gpt-3.5-turbo-1106 with 16k context window
- [Benchmarks for OpenAI's new 11/06 models](https://aider.chat/docs/benchmarks-1106.html)
- Streamlined [API for scripting aider, added docs](https://aider.chat/docs/faq.html#can-i-script-aider)
- Ask for more concise SEARCH/REPLACE blocks. [Benchmarked](https://aider.chat/docs/benchmarks.html) at 63.9%, no regression.
- Improved repo-map support for elisp.
- Fixed crash bug when `/add` used on file matching `.gitignore`
- Fixed misc bugs to catch and handle unicode decoding errors.
### Aider v0.16.3
- Fixed repo-map support for C#.
### Aider v0.16.2
- Fixed docker image.
### Aider v0.16.1
- Updated tree-sitter dependencies to streamline the pip install process
### Aider v0.16.0
- [Improved repository map using tree-sitter](https://aider.chat/docs/repomap.html)
- Switched from "edit block" to "search/replace block", which reduced malformed edit blocks. [Benchmarked](https://aider.chat/docs/benchmarks.html) at 66.2%, no regression.
- Improved handling of malformed edit blocks targeting multiple edits to the same file. [Benchmarked](https://aider.chat/docs/benchmarks.html) at 65.4%, no regression.
- Bugfix to properly handle malformed `/add` wildcards.
### Aider v0.15.0
- Added support for `.aiderignore` file, which instructs aider to ignore parts of the git repo.
- New `--commit` cmd line arg, which just commits all pending changes with a sensible commit message generated by gpt-3.5.
- Added universal ctags and multiple architectures to the [aider docker image](https://aider.chat/docs/install/docker.html)
- `/run` and `/git` now accept full shell commands, like: `/run (cd subdir; ls)`
- Restored missing `--encoding` cmd line switch.
### Aider v0.14.2
- Easily [run aider from a docker image](https://aider.chat/docs/install/docker.html)
- Fixed bug with chat history summarization.
- Fixed bug if `soundfile` package not available.
### Aider v0.14.1
- /add and /drop handle absolute filenames and quoted filenames
- /add checks to be sure files are within the git repo (or root)
- If needed, warn users that in-chat file paths are all relative to the git repo
- Fixed /add bug when aider is launched in a repo subdir
- Show models supported by api/key if requested model isn't available
### Aider v0.14.0
- [Support for Claude2 and other LLMs via OpenRouter](https://aider.chat/docs/faq.html#accessing-other-llms-with-openrouter) by @joshuavial
- Documentation for [running the aider benchmarking suite](https://github.com/paul-gauthier/aider/tree/main/benchmark)
- Aider now requires Python >= 3.9
### Aider v0.13.0
- [Only git commit dirty files that GPT tries to edit](https://aider.chat/docs/faq.html#how-did-v0130-change-git-usage)
- Send chat history as prompt/context for Whisper voice transcription
- Added `--voice-language` switch to constrain `/voice` to transcribe to a specific language
- Late-bind importing `sounddevice`, as it was slowing down aider startup
- Improved --foo/--no-foo switch handling for command line and yml config settings
### Aider v0.12.0
- [Voice-to-code](https://aider.chat/docs/voice.html) support, which allows you to code with your voice.
- Fixed bug where /diff was causing crash.
- Improved prompting for gpt-4, refactor of editblock coder.
- [Benchmarked](https://aider.chat/docs/benchmarks.html) at 63.2% for gpt-4/diff, no regression.
### Aider v0.11.1
- Added a progress bar when initially creating a repo map.
- Fixed bad commit message when adding new file to empty repo.
- Fixed corner case of pending chat history summarization when dirty committing.
- Fixed corner case of undefined `text` when using `--no-pretty`.
- Fixed /commit bug from repo refactor, added test coverage.
- [Benchmarked](https://aider.chat/docs/benchmarks.html) at 53.4% for gpt-3.5/whole (no regression).
### Aider v0.11.0
- Automatically summarize chat history to avoid exhausting context window.
- More detail on dollar costs when running with `--no-stream`
- Stronger GPT-3.5 prompt against skipping/eliding code in replies (51.9% [benchmark](https://aider.chat/docs/benchmarks.html), no regression)
- Defend against GPT-3.5 or non-OpenAI models suggesting filenames surrounded by asterisks.
- Refactored GitRepo code out of the Coder class.
### Aider v0.10.1
- /add and /drop always use paths relative to the git root
- Encourage GPT to use language like "add files to the chat" to ask users for permission to edit them.
### Aider v0.10.0
- Added `/git` command to run git from inside aider chats.
- Use Meta-ENTER (Esc+ENTER in some environments) to enter multiline chat messages.
- Create a `.gitignore` with `.aider*` to prevent users from accidentally adding aider files to git.
- Check pypi for newer versions and notify user.
- Updated keyboard interrupt logic so that 2 ^C in 2 seconds always forces aider to exit.
- Provide GPT with detailed error if it makes a bad edit block, ask for a retry.
- Force `--no-pretty` if aider detects it is running inside a VSCode terminal.
- [Benchmarked](https://aider.chat/docs/benchmarks.html) at 64.7% for gpt-4/diff (no regression)
### Aider v0.9.0
- Support for the OpenAI models in [Azure](https://aider.chat/docs/faq.html#azure)
- Added `--show-repo-map`
- Improved output when retrying connections to the OpenAI API
- Redacted api key from `--verbose` output
- Bugfix: recognize and add files in subdirectories mentioned by user or GPT
- [Benchmarked](https://aider.chat/docs/benchmarks.html) at 53.8% for gpt-3.5-turbo/whole (no regression)
### Aider v0.8.3
- Added `--dark-mode` and `--light-mode` to select colors optimized for terminal background
- Install docs link to [NeoVim plugin](https://github.com/joshuavial/aider.nvim) by @joshuavial
- Reorganized the `--help` output
- Bugfix/improvement to whole edit format, may improve code editing for GPT-3.5
- Bugfix and tests around git filenames with unicode characters
- Bugfix so that aider throws an exception when OpenAI returns InvalidRequest
- Bugfix/improvement to /add and /drop to recurse selected directories
- Bugfix for live diff output when using "whole" edit format
### Aider v0.8.2
- Disabled general availability of gpt-4 (it's rolling out, not 100% available yet)
### Aider v0.8.1
- Ask to create a git repo if none found, to better track GPT's code changes
- Glob wildcards are now supported in `/add` and `/drop` commands
- Pass `--encoding` into ctags, require it to return `utf-8`
- More robust handling of filepaths, to avoid 8.3 windows filenames
- Added [FAQ](https://aider.chat/docs/faq.html)
- Marked GPT-4 as generally available
- Bugfix for live diffs of whole coder with missing filenames
- Bugfix for chats with multiple files
- Bugfix in editblock coder prompt
### Aider v0.8.0
- [Benchmark comparing code editing in GPT-3.5 and GPT-4](https://aider.chat/docs/benchmarks.html)
- Improved Windows support:
  - Fixed bugs related to path separators in Windows
  - Added a CI step to run all tests on Windows
- Improved handling of Unicode encoding/decoding
  - Explicitly read/write text files with utf-8 encoding by default (mainly benefits Windows)
  - Added `--encoding` switch to specify another encoding
  - Gracefully handle decoding errors
- Added `--code-theme` switch to control the pygments styling of code blocks (by @kwmiebach)
- Better status messages explaining the reason when ctags is disabled
### Aider v0.7.2:
- Fixed a bug to allow aider to edit files that contain triple backtick fences.
### Aider v0.7.1:
- Fixed a bug in the display of streaming diffs in GPT-3.5 chats
### Aider v0.7.0:
- Graceful handling of context window exhaustion, including helpful tips.
- Added `--message` to give GPT that one instruction and then exit after it replies and any edits are performed.
- Added `--no-stream` to disable streaming GPT responses.
  - Non-streaming responses include token usage info.
  - Enables display of cost info based on OpenAI advertised pricing.
- Coding competence benchmarking tool against suite of programming tasks based on Exercism's python repo.
  - https://github.com/exercism/python
- Major refactor in preparation for supporting new function calls api.
- Initial implementation of a function based code editing backend for 3.5.
  - Initial experiments show that using functions makes 3.5 less competent at coding.
- Limit automatic retries when GPT returns a malformed edit response.
### Aider v0.6.2
* Support for `gpt-3.5-turbo-16k`, and all OpenAI chat models
* Improved ability to correct when gpt-4 omits leading whitespace in code edits
* Added `--openai-api-base` to support API proxies, etc.
### Aider v0.5.0
- Added support for `gpt-3.5-turbo` and `gpt-4-32k`.
- Added `--map-tokens` to set a token budget for the repo map, along with a PageRank based algorithm for prioritizing which files and identifiers to include in the map.
- Added in-chat command `/tokens` to report on context window token usage.
- Added in-chat command `/clear` to clear the conversation history.
<!--[[[end]]]-->
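
The cog block at the top of this page copies the repository's HISTORY.md beneath the Jekyll front matter. As a rough illustration only, the same effect can be sketched in plain Python without cog. The `FRONT_MATTER` constant and the `build_history_page` helper are hypothetical names introduced here; they are not part of aider or cog.

```python
from pathlib import Path
import tempfile

# Front matter copied from the top of this page.
FRONT_MATTER = """---
title: Release history
parent: More info
nav_order: 999
---
"""


def build_history_page(history_path: Path) -> str:
    """Return the front matter followed by the unmodified text of the file.

    Hypothetical helper: it mimics what the cog block does with
    open("HISTORY.md").read() and cog.out(text), but does not call cog.
    """
    text = history_path.read_text(encoding="utf-8")
    return FRONT_MATTER + text


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than a real repository.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "HISTORY.md"
        src.write_text("# Release history\n", encoding="utf-8")
        print(build_history_page(src))
```

In the real page, cog regenerates everything between the `]]]-->` and `<!--[[[end]]]-->` markers in place, so the front matter stays hand-written while the body tracks HISTORY.md.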

aider/website/_config.yml Normal file

@@ -0,0 +1,39 @@
theme: just-the-docs
url: "https://aider.chat"
plugins:
  - jekyll-redirect-from
  - jekyll-sitemap
  - jekyll-feed
defaults:
  - scope:
      path: "README.md"
      type: "pages"
    values:
      description: "aider is AI pair programming in your terminal"
exclude:
  - "tmp*"
  - "**/tmp*"
  - OLD
  - "**/OLD/**"
  - "OLD/**"
  - vendor
  - feed.xml
aux_links:
  "GitHub":
    - "https://github.com/paul-gauthier/aider"
  "Discord":
    - "https://discord.gg/Tv2uQnR88V"
  "Blog":
    - "/blog/"
nav_external_links:
  - title: "GitHub"
    url: "https://github.com/paul-gauthier/aider"
  - title: "Discord"
    url: "https://discord.gg/Tv2uQnR88V"
repository: paul-gauthier/aider


@@ -0,0 +1,684 @@
- dirname: 2024-05-01-20-05-59--direct-opus-filenames-outside-fence
test_cases: 133
model: claude-3-opus-20240229
released: 2024-02-29
edit_format: diff
commit_hash: f4b1797-dirty, f4b1797
pass_rate_1: 53.4
pass_rate_2: 68.4
percent_cases_well_formed: 100.0
error_outputs: 2
num_malformed_responses: 0
user_asks: 0
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 1
command: aider --opus
date: 2024-05-01
versions: 0.30.2-dev
seconds_per_case: 32.4
total_cost: 13.8395
- dirname: 2024-03-06-16-42-00--claude3-sonnet-whole
test_cases: 133
model: claude-3-sonnet-20240229
released: 2024-02-29
edit_format: whole
commit_hash: a5f8076-dirty
pass_rate_1: 43.6
pass_rate_2: 54.9
percent_cases_well_formed: 100.0
error_outputs: 1
num_malformed_responses: 0
user_asks: 1
lazy_comments: 1
syntax_errors: 2
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 7
command: aider --sonnet
date: 2024-03-06
versions: 0.25.1-dev
seconds_per_case: 23.1
total_cost: 0.0000
- dirname: 2024-05-03-20-47-24--gemini-1.5-pro-diff-fenced
test_cases: 133
model: gemini-1.5-pro-latest
edit_format: diff-fenced
commit_hash: 3a48dfb, 5d32dd7
pass_rate_1: 45.9
pass_rate_2: 57.1
percent_cases_well_formed: 87.2
error_outputs: 60
num_malformed_responses: 17
user_asks: 3
lazy_comments: 0
syntax_errors: 8
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 3
command: aider --model gemini/gemini-1.5-pro-latest
date: 2024-05-03
versions: 0.31.2-dev
seconds_per_case: 21.3
total_cost: 0.0000
- dirname: 2024-05-08-20-59-15--may-gpt-3.5-turbo-whole
test_cases: 133
model: gpt-3.5-turbo-0125
released: 2024-01-25
edit_format: whole
commit_hash: 1d55f74
pass_rate_1: 41.4
pass_rate_2: 50.4
percent_cases_well_formed: 100.0
error_outputs: 0
num_malformed_responses: 0
user_asks: 0
lazy_comments: 0
syntax_errors: 3
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 4
command: aider -3
date: 2024-05-08
versions: 0.33.1-dev
seconds_per_case: 6.5
total_cost: 0.5032
- dirname: 2023-11-06-21-23-59--gpt-3.5-turbo-0301
test_cases: 133
model: gpt-3.5-turbo-0301
released: 2023-03-01
edit_format: whole
commit_hash: 44388db-dirty
pass_rate_1: 50.4
pass_rate_2: 57.9
percent_cases_well_formed: 100.0
error_outputs: 1
num_malformed_responses: 0
user_asks: 1
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 8
command: aider --model gpt-3.5-turbo-0301
date: 2023-11-06
versions: 0.16.4-dev
seconds_per_case: 6.5
total_cost: 0.4822
- dirname: 2023-11-07-02-41-07--gpt-3.5-turbo-0613
test_cases: 133
model: gpt-3.5-turbo-0613
released: 2023-06-13
edit_format: whole
commit_hash: 93aa497-dirty
pass_rate_1: 38.3
pass_rate_2: 50.4
percent_cases_well_formed: 100.0
error_outputs: 1
num_malformed_responses: 0
user_asks: 1
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 5
command: aider --model gpt-3.5-turbo-0613
date: 2023-11-07
versions: 0.16.4-dev
seconds_per_case: 18.0
total_cost: 0.5366
- dirname: 2024-04-30-21-40-51--litellm-gpt-3.5-turbo-1106-again
test_cases: 132
model: gpt-3.5-turbo-1106
edit_format: whole
commit_hash: 7b14d77
pass_rate_1: 45.5
pass_rate_2: 56.1
percent_cases_well_formed: 100.0
error_outputs: 1
num_malformed_responses: 0
user_asks: 1
lazy_comments: 0
syntax_errors: 19
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 0
command: aider --model gpt-3.5-turbo-1106
date: 2024-04-30
versions: 0.30.2-dev
seconds_per_case: 5.3
total_cost: 0.3261
- dirname: 2024-01-25-23-37-15--jan-exercism-gpt-4-0125-preview-udiff
test_cases: 133
model: gpt-4-0125-preview
released: 2024-01-25
edit_format: udiff
commit_hash: edcf9b1
pass_rate_1: 55.6
pass_rate_2: 66.2
percent_cases_well_formed: 97.7
error_outputs: 6
num_malformed_responses: 3
user_asks: 0
lazy_comments: 0
syntax_errors: 3
indentation_errors: 7
exhausted_context_windows: 0
test_timeouts: 4
command: aider --model gpt-4-0125-preview
date: 2024-01-25
versions: 0.22.1-dev
seconds_per_case: 44.8
total_cost: 14.6428
- dirname: 2024-05-04-15-07-30--redo-gpt-4-0314-diff-reminder-rules
test_cases: 133
model: gpt-4-0314
released: 2023-03-14
edit_format: diff
commit_hash: 0d43468
pass_rate_1: 50.4
pass_rate_2: 66.2
percent_cases_well_formed: 93.2
error_outputs: 28
num_malformed_responses: 9
user_asks: 1
lazy_comments: 3
syntax_errors: 9
indentation_errors: 7
exhausted_context_windows: 0
test_timeouts: 3
command: aider --model gpt-4-0314
date: 2024-05-04
versions: 0.31.2-dev
seconds_per_case: 19.8
total_cost: 16.2689
- dirname: 2023-12-16-21-24-28--editblock-gpt-4-0613-actual-main
test_cases: 133
model: gpt-4-0613
released: 2023-06-13
edit_format: diff
commit_hash: 3aa17c4
pass_rate_1: 46.6
pass_rate_2: 67.7
percent_cases_well_formed: 100.0
error_outputs: 14
num_malformed_responses: 0
user_asks: 0
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 2
command: aider -4
date: 2023-12-16
versions: 0.18.2-dev
seconds_per_case: 33.6
total_cost: 17.4657
- dirname: 2024-05-08-21-16-03--may-gpt-4-1106-preview-udiff
test_cases: 133
model: gpt-4-1106-preview
released: 2023-11-06
edit_format: udiff
commit_hash: 87664dc
pass_rate_1: 51.9
pass_rate_2: 65.4
percent_cases_well_formed: 92.5
error_outputs: 30
num_malformed_responses: 10
user_asks: 0
lazy_comments: 3
syntax_errors: 11
indentation_errors: 2
exhausted_context_windows: 0
test_timeouts: 1
command: aider --model gpt-4-1106-preview
date: 2024-05-08
versions: 0.33.1-dev
seconds_per_case: 20.4
total_cost: 6.6061
- dirname: 2024-05-01-02-09-20--gpt-4-turbo-examples
test_cases: 133
model: gpt-4-turbo-2024-04-09 (udiff)
released: 2024-04-09
edit_format: udiff
commit_hash: e610e5b-dirty
pass_rate_1: 48.1
pass_rate_2: 63.9
percent_cases_well_formed: 97.0
error_outputs: 12
num_malformed_responses: 4
user_asks: 0
lazy_comments: 0
syntax_errors: 4
indentation_errors: 2
exhausted_context_windows: 0
test_timeouts: 3
command: aider --gpt-4-turbo
date: 2024-05-01
versions: 0.30.2-dev
seconds_per_case: 22.8
total_cost: 6.3337
- dirname: 2024-05-03-22-24-48--openrouter--llama3-diff-examples-sys-msg
test_cases: 132
model: llama3-70b-8192
released: 2024-04-18
edit_format: diff
commit_hash: b5bb453
pass_rate_1: 38.6
pass_rate_2: 49.2
percent_cases_well_formed: 73.5
error_outputs: 105
num_malformed_responses: 35
user_asks: 0
lazy_comments: 0
syntax_errors: 1
indentation_errors: 2
exhausted_context_windows: 0
test_timeouts: 3
command: aider --model groq/llama3-70b-8192
date: 2024-05-03
versions: 0.31.2-dev
seconds_per_case: 14.5
total_cost: 0.4311
- dirname: 2024-05-06-18-31-08--command-r-plus-whole-final
test_cases: 133
model: command-r-plus
released: 2024-04-04
edit_format: whole
commit_hash: fc3a43e-dirty
pass_rate_1: 21.8
pass_rate_2: 31.6
percent_cases_well_formed: 100.0
error_outputs: 0
num_malformed_responses: 0
user_asks: 0
lazy_comments: 1
syntax_errors: 5
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 7
command: aider --model command-r-plus
date: 2024-05-06
versions: 0.31.2-dev
seconds_per_case: 22.9
total_cost: 2.7494
- dirname: 2024-05-07-12-55-06--deepseek-chat-v2-whole
test_cases: 133
model: deepseek-chat v2 (whole)
edit_format: whole
commit_hash: b1cae73, db994fb
pass_rate_1: 50.4
pass_rate_2: 60.2
percent_cases_well_formed: 100.0
error_outputs: 3
num_malformed_responses: 0
user_asks: 3
lazy_comments: 13
syntax_errors: 0
indentation_errors: 2
exhausted_context_windows: 0
test_timeouts: 1
command: aider --model deepseek/deepseek-chat --edit-format whole
date: 2024-05-07
versions: 0.31.2-dev
seconds_per_case: 42.4
total_cost: 0.0000
- dirname: 2024-05-09-18-57-52--deepseek-chat-v2-diff-reverted-and-helpful-assistant2
test_cases: 133
model: deepseek-chat v2 (diff)
released: 2024-05-06
edit_format: diff
commit_hash: 80a3f6d
pass_rate_1: 44.4
pass_rate_2: 60.9
percent_cases_well_formed: 97.0
error_outputs: 14
num_malformed_responses: 4
user_asks: 2
lazy_comments: 0
syntax_errors: 13
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 3
command: aider --model deepseek/deepseek-chat
date: 2024-05-09
versions: 0.33.1-dev
seconds_per_case: 86.8
total_cost: 0.0941
- dirname: 2024-05-07-20-32-37--qwen1.5-110b-chat-whole
test_cases: 133
model: qwen1.5-110b-chat
released: 2024-02-04
edit_format: whole
commit_hash: 70b1c0c
pass_rate_1: 30.8
pass_rate_2: 37.6
percent_cases_well_formed: 100.0
error_outputs: 3
num_malformed_responses: 0
user_asks: 3
lazy_comments: 20
syntax_errors: 0
indentation_errors: 6
exhausted_context_windows: 0
test_timeouts: 3
command: aider --model together_ai/qwen/qwen1.5-110b-chat
date: 2024-05-07
versions: 0.31.2-dev
seconds_per_case: 46.9
total_cost: 0.0000
- dirname: 2024-05-07-20-57-04--wizardlm-2-8x22b-whole
test_cases: 133
model: WizardLM-2 8x22B
edit_format: whole
commit_hash: 8e272bf, bbe8639
pass_rate_1: 27.8
pass_rate_2: 44.4
percent_cases_well_formed: 100.0
error_outputs: 0
num_malformed_responses: 0
user_asks: 0
lazy_comments: 1
syntax_errors: 2
indentation_errors: 2
exhausted_context_windows: 0
test_timeouts: 0
command: aider --model openrouter/microsoft/wizardlm-2-8x22b
date: 2024-05-07
versions: 0.31.2-dev
seconds_per_case: 36.6
total_cost: 0.0000
- dirname: 2024-05-13-17-39-05--gpt-4o-diff
test_cases: 133
model: gpt-4o
released: 2024-05-13
edit_format: diff
commit_hash: b6cd852
pass_rate_1: 60.2
pass_rate_2: 72.9
percent_cases_well_formed: 96.2
error_outputs: 103
num_malformed_responses: 5
user_asks: 0
lazy_comments: 0
syntax_errors: 0
indentation_errors: 2
exhausted_context_windows: 0
test_timeouts: 1
command: aider
date: 2024-05-13
versions: 0.34.1-dev
seconds_per_case: 6.0
total_cost: 0.0000
- dirname: 2024-04-12-22-18-20--gpt-4-turbo-2024-04-09-plain-diff
test_cases: 33
model: gpt-4-turbo-2024-04-09 (diff)
edit_format: diff
commit_hash: 9b2e697-dirty
pass_rate_1: 48.5
pass_rate_2: 57.6
percent_cases_well_formed: 100.0
error_outputs: 15
num_malformed_responses: 0
user_asks: 15
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 0
command: aider --model gpt-4-turbo-2024-04-09
date: 2024-04-12
versions: 0.28.1-dev
seconds_per_case: 17.6
total_cost: 1.6205
- dirname: 2024-06-08-22-37-55--qwen2-72b-instruct-whole
test_cases: 133
model: Qwen2 72B Instruct
edit_format: whole
commit_hash: 02c7335-dirty, 1a97498-dirty
pass_rate_1: 44.4
pass_rate_2: 55.6
percent_cases_well_formed: 100.0
error_outputs: 3
num_malformed_responses: 0
num_with_malformed_responses: 0
user_asks: 3
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 1
command: aider --model together_ai/qwen/Qwen2-72B-Instruct
date: 2024-06-08
versions: 0.37.1-dev
seconds_per_case: 14.3
total_cost: 0.0000
- dirname: 2024-06-08-23-45-41--gemini-1.5-flash-latest-whole
test_cases: 133
model: gemini-1.5-flash-latest
edit_format: whole
commit_hash: 86ea47f-dirty
pass_rate_1: 33.8
pass_rate_2: 44.4
percent_cases_well_formed: 100.0
error_outputs: 16
num_malformed_responses: 0
num_with_malformed_responses: 0
user_asks: 12
lazy_comments: 0
syntax_errors: 9
indentation_errors: 1
exhausted_context_windows: 0
test_timeouts: 3
command: aider --model gemini/gemini-1.5-flash-latest
date: 2024-06-08
versions: 0.37.1-dev
seconds_per_case: 7.2
total_cost: 0.0000
- dirname: 2024-06-09-03-28-21--codestral-whole
test_cases: 133
model: codestral-2405
edit_format: whole
commit_hash: effc88a
pass_rate_1: 35.3
pass_rate_2: 51.1
percent_cases_well_formed: 100.0
error_outputs: 4
num_malformed_responses: 0
num_with_malformed_responses: 0
user_asks: 4
lazy_comments: 1
syntax_errors: 0
indentation_errors: 1
exhausted_context_windows: 0
test_timeouts: 4
command: aider --model mistral/codestral-2405
date: 2024-06-09
versions: 0.37.1-dev
seconds_per_case: 7.5
total_cost: 0.6805
- dirname: 2024-06-08-19-25-26--codeqwen:7b-chat-v1.5-q8_0-whole
test_cases: 133
model: codeqwen:7b-chat-v1.5-q8_0
edit_format: whole
commit_hash: be0520f-dirty
pass_rate_1: 32.3
pass_rate_2: 34.6
percent_cases_well_formed: 100.0
error_outputs: 8
num_malformed_responses: 0
num_with_malformed_responses: 0
user_asks: 8
lazy_comments: 0
syntax_errors: 1
indentation_errors: 2
exhausted_context_windows: 0
test_timeouts: 1
command: aider --model ollama/codeqwen:7b-chat-v1.5-q8_0
date: 2024-06-08
versions: 0.37.1-dev
seconds_per_case: 15.6
total_cost: 0.0000
- dirname: 2024-06-08-16-12-31--codestral:22b-v0.1-q8_0-whole
test_cases: 133
model: codestral:22b-v0.1-q8_0
edit_format: whole
commit_hash: be0520f-dirty
pass_rate_1: 35.3
pass_rate_2: 48.1
percent_cases_well_formed: 100.0
error_outputs: 8
num_malformed_responses: 0
num_with_malformed_responses: 0
user_asks: 8
lazy_comments: 2
syntax_errors: 0
indentation_errors: 1
exhausted_context_windows: 0
test_timeouts: 3
command: aider --model ollama/codestral:22b-v0.1-q8_0
date: 2024-06-08
versions: 0.37.1-dev
seconds_per_case: 46.4
total_cost: 0.0000
- dirname: 2024-06-08-17-54-04--qwen2:72b-instruct-q8_0-whole
test_cases: 133
model: qwen2:72b-instruct-q8_0
edit_format: whole
commit_hash: 74e51d5-dirty
pass_rate_1: 43.6
pass_rate_2: 49.6
percent_cases_well_formed: 100.0
error_outputs: 27
num_malformed_responses: 0
num_with_malformed_responses: 0
user_asks: 27
lazy_comments: 0
syntax_errors: 5
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 0
command: aider --model ollama/qwen2:72b-instruct-q8_0
date: 2024-06-08
versions: 0.37.1-dev
seconds_per_case: 280.6
total_cost: 0.0000
- dirname: 2024-07-04-14-32-08--claude-3.5-sonnet-diff-continue
test_cases: 133
model: claude-3.5-sonnet
edit_format: diff
commit_hash: 35f21b5
pass_rate_1: 57.1
pass_rate_2: 77.4
percent_cases_well_formed: 99.2
error_outputs: 23
num_malformed_responses: 4
num_with_malformed_responses: 1
user_asks: 2
lazy_comments: 0
syntax_errors: 1
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 1
command: aider --sonnet
date: 2024-07-04
versions: 0.42.1-dev
seconds_per_case: 17.6
total_cost: 3.6346
- dirname: 2024-06-17-14-45-54--deepseek-coder2-whole
test_cases: 133
model: DeepSeek Coder V2 (whole)
edit_format: whole
commit_hash: ca8672b
pass_rate_1: 63.9
pass_rate_2: 75.2
percent_cases_well_formed: 100.0
error_outputs: 1
num_malformed_responses: 0
num_with_malformed_responses: 0
user_asks: 1
lazy_comments: 0
syntax_errors: 1
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 7
command: aider --model deepseek/deepseek-coder
date: 2024-06-17
versions: 0.38.1-dev
seconds_per_case: 21.1
total_cost: 0.0537
- dirname: 2024-06-21-15-29-08--deepseek-coder2-diff-again3
test_cases: 133
model: DeepSeek Coder V2 (diff)
edit_format: diff
commit_hash: 515ab3e
pass_rate_1: 58.6
pass_rate_2: 66.2
percent_cases_well_formed: 98.5
error_outputs: 23
num_malformed_responses: 5
num_with_malformed_responses: 2
user_asks: 2
lazy_comments: 0
syntax_errors: 0
indentation_errors: 1
exhausted_context_windows: 0
test_timeouts: 2
command: aider --model deepseek/deepseek-coder
date: 2024-06-21
versions: 0.39.1-dev
seconds_per_case: 30.2
total_cost: 0.0857
- dirname: 2024-07-01-21-41-48--haiku-whole
test_cases: 133
model: claude-3-haiku-20240307
edit_format: whole
commit_hash: 75f506d
pass_rate_1: 40.6
pass_rate_2: 47.4
percent_cases_well_formed: 100.0
error_outputs: 6
num_malformed_responses: 0
num_with_malformed_responses: 0
user_asks: 0
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 2
command: aider --model claude-3-haiku-20240307
date: 2024-07-01
versions: 0.41.1-dev
seconds_per_case: 7.1
total_cost: 0.1946

@ -0,0 +1,190 @@
- dirname: 2024-05-04-23-27-02--refac-gemini
test_cases: 89
model: gemini/gemini-1.5-pro-latest
edit_format: diff-fenced
commit_hash: a0649ba-dirty, 425cb29, 1b35ca2-dirty, 3e4fca2-dirty
pass_rate_1: 49.4
percent_cases_well_formed: 7.9
error_outputs: 247
num_malformed_responses: 82
user_asks: 0
lazy_comments: 4
syntax_errors: 0
indentation_errors: 8
exhausted_context_windows: 0
test_timeouts: 0
command: aider --model gemini/gemini-1.5-pro-latest
date: 2024-05-04
versions: 0.31.2-dev
seconds_per_case: 55.7
total_cost: 0.0000
- dirname: 2024-05-04-17-45-53--refac-opus
test_cases: 83
model: claude-3-opus-20240229
edit_format: diff
commit_hash: b02320b-dirty
pass_rate_1: 72.3
percent_cases_well_formed: 79.5
error_outputs: 51
num_malformed_responses: 17
user_asks: 0
lazy_comments: 2
syntax_errors: 1
indentation_errors: 3
exhausted_context_windows: 0
test_timeouts: 0
command: aider --opus
date: 2024-05-04
versions: 0.31.2-dev
seconds_per_case: 67.8
total_cost: 27.9176
- dirname: 2024-04-09-21-49-54--refac-gpt-4-turbo-2024-04-09
test_cases: 88
model: gpt-4-turbo-2024-04-09 (udiff)
edit_format: udiff
commit_hash: b75fdb9
pass_rate_1: 34.1
percent_cases_well_formed: 30.7
error_outputs: 183
num_malformed_responses: 61
user_asks: 0
lazy_comments: 1
syntax_errors: 3
indentation_errors: 15
exhausted_context_windows: 0
test_timeouts: 0
command: aider --gpt-4-turbo
date: 2024-04-09
versions: 0.27.1-dev
seconds_per_case: 42.4
total_cost: 19.6556
- dirname: 2024-05-08-22-25-41--may-refac-gpt-4-0125-preview-ex-sys
test_cases: 89
model: gpt-4-0125-preview
edit_format: udiff
commit_hash: bf09bd3-dirty
pass_rate_1: 33.7
percent_cases_well_formed: 47.2
error_outputs: 142
num_malformed_responses: 47
user_asks: 0
lazy_comments: 1
syntax_errors: 2
indentation_errors: 16
exhausted_context_windows: 0
test_timeouts: 0
command: aider --model gpt-4-0125-preview
date: 2024-05-08
versions: 0.33.1-dev
seconds_per_case: 56.6
total_cost: 20.3270
- dirname: 2024-05-08-21-24-16--may-refac-gpt-4-1106-preview
test_cases: 89
model: gpt-4-1106-preview
edit_format: udiff
commit_hash: eaa2514-dirty
pass_rate_1: 50.6
percent_cases_well_formed: 39.3
error_outputs: 164
num_malformed_responses: 54
user_asks: 1
lazy_comments: 17
syntax_errors: 0
indentation_errors: 8
exhausted_context_windows: 0
test_timeouts: 0
command: aider --model gpt-4-1106-preview
date: 2024-05-08
versions: 0.33.1-dev
seconds_per_case: 61.8
total_cost: 18.3844
- dirname: 2024-05-13-17-42-22--refac-gpt-4o-diff
test_cases: 89
model: gpt-4o
edit_format: diff
commit_hash: b6cd852
pass_rate_1: 62.9
percent_cases_well_formed: 53.9
error_outputs: 9025
num_malformed_responses: 41
user_asks: 0
lazy_comments: 2
syntax_errors: 0
indentation_errors: 5
exhausted_context_windows: 0
test_timeouts: 0
command: aider
date: 2024-05-13
versions: 0.34.1-dev
seconds_per_case: 27.8
total_cost: 0.0000
- dirname: 2024-04-10-13-26-18--refac-gpt-4-turbo-2024-04-09-diff
test_cases: 88
model: gpt-4-turbo-2024-04-09 (diff)
edit_format: diff
commit_hash: 7875418
pass_rate_1: 21.4
percent_cases_well_formed: 6.8
error_outputs: 247
num_malformed_responses: 82
user_asks: 1
lazy_comments: 2
syntax_errors: 3
indentation_errors: 8
exhausted_context_windows: 0
test_timeouts: 0
command: aider --model gpt-4-turbo-2024-04-09
date: 2024-04-10
versions: 0.28.1-dev
seconds_per_case: 67.8
total_cost: 20.4889
- dirname: 2024-07-01-18-30-33--refac-claude-3.5-sonnet-diff-not-lazy
test_cases: 89
model: claude-3.5-sonnet (diff)
edit_format: diff
commit_hash: 7396e38-dirty
pass_rate_1: 64.0
percent_cases_well_formed: 76.4
error_outputs: 176
num_malformed_responses: 39
num_with_malformed_responses: 21
user_asks: 11
lazy_comments: 2
syntax_errors: 4
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 0
command: aider --sonnet
date: 2024-07-01
versions: 0.40.7-dev
seconds_per_case: 42.8
total_cost: 11.5242
- dirname: 2024-07-04-15-06-43--refac-deepseek-coder2-128k
test_cases: 89
model: DeepSeek Coder V2 (128k context)
edit_format: diff
commit_hash: 08868fd
pass_rate_1: 38.2
percent_cases_well_formed: 73.0
error_outputs: 393
num_malformed_responses: 89
num_with_malformed_responses: 24
user_asks: 4
lazy_comments: 2
syntax_errors: 1
indentation_errors: 5
exhausted_context_windows: 3
test_timeouts: 0
command: aider --model deepseek/deepseek-coder
date: 2024-07-04
versions: 0.42.1-dev
seconds_per_case: 82.9
total_cost: 0.2601

@ -0,0 +1,17 @@
You can get started quickly like this:
```
$ pip install aider-chat
# Change directory into a git repo
$ cd /to/your/git/repo
# Work with Claude 3.5 Sonnet on your repo
$ export ANTHROPIC_API_KEY=your-key-goes-here
$ aider
# Work with GPT-4o on your repo
$ export OPENAI_API_KEY=your-key-goes-here
$ aider
```

@ -0,0 +1,20 @@
{% if page.highlight_image %}
<meta property="og:image" content="{{ site.url }}{{ page.highlight_image }}">
<meta property="twitter:image" content="{{ site.url }}{{ page.highlight_image }}">
{% else %}
<meta property="og:image" content="{{ site.url }}/assets/aider.jpg">
<meta property="twitter:image" content="{{ site.url }}/assets/aider-square.jpg">
{% endif %}
<link rel="alternate" type="application/rss+xml" title="RSS Feed" href="{{ site.url }}/feed.xml">
<link rel="preconnect" href="https://fonts.gstatic.com">
<link rel="preload" href="https://fonts.googleapis.com/css?family=Open+Sans:400,700&display=swap" as="style" type="text/css" crossorigin>
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="theme-color" content="#157878">
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
<link rel="icon" type="image/png" sizes="32x32" href="{{ '/assets/icons/favicon-32x32.png' | relative_url }}">
<link rel="icon" type="image/png" sizes="16x16" href="{{ '/assets/icons/favicon-16x16.png' | relative_url }}">
<link rel="apple-touch-icon" sizes="180x180" href="{{ '/assets/icons/apple-touch-icon.png' | relative_url }}">
<link rel="manifest" href="{{ '/assets/icons/site.webmanifest' | relative_url }}">
<link rel="mask-icon" href="{{ '/assets/icons/safari-pinned-tab.svg' | relative_url }}" color="#5bbad5">
<meta name="msapplication-TileColor" content="#da532c">
<meta name="theme-color" content="#ffffff">

@ -0,0 +1,22 @@
If you need more help, please check our
[GitHub issues](https://github.com/paul-gauthier/aider/issues)
and file a new issue if your problem isn't discussed.
Or drop into our
[Discord](https://discord.gg/Tv2uQnR88V)
to chat with us.
When reporting problems, it is very helpful if you can provide:
- Aider version
- LLM model you are using
Including the "announcement" lines that
aider prints at startup
is an easy way to share this helpful info.
```
Aider v0.37.1-dev
Models: gpt-4o with diff edit format, weak model gpt-3.5-turbo
Git repo: .git with 243 files
Repo-map: using 1024 tokens
```

@ -0,0 +1,62 @@
## Unknown context window size and token costs
```
Model foobar: Unknown context window size and costs, using sane defaults.
```
*You can probably ignore the unknown context window size and token costs warning.*
If you specify a model that aider has never heard of, you will get
this warning.
This means aider doesn't know the context window size and token costs
for that model.
Aider will use an unlimited context window and assume the model is free,
so this is not usually a significant problem.
See the docs on
[configuring advanced model settings](/docs/config/adv-model-settings.html)
for details on how to remove this warning.
## Did you mean?
If aider isn't familiar with the model you've specified,
it will suggest similarly named models.
This helps
in the case where you made a typo or mistake when specifying the model name.
```
Model gpt-5o: Unknown context window size and costs, using sane defaults.
Did you mean one of these?
- gpt-4o
```
## Missing environment variables
You need to set the listed environment variables.
Otherwise you will get error messages when you start chatting with the model.
```
Model azure/gpt-4-turbo: Missing these environment variables:
- AZURE_API_BASE
- AZURE_API_VERSION
- AZURE_API_KEY
```
## Unknown which environment variables are required
```
Model gpt-5: Unknown which environment variables are required.
```
Aider is unable to verify the environment because it doesn't know
which variables are required for the model.
If required variables are missing,
you may get errors when you attempt to chat with the model.
You can look in [aider's LLM documentation](/docs/llms.html)
or the
[litellm documentation](https://docs.litellm.ai/docs/providers)
to see if the required variables are listed there.

@ -0,0 +1,4 @@
You can send long, multi-line messages in the chat in a few ways:
- Paste a multi-line message directly into the chat.
- Enter `{` alone on the first line to start a multiline message and `}` alone on the last line to end it.
- Use Meta-ENTER to start a new line without sending the message (Esc+ENTER in some environments).

@ -0,0 +1,7 @@
<footer class="site-footer">
Aider is AI pair programming in your terminal.
Aider is on
<a href="https://github.com/paul-gauthier/aider">GitHub</a>
and
<a href="https://discord.gg/Tv2uQnR88V">Discord</a>.
</footer>

@ -0,0 +1,9 @@
Aider has special support for providing
OpenAI and Anthropic API keys
via command line switches and yaml config settings.
*All other LLM providers* must
have their keys and settings
specified in environment variables.
This can be done in your shell,
or by using a
[`.env` file](/docs/config/dotenv.html).

@ -0,0 +1,10 @@
<!DOCTYPE html>
<html lang="en-US">
<meta charset="utf-8">
<title>Redirecting&hellip;</title>
<link rel="canonical" href="{{ page.redirect.to }}">
<script>location="{{ page.redirect.to }}"</script>
<meta http-equiv="refresh" content="0; url={{ page.redirect.to }}">
<h1>Redirecting&hellip;</h1>
<a href="{{ page.redirect.to }}">Click here if you are not redirected.</a>
</html>

@ -0,0 +1 @@
../docs/ctags.md

@ -0,0 +1 @@
../docs/benchmarks.md

@ -0,0 +1,268 @@
---
title: Building a better repository map with tree sitter
excerpt: Tree-sitter allows aider to build a repo map that better summarizes large code bases.
highlight_image: /assets/robot-ast.png
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Building a better repository map with tree sitter
![robot flowchart](/assets/robot-ast.png)
GPT-4 is extremely useful for "self-contained" coding tasks,
like generating or modifying a simple function
that has no dependencies. Tools like GitHub Copilot serve
these simple coding tasks well.
But making complex changes in a larger, pre-existing codebase
is much more difficult, for both humans and AIs.
To do this successfully, you need to:
1. Find the code that needs to be changed.
2. Understand how that code relates to the rest of the codebase.
3. Make the correct code change to accomplish the task.
GPT-4 is actually great at making the code changes (3),
once you tell it which files need to be changed (1)
and show it how they fit into the rest of the codebase (2).
This article is going to focus on step (2), providing "code context":
- We need to help GPT understand the overall codebase.
- This will help it understand the code it needs to change, which may depend on other parts of the codebase.
- It will also help GPT write new code and modify the existing code in a way
that respects and utilizes existing libraries, modules and abstractions
found elsewhere in the codebase.
- We must convey all of this "code context" to GPT in an
efficient manner that fits within the limited context window.
To address these issues, aider
sends GPT a **concise map of your whole git repository**
that includes
the most important classes and functions along with their types and call signatures.
This **repository map** is now built automatically using
[tree-sitter](https://tree-sitter.github.io/tree-sitter/)
to extract symbol definitions from source files.
Tree-sitter is used by many IDEs, editors and LSP servers to
help humans search and navigate large codebases.
Aider now uses it to help GPT better comprehend, navigate
and edit code in larger repos.
*To code with GPT-4 using the techniques discussed here, just install [aider](https://aider.chat/docs/install.html).*
## The problem: code context
GPT-4 is great at "self-contained" coding tasks, like writing or
modifying a pure function with no external dependencies.
GPT can easily handle requests like "write a
Fibonacci function" or "rewrite this loop using list
comprehensions", because they require no context beyond the code
being discussed.
Most real code is not pure and self-contained; it is intertwined with
and depends on code from many different files in a repo.
If you ask GPT to "switch all the print statements in class Foo to
use the BarLog logging system", it needs to see and
modify the code in the Foo class, but it also needs to understand
how to use
the project's BarLog
subsystem.
A simple solution is to **send the entire codebase** to GPT along with
each change request. Now GPT has all the context! But this won't work
for even moderately
sized repos, because they won't fit into the context window.
A better approach is to be selective,
and **hand pick which files to send**.
For the example above, you could send the file that
contains the Foo class
and the file that contains the BarLog logging subsystem.
This works pretty well, and is supported by aider -- you
can manually specify which files to "add to the chat" you are having with GPT.
But sending whole files is a bulky way to send code context,
wasting the precious context window.
GPT doesn't need to see the entire implementation of BarLog;
it just needs to understand it well enough to use it.
You may quickly run out of context window by sending
full files of code
just to convey context.
Aider also strives to reduce the manual work involved in
coding with AI.
So in an ideal world, we'd like aider to automatically
identify and provide the needed code context.
## Using a repo map to provide context
Aider sends a **repo map** to GPT along with
each request from the user to make a code change.
The map contains a list of the files in the
repo, along with the key symbols which are defined in each file.
It shows how each of these symbols is defined in the
source code, by including the critical lines of code for each definition.
Here's a
sample of the map of the aider repo, just showing the maps of
[base_coder.py](https://github.com/paul-gauthier/aider/blob/main/aider/coders/base_coder.py)
and
[commands.py](https://github.com/paul-gauthier/aider/blob/main/aider/commands.py)
:
```
aider/coders/base_coder.py:
⋮...
│class Coder:
│    abs_fnames = None
⋮...
│    @classmethod
│    def create(
│        self,
│        main_model,
│        edit_format,
│        io,
│        skip_model_availabily_check=False,
│        **kwargs,
⋮...
│    def abs_root_path(self, path):
⋮...
│    def run(self, with_message=None):
⋮...
aider/commands.py:
⋮...
│class Commands:
│    voice = None
⋮...
│    def get_commands(self):
⋮...
│    def get_command_completions(self, cmd_name, partial):
⋮...
│    def run(self, inp):
⋮...
```
Mapping out the repo like this provides some key benefits:
- GPT can see classes, methods and function signatures from everywhere in the repo. This alone may give it enough context to solve many tasks. For example, it can probably figure out how to use the API exported from a module just based on the details shown in the map.
- If it needs to see more code, GPT can use the map to figure out by itself which files it needs to look at in more detail. GPT will then ask to see these specific files, and aider will automatically add them to the chat context.
## Optimizing the map
Of course, for large repositories even just the repo map might be too large
for GPT's context window.
Aider solves this problem by sending just the **most relevant**
portions of the repo map.
It does this by analyzing the full repo map using
a graph ranking algorithm, computed on a graph
where each source file is a node and edges connect
files which have dependencies.
Aider optimizes the repo map by
selecting the most important parts of the codebase
which will
fit into the token budget assigned by the user
(via the `--map-tokens` switch, which defaults to 1k tokens).
The sample map shown above doesn't contain *every* class, method and function from those
files.
It only includes the most important identifiers,
the ones which are most often referenced by other portions of the code.
These are the key pieces of context that GPT needs to know to understand
the overall codebase.
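As a rough illustration of the ranking idea, here is a minimal sketch that runs a simplified PageRank over a toy file dependency graph and then keeps the top-ranked files that fit a token budget. PageRank is used only as one example of a graph ranking algorithm. The file names, edges, token counts, and the helpers `rank_files` and `select_within_budget` are all hypothetical and are not taken from aider's code.

```python
from collections import defaultdict


def rank_files(edges, damping=0.85, iterations=50):
    """Simplified PageRank; edges are (referencing_file, defining_file) pairs."""
    nodes = sorted({name for edge in edges for name in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {name: 1.0 / len(nodes) for name in nodes}
    for _ in range(iterations):
        new_rank = {name: (1.0 - damping) / len(nodes) for name in nodes}
        for src in nodes:
            # Files with no outgoing edges spread their rank over every file.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


def select_within_budget(rank, token_costs, budget):
    """Greedily keep the highest-ranked files whose map entries fit the budget."""
    chosen, used = [], 0
    for name in sorted(rank, key=rank.get, reverse=True):
        cost = token_costs[name]
        if used + cost <= budget:
            chosen.append(name)
            used += cost
    return chosen


# Hypothetical toy repo: each edge points from a file to a file it depends on.
edges = [
    ("main.py", "coder.py"),
    ("commands.py", "coder.py"),
    ("coder.py", "io.py"),
    ("commands.py", "io.py"),
]
# Hypothetical token cost of each file's map entry.
token_costs = {"main.py": 300, "commands.py": 400, "coder.py": 500, "io.py": 300}

ranks = rank_files(edges)
selected = select_within_budget(ranks, token_costs, budget=1024)
print(selected)
```

Files that many other files depend on end up with the highest rank, so they are the first to claim space in the budget.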
## Using tree-sitter to make the map
Under the hood, aider uses
[tree sitter](https://tree-sitter.github.io/tree-sitter/)
to build the
map.
It specifically uses the
[py-tree-sitter-languages](https://github.com/grantjenks/py-tree-sitter-languages)
python module,
which provides simple, pip-installable binary wheels for
[most popular programming languages](https://github.com/paul-gauthier/grep-ast/blob/main/grep_ast/parsers.py).
Tree-sitter parses source code into an Abstract Syntax Tree (AST) based
on the syntax of the programming language.
Using the AST, we can identify where functions, classes, variables, types and
other definitions occur in the source code.
We can also identify where else in the code these things are used or referenced.
Aider uses all of these definitions and references to
determine which are the most important identifiers in the repository,
and to produce the repo map that shows just those key
lines from the codebase.
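To make the definitions-and-references idea concrete, here is a small sketch that swaps in Python's built-in `ast` module for tree-sitter, so it only handles Python source. It is not how aider does it; the sample source and the helper `definitions_and_references` are made up for illustration.

```python
import ast
from collections import Counter

# Hypothetical sample source, reusing the BarLog example from above.
SOURCE = '''
class BarLog:
    def info(self, msg):
        print(msg)

def helper(log):
    log.info("hi")

def main():
    log = BarLog()
    helper(log)
    helper(log)
'''


def definitions_and_references(source):
    """Return (defined names, reference counts) for one Python source string."""
    tree = ast.parse(source)
    defined = set()
    referenced = Counter()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            referenced[node.id] += 1
        elif isinstance(node, ast.Attribute):
            referenced[node.attr] += 1
    return defined, referenced


defined, referenced = definitions_and_references(SOURCE)
# Keep only names defined here that are also referenced, most referenced first.
important = sorted(
    (name for name in defined if referenced[name] > 0),
    key=lambda name: (-referenced[name], name),
)
print(important)
```

A definition that nothing references, like `main` here, drops out of the list, which matches the idea of keeping only the identifiers other code depends on.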
## What about ctags?
The tree-sitter repository map replaces the
[ctags based map](https://aider.chat/docs/ctags.html)
that aider originally used.
Switching from ctags to tree-sitter provides a bunch of benefits:
- The map is richer, showing full function call signatures and other details straight from the source files.
- Thanks to `py-tree-sitter-languages`, we get full support for many programming languages via a python package that's automatically installed as part of the normal `pip install aider-chat`.
- We remove the requirement for users to manually install `universal-ctags` via some external tool or package manager (brew, apt, choco, etc).
- Tree-sitter integration is a key enabler for future work and capabilities for aider.
## Future work
You'll recall that we identified the 3 key steps
required to use GPT
to complete a coding task within a large, pre-existing codebase:
1. Find the code that needs to be changed.
2. Understand how that code relates to the rest of the codebase.
3. Make the correct code change to accomplish the task.
We're now using tree-sitter to help solve the code context problem (2),
but it's also an important foundation
for future work on automatically finding all the code which
will need to be changed (1).
Right now, aider relies on the user to specify which source files
will need to be modified to complete their request.
Users manually "add files to the chat" using aider's `/add` command,
which makes those files available for GPT to modify.
This works well, but a key piece of future work is to harness the
power of GPT and tree-sitter to automatically identify
which parts of the code will need changes.
## Try it out
To code with GPT-4 using the techniques discussed here,
just install [aider](https://aider.chat/docs/install.html).
## Credits
Aider uses
[modified versions of the tags.scm files](https://github.com/paul-gauthier/aider/tree/main/aider/queries)
from these
open source tree-sitter language implementations:
* [https://github.com/tree-sitter/tree-sitter-c](https://github.com/tree-sitter/tree-sitter-c) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-c-sharp](https://github.com/tree-sitter/tree-sitter-c-sharp) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-cpp](https://github.com/tree-sitter/tree-sitter-cpp) — licensed under the MIT License.
* [https://github.com/Wilfred/tree-sitter-elisp](https://github.com/Wilfred/tree-sitter-elisp) — licensed under the MIT License.
* [https://github.com/elixir-lang/tree-sitter-elixir](https://github.com/elixir-lang/tree-sitter-elixir) — licensed under the Apache License, Version 2.0.
* [https://github.com/elm-tooling/tree-sitter-elm](https://github.com/elm-tooling/tree-sitter-elm) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-go](https://github.com/tree-sitter/tree-sitter-go) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-java](https://github.com/tree-sitter/tree-sitter-java) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-javascript](https://github.com/tree-sitter/tree-sitter-javascript) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-ocaml](https://github.com/tree-sitter/tree-sitter-ocaml) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-php](https://github.com/tree-sitter/tree-sitter-php) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-python](https://github.com/tree-sitter/tree-sitter-python) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-ql](https://github.com/tree-sitter/tree-sitter-ql) — licensed under the MIT License.
* [https://github.com/r-lib/tree-sitter-r](https://github.com/r-lib/tree-sitter-r) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-ruby](https://github.com/tree-sitter/tree-sitter-ruby) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-rust](https://github.com/tree-sitter/tree-sitter-rust) — licensed under the MIT License.
* [https://github.com/tree-sitter/tree-sitter-typescript](https://github.com/tree-sitter/tree-sitter-typescript) — licensed under the MIT License.

@ -0,0 +1 @@
../docs/benchmarks-1106.md

@ -0,0 +1 @@
../docs/benchmarks-speed-1106.md

@ -0,0 +1 @@
../docs/unified-diffs.md

@ -0,0 +1 @@
../docs/benchmarks-0125.md

@ -0,0 +1,93 @@
---
title: Claude 3 beats GPT-4 on Aider's code editing benchmark
excerpt: Claude 3 Opus outperforms all of OpenAI's models on Aider's code editing benchmark, making it the best available model for pair programming with AI.
highlight_image: /assets/2024-03-07-claude-3.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Claude 3 beats GPT-4 on Aider's code editing benchmark
[![benchmark results](/assets/2024-03-07-claude-3.svg)](https://aider.chat/assets/2024-03-07-claude-3.svg)
[Anthropic just released their new Claude 3 models](https://www.anthropic.com/news/claude-3-family)
with evals showing better performance on coding tasks.
With that in mind, I've been benchmarking the new models
using Aider's code editing benchmark suite.
Claude 3 Opus outperforms all of OpenAI's models,
making it the best available model for pair programming with AI.
To use Claude 3 Opus with aider:
```
pip install aider-chat
export ANTHROPIC_API_KEY=sk-...
aider --opus
```
## Aider's code editing benchmark
[Aider](https://github.com/paul-gauthier/aider)
is an open source command line chat tool that lets you
pair program with AI on code in your local git repo.
Aider relies on a
[code editing benchmark](https://aider.chat/docs/benchmarks.html)
to quantitatively evaluate how well
an LLM can make changes to existing code.
The benchmark uses aider to try to complete
[133 Exercism Python coding exercises](https://github.com/exercism/python).
For each exercise,
Exercism provides a starting python file with stubs for the needed functions,
a natural language description of the problem to solve
and a test suite to evaluate whether the coder has correctly solved the problem.
The LLM gets two tries to solve each problem:
1. On the first try, it gets the initial stub code and the English description of the coding task. If the tests all pass, we are done.
2. If any tests failed, aider sends the LLM the failing test output and gives it a second try to complete the task.
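The two-try loop described above can be sketched roughly as follows. This is not the benchmark harness itself: `solve` and `run_tests` are hypothetical callables standing in for the LLM and the Exercism test suite, and the fake versions below exist only to exercise the loop.

```python
def run_exercise(solve, run_tests, instructions, max_tries=2):
    """Give the model up to `max_tries` attempts; return the passing try, or None."""
    prompt = instructions
    for attempt in range(1, max_tries + 1):
        code = solve(prompt)
        passed, test_output = run_tests(code)
        if passed:
            return attempt
        # On failure, the next prompt carries the failing test output.
        prompt = f"{instructions}\n\nThe tests failed:\n{test_output}"
    return None


def pass_rates(results):
    """Percent solved on the first try, and percent solved within two tries."""
    total = len(results)
    first = sum(1 for r in results if r == 1)
    both = sum(1 for r in results if r is not None)
    return round(100 * first / total, 1), round(100 * both / total, 1)


# Hypothetical stand-ins for the LLM and the test suite.
def fake_solve(prompt):
    return "fixed" if "tests failed" in prompt else "buggy"


def fake_run_tests(code):
    passed = code == "fixed"
    return passed, "" if passed else "AssertionError: expected 3"


result = run_exercise(fake_solve, fake_run_tests, "Implement add(a, b).")
rates = pass_rates([1, 2, None, 1])
print(result, rates)
```

The two numbers returned by `pass_rates` correspond to the first-try score and the overall two-try score reported in the results below.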
## Benchmark results
### Claude 3 Opus
- The new `claude-3-opus-20240229` model got the highest score ever on this benchmark, completing 68.4% of the tasks with two tries.
- Its single-try performance was comparable to the latest GPT-4 Turbo model `gpt-4-0125-preview`, at 54.1%.
- While Opus got the highest score, it was only a few points higher than the GPT-4 Turbo results. Given the extra costs of Opus and the slower response times, it remains to be seen which is the most practical model for daily coding use.
### Claude 3 Sonnet
- The new `claude-3-sonnet-20240229` model performed similarly to OpenAI's GPT-3.5 Turbo models with an overall score of 54.9% and a first-try score of 43.6%.
## Code editing
It's highly desirable to have the LLM send back code edits as
some form of diffs, rather than having it send back an updated copy of the
entire source code.
Weaker models like GPT-3.5 are unable to use diffs, and are stuck sending back
updated copies of entire source files.
Aider uses more efficient
[search/replace blocks](https://aider.chat/2023/07/02/benchmarks.html#diff)
with the original GPT-4
and
[unified diffs](https://aider.chat/2023/12/21/unified-diffs.html#unified-diff-editing-format)
with the newer GPT-4 Turbo models.
Claude 3 Opus works best with the search/replace blocks, allowing it to send back
code changes efficiently.
Unfortunately, the Sonnet model was only able to work reliably with whole files,
which limits it to editing smaller source files and uses more tokens, money and time.
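To show why a search/replace edit is cheaper than resending a whole file, here is a minimal sketch of applying one such edit. It is a simplification, not aider's actual edit format or implementation; the helper `apply_search_replace` and the rule that the search text must match exactly once are assumptions made for this example.

```python
def apply_search_replace(content, search, replace):
    """Replace exactly one occurrence of `search` in `content` with `replace`."""
    count = content.count(search)
    if count != 1:
        raise ValueError(f"expected exactly one match for the search text, found {count}")
    return content.replace(search, replace, 1)


original = "def greet(name):\n    print('Hello ' + name)\n"
# The model only needs to send the lines that change, not the whole file.
updated = apply_search_replace(
    original,
    search="    print('Hello ' + name)\n",
    replace="    print(f'Hello {name}')\n",
)
print(updated)
```

With a whole-file edit format, the model would instead have to send back every line of the file, which is where the extra tokens, money and time go.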
## Other observations
There are a few other things worth noting:
- Claude 3 Opus and Sonnet are both slower and more expensive than OpenAI's models. You can get almost the same coding skill faster and cheaper with OpenAI's models.
- Claude 3 has a 2X larger context window than the latest GPT-4 Turbo, which may be an advantage when working with larger code bases.
- The Claude models refused to perform a number of coding tasks and returned the error "Output blocked by content filtering policy". They refused to code up the [beer song](https://exercism.org/tracks/python/exercises/beer-song) program, which makes some sort of superficial sense. But they also refused to work in some larger open source code bases, for unclear reasons.
- The Claude APIs seem somewhat unstable, returning HTTP 5xx errors of various sorts. Aider automatically recovers from these errors with exponential backoff retries, but it's a sign that Anthropic may be struggling under surging demand.
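The exponential backoff retries mentioned in the last bullet can be sketched roughly as follows. This is an illustrative pattern only: the function name, the choice of `ConnectionError` as the retryable error, and the delay values are all assumptions, not aider's real retry code.

```python
import random
import time


def retry_with_backoff(call, max_tries=5, base_delay=0.5, sleep=time.sleep):
    """Call `call()` until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_tries):
        try:
            return call()
        except ConnectionError:
            if attempt == max_tries - 1:
                raise  # out of retries, surface the error to the caller
            # Waits of roughly 0.5s, 1s, 2s, ... plus a little random jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
```

Passing `sleep` as a parameter is a small design choice that lets the backoff schedule be checked without actually waiting.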


@ -0,0 +1,74 @@
---
title: GPT-4 Turbo with Vision is a step backwards for coding
excerpt: OpenAI's GPT-4 Turbo with Vision model scores worse on aider's code editing benchmarks than all the previous GPT-4 models. In particular, it seems much more prone to "lazy coding" than the existing GPT-4 Turbo "preview" models.
highlight_image: /assets/2024-04-09-gpt-4-turbo-laziness.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# GPT-4 Turbo with Vision is a step backwards for coding
[OpenAI just released GPT-4 Turbo with Vision](https://twitter.com/OpenAIDevs/status/1777769463258988634)
and it performs worse on aider's coding benchmark suites than all the previous GPT-4 models.
In particular, it seems much more prone to "lazy coding" than the
existing GPT-4 Turbo "preview" models.
## Code editing skill
[![benchmark results](/assets/2024-04-09-gpt-4-turbo.svg)](https://aider.chat/assets/2024-04-09-gpt-4-turbo.svg)
Aider relies on a
[code editing benchmark](https://aider.chat/docs/benchmarks.html#the-benchmark)
to quantitatively evaluate how well
an LLM can make changes to existing code.
The benchmark uses aider to try to complete
[133 Exercism Python coding exercises](https://github.com/exercism/python).
For each exercise, the LLM gets two tries to solve the problem:
1. On the first try, it gets initial stub code and the English description of the coding task. If the tests all pass, we are done.
2. If any tests failed, aider sends the LLM the failing test output and gives it a second try to complete the task.
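The two-try procedure above can be sketched as a small loop. The callables `ask_llm` and `run_tests` are hypothetical stand-ins for the real benchmark plumbing, and `pass_rates` simply shows how first-try and two-try scores could be tallied from the outcomes.

```python
def run_exercise(ask_llm, run_tests, instructions):
    """Return 1 or 2 for the try that passed, or None if both failed.

    `ask_llm(prompt)` and `run_tests()` are hypothetical callables;
    `run_tests()` returns a (passed, output) pair.
    """
    ask_llm(instructions)  # first try: stub code plus the task description
    passed, output = run_tests()
    if passed:
        return 1
    # Second try: share the failing test output with the LLM
    ask_llm("Tests failed, please fix:\n" + output)
    passed, _ = run_tests()
    return 2 if passed else None


def pass_rates(outcomes):
    """Percent solved on the first try, and percent solved within two tries."""
    n = len(outcomes)
    first = sum(1 for o in outcomes if o == 1)
    both = sum(1 for o in outcomes if o in (1, 2))
    return 100 * first / n, 100 * both / n
```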
**GPT-4 Turbo with Vision
scores only 62% on this benchmark,
the lowest score of any of the existing GPT-4 models.**
The other models scored 63-66%, so this represents only a small
regression, and is likely statistically insignificant when compared
against `gpt-4-0613`.
## Lazy coding
[![benchmark results](/assets/2024-04-09-gpt-4-turbo-laziness.svg)](https://aider.chat/assets/2024-04-09-gpt-4-turbo-laziness.svg)
The GPT-4 Turbo "preview" models have been widely criticized for being "lazy"
when coding.
They often omit needed code
and instead leave comments with homework assignments like "implement method here".
```
def some_complex_method(foo, bar):
    # ... implement method here ...
```
Aider uses a ["laziness" benchmark suite](https://github.com/paul-gauthier/refactor-benchmark)
which is designed to both provoke and quantify lazy coding.
It consists of
89 python refactoring tasks
which tend to make GPT-4 Turbo code in that lazy manner.
**The new GPT-4 Turbo with Vision model scores only 34% on aider's
refactoring benchmark, making it the laziest coder of all the GPT-4 Turbo models
by a significant margin.**
## Conclusions
Aider has full support for the new GPT-4 Turbo with Vision
model, which you can access using the switch `--model gpt-4-turbo-2024-04-09`.
But aider will continue to use `gpt-4-1106-preview` by default,
as it is by far the strongest coder of the GPT-4 models.


@ -0,0 +1,55 @@
---
title: Aider in your browser
excerpt: Aider has an experimental browser UI, allowing you to collaborate with LLMs on code in your local git repo.
highlight_image: /assets/browser.jpg
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Aider in your browser
<div class="video-container">
<video controls loop poster="/assets/browser.jpg">
<source src="/assets/aider-browser-social.mp4" type="video/mp4">
<a href="/assets/aider-browser-social.mp4">Aider browser UI demo video</a>
</video>
</div>
<style>
.video-container {
position: relative;
padding-bottom: 101.89%; /* 1080 / 1060 = 1.0189 */
height: 0;
overflow: hidden;
}
.video-container video {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
}
</style>
Use aider's new experimental browser UI to collaborate with LLMs
to edit code in your local git repo.
Aider will directly edit the code in your local source files,
and [git commit the changes](https://aider.chat/docs/git.html)
with sensible commit messages.
You can start a new project or work with an existing git repo.
Aider works well with GPT-3.5, GPT-4, GPT-4 Turbo with Vision,
and Claude 3 Opus.
It also supports [connecting to almost any LLM](https://aider.chat/docs/llms.html).
Use the `--browser` switch to launch the browser version of aider:
```
pip install aider-chat
export OPENAI_API_KEY=<key> # Mac/Linux
setx OPENAI_API_KEY <key> # Windows
aider --browser
```


@ -0,0 +1,327 @@
---
title: Drawing graphs with aider, GPT-4o and matplotlib
excerpt: Use GPT-4o to draw graphs with matplotlib, including adjusting styles and making visual changes. You get the graph, but you also get the code in your repo.
highlight_image: /assets/models-over-time.png
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
![LLM coding skill over time](/assets/models-over-time.svg)
# Drawing graphs with aider, GPT-4o and matplotlib
I recently wanted to draw a graph showing how LLM code editing skill has been
changing over time as new models have been released by OpenAI, Anthropic and others.
I have all the
[data in a yaml file](https://github.com/paul-gauthier/aider/blob/main/website/_data/edit_leaderboard.yml) that is used to render
[aider's LLM leaderboards](https://aider.chat/docs/leaderboards/).
Below is the aider chat transcript, which shows:
- I launch aider with the yaml file, a file with other plots I've done recently (so GPT can crib the style) and an empty file called `over_time.py`.
- Then I ask GPT to draw the scatterplot I want.
- I run the resulting script and share the error output with GPT so it can fix a small bug.
- I ask it to color the points for GPT-4 and GPT-3.5 family models differently, to better see trends within those model families.
- I add a PNG of the graph to the chat, so GPT can see the current graph and understand why I want it to move the point labels so they don't overlap the points.
- I work through a series of other small style changes, like changing fonts and the graph border.
In the end I have the graph, but I also have the python code in my repo.
So I can update this graph easily whenever I add new entries to the yaml data file.
## Aider chat transcript
<div class="chat-transcript" markdown="1">
> Aider v0.35.1-dev
> Models: gpt-4o with diff edit format, weak model gpt-3.5-turbo
> Added _data/edit_leaderboard.yml to the chat.
> Added benchmark/over_time.py to the chat.
> Added benchmark/plots.py to the chat.
#### in over_time.py, draw a graph for each of the yaml entries with a `released` date. plot date on x, `pass_rate_2` on y and label with the `model` name. do a scatter plot. use colors/styles similar to the other graphs.
To draw a scatter plot for each of the YAML entries with a `released` date, we need to:
1. Read the YAML file.
2. Extract the `released` date and `pass_rate_2` for each entry.
3. Plot the data using matplotlib.
Here are the changes:
benchmark/over_time.py
```python
<<<<<<< SEARCH
=======
import matplotlib.pyplot as plt
import yaml
from datetime import datetime

def plot_over_time(yaml_file):
    with open(yaml_file, 'r') as file:
        data = yaml.safe_load(file)

    dates = []
    pass_rates = []
    models = []

    for entry in data:
        if 'released' in entry and 'pass_rate_2' in entry:
            dates.append(datetime.strptime(entry['released'], '%Y-%m-%d'))
            pass_rates.append(entry['pass_rate_2'])
            models.append(entry['model'])

    plt.figure(figsize=(10, 6))
    plt.scatter(dates, pass_rates, c='blue', alpha=0.5)

    for i, model in enumerate(models):
        plt.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75)

    plt.xlabel('Release Date')
    plt.ylabel('Pass Rate 2')
    plt.title('Model Performance Over Time')
    plt.grid(True)
    plt.tight_layout()
    plt.show()

# Example usage
plot_over_time('_data/edit_leaderboard.yml')
>>>>>>> REPLACE
```
> Applied edit to benchmark/over_time.py
> Commit 49ac9ca aider: Added scatter plot functionality for YAML entries with a `released` date and `pass_rate_2`.
</div>
<div class="chat-transcript" markdown="1">
#### /run python benchmark/over_time.py
> Traceback (most recent call last):
File "/Users/gauthier/Projects/aider/benchmark/over_time.py", line 33, in <module>
plot_over_time('_data/edit_leaderboard.yml')
File "/Users/gauthier/Projects/aider/benchmark/over_time.py", line 15, in plot_over_time
dates.append(datetime.strptime(entry['released'], '%Y-%m-%d'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: strptime() argument 1 must be str, not datetime.date
> Add the output to the chat? y
> Traceback (most recent call last):
> File "/Users/gauthier/Projects/aider/benchmark/over_time.py", line 33, in <module>
> plot_over_time('_data/edit_leaderboard.yml')
> File "/Users/gauthier/Projects/aider/benchmark/over_time.py", line 15, in plot_over_time
> dates.append(datetime.strptime(entry['released'], '%Y-%m-%d'))
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> TypeError: strptime() argument 1 must be str, not datetime.date
The error indicates that the `released` field in the YAML file is already a `datetime.date` object, so we don't need to parse it with `datetime.strptime`. We can directly use it in the plot.
Here are the changes:
benchmark/over_time.py
```python
<<<<<<< SEARCH
dates.append(datetime.strptime(entry['released'], '%Y-%m-%d'))
=======
dates.append(entry['released'])
>>>>>>> REPLACE
```
> Applied edit to benchmark/over_time.py
> Commit af2f4b7 aider: Use the `released` field directly in the plot instead of parsing it with `datetime.strptime`.
</div>
<div class="chat-transcript" markdown="1">
#### color the "gpt-4" models differently
To color the "gpt-4" models differently, we need to modify the plotting code in `benchmark/over_time.py` to check for "gpt-4" in the model name and assign a different color.
Here are the changes:
benchmark/over_time.py
```python
<<<<<<< SEARCH
plt.scatter(dates, pass_rates, c='blue', alpha=0.5)
=======
colors = ['red' if 'gpt-4' in model else 'blue' for model in models]
plt.scatter(dates, pass_rates, c=colors, alpha=0.5)
>>>>>>> REPLACE
```
This change will color the points red if the model name contains "gpt-4", otherwise, it will color them blue.
> Applied edit to benchmark/over_time.py
> Commit 0d11357 aider: Color the "gpt-4" models differently.
</div>
<div class="chat-transcript" markdown="1">
#### color the gpt-3.5 models a new color
To color the "gpt-3.5" models a new color, we need to modify the plotting code in `benchmark/over_time.py` to check for "gpt-3.5" in the model name and assign a different color.
Here are the changes:
benchmark/over_time.py
```python
<<<<<<< SEARCH
colors = ['red' if 'gpt-4' in model else 'blue' for model in models]
=======
colors = ['red' if 'gpt-4' in model else 'green' if 'gpt-3.5' in model else 'blue' for model in models]
>>>>>>> REPLACE
```
> Applied edit to benchmark/over_time.py
> Commit 85d9d84 aider: Color the "gpt-3.5" models green in the plotting code.
</div>
<div class="chat-transcript" markdown="1">
#### use the colorscheme and fonts from the other graphs in plots.py
To use the colorscheme and fonts from the graphs in `plots.py`, we need to:
1. Import the necessary settings from `plots.py`.
2. Apply these settings to the graph in `over_time.py`.
Here are the changes:
benchmark/over_time.py
```python
<<<<<<< SEARCH
import yaml
from datetime import datetime
=======
import yaml
from datetime import datetime
from matplotlib import rc
>>>>>>> REPLACE
```
benchmark/over_time.py
```python
<<<<<<< SEARCH
plt.figure(figsize=(10, 6))
=======
plt.rcParams["hatch.linewidth"] = 0.5
plt.rcParams["hatch.color"] = "#444444"
rc("font", **{"family": "sans-serif", "sans-serif": ["Helvetica"], "size": 10})
fig, ax = plt.subplots(figsize=(6, 4))
ax.grid(axis="y", zorder=0, lw=0.2)
>>>>>>> REPLACE
```
benchmark/over_time.py
```python
<<<<<<< SEARCH
plt.scatter(dates, pass_rates, c=colors, alpha=0.5)
=======
ax.scatter(dates, pass_rates, c=colors, alpha=0.5)
>>>>>>> REPLACE
```
benchmark/over_time.py
```python
<<<<<<< SEARCH
        plt.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75)

    plt.xlabel('Release Date')
    plt.ylabel('Pass Rate 2')
    plt.title('Model Performance Over Time')
    plt.grid(True)
    plt.tight_layout()
    plt.show()
=======
        ax.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75)

    ax.set_xlabel('Release Date')
    ax.set_ylabel('Pass Rate 2')
    ax.set_title('Model Performance Over Time')
    plt.tight_layout()
    plt.savefig("tmp_over_time.svg")
    imgcat(fig)
>>>>>>> REPLACE
```
> Applied edit to benchmark/over_time.py
> Commit 8d670f1 aider: Used the colorscheme and fonts from the graphs in `plots.py` in `over_time.py`.
</div>
<div class="chat-transcript" markdown="1">
#### /add tmp_over_time.png
> Added /Users/gauthier/Projects/aider/tmp_over_time.png to the chat
#### make the labels not overlap the dots
To make the labels not overlap the dots, we can adjust the position of the labels slightly using the `xytext` parameter in the `annotate` function.
Here are the changes:
benchmark/over_time.py
```python
<<<<<<< SEARCH
        ax.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75)
=======
        ax.annotate(model, (dates[i], pass_rates[i]), fontsize=8, alpha=0.75,
                    xytext=(5, 5), textcoords='offset points')
>>>>>>> REPLACE
```
> Committing benchmark/over_time.py before applying edits.
> Commit 49eb72c Updated image file format from SVG to PNG and added imgcat library for displaying the plot.
> Applied edit to benchmark/over_time.py
> Commit 75fa061 aider: Adjust the position of labels to prevent overlap with dots.
</div>
<div class="chat-transcript" markdown="1">
#### make the dots bigger
To make the dots bigger, we can adjust the `s` parameter in the `scatter` function, which controls the size of the dots.
Here are the changes:
benchmark/over_time.py
```python
<<<<<<< SEARCH
ax.scatter(dates, pass_rates, c=colors, alpha=0.5)
=======
ax.scatter(dates, pass_rates, c=colors, alpha=0.5, s=100)
>>>>>>> REPLACE
```
> Applied edit to benchmark/over_time.py
> Commit a018fdf aider: Make the dots bigger.
</div>
<div class="chat-transcript" markdown="1">
#### make the border of the graph thinner and lighter
To make the border of the graph thinner and lighter, we can adjust the `spines` properties of the axes.
Here are the changes:
benchmark/over_time.py
```python
<<<<<<< SEARCH
    colors = ['red' if 'gpt-4' in model else 'green' if 'gpt-3.5' in model else 'blue' for model in models]
=======
    for spine in ax.spines.values():
        spine.set_edgecolor('#DDDDDD')
        spine.set_linewidth(0.5)
    colors = ['red' if 'gpt-4' in model else 'green' if 'gpt-3.5' in model else 'blue' for model in models]
>>>>>>> REPLACE
```
> Applied edit to benchmark/over_time.py
> Commit 77ba518 aider: Made the border of the graph thinner and lighter by adjusting the `spines` properties of the axes.
</div>


@ -0,0 +1,14 @@
---
title: A draft post.
excerpt: With a draft summary.
highlight_image: /assets/linting.jpg
draft: true
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# A draft post
Content TBD.


@ -0,0 +1,149 @@
---
title: Linting code for LLMs with tree-sitter
excerpt: Aider now lints code after every LLM edit and automatically fixes errors, using tree-sitter and AST-aware code context.
highlight_image: /assets/linting.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
[![Linting code](/assets/linting.jpg)](https://aider.chat/assets/linting.jpg)
# Linting code for LLMs with tree-sitter
Aider now lints your code after every LLM edit, and offers to automatically fix
any linting errors.
You can also use aider's lint-and-fix functionality on your source files any time
you like, to speedily resolve issues with code written by humans.
Aider shows linting errors to the LLM in a novel format,
using tree-sitter
to help display relevant code context for each
error.
This increases the ability of the LLM to understand the problem and
make the correct changes to resolve it.
Aider ships with basic linters built with tree-sitter that support
[most popular programming languages](https://github.com/paul-gauthier/grep-ast/blob/main/grep_ast/parsers.py).
These built-in linters detect syntax errors and other fatal problems with the code.
You can also configure aider to use your preferred linters.
This allows aider to check for a larger class of problems, keep the code style
aligned with the rest of your team, etc.
## Linting and fixing your code
Aider now lints each source file after it applies the edits
suggested by an LLM.
If problems are found, aider will ask if you'd like it to
attempt to fix the errors.
If so, aider will send the LLM a report of the lint errors
and request changes to fix them. This process may iterate a few times
as the LLM works to fully resolve all the issues.
You can also lint and fix files any time, on demand from within the aider chat or via the
command line:
- The in-chat `/lint` command will lint and fix all the files which have
been added to the chat by default. Or you can name any files
in your git repo as arguments.
- From the command line, you can run `aider --lint` to lint and fix
all the dirty files in the repo.
Or you can name specific files on the command line.
## An LLM-friendly lint report
Most linting tools produce terse and cryptic output,
which is one reason many engineers appreciate IDEs that highlight
linting errors.
LLMs don't have the luxury of using an IDE, so aider sends
the linting errors in an LLM-friendly format.
Here's an example of raw output of the `flake8` python linter:
```
app.py:23:36: F821 undefined name 'num'
app.py:41:16: F541 f-string is missing placeholders
```
This sort of output depends on the user to reference line numbers to find and fix
each reported error.
LLMs are quite bad at working with source code line numbers, often
making off-by-one errors and other mistakes even when provided with
a fully numbered code listing.
Aider augments the raw linter output by
displaying and
highlighting the lines that have errors within their
containing functions, methods and classes.
To do this, aider uses tree-sitter to obtain the code's AST and analyzes it
in light of the linting errors.
LLMs are more effective at editing code that's provided
with context like this.
```
app.py:23:36: F821 undefined name 'num'
app.py:41:16: F541 f-string is missing placeholders
app.py:
...⋮...
  6│class LongNum:
  7│    def __init__(self, num):
  8│        """
  9│        Initialize the number.
 10│        """
...⋮...
 19│    def __str__(self):
 20│        """
 21│        Render the number as a string.
 22│        """
 23█        return str(num)
 24│
 25│
 26│@app.route('/subtract/<int:x>/<int:y>')
...⋮...
 38│@app.route('/divide/<int:x>/<int:y>')
 39│def divide(x, y):
 40│    if y == 0:
 41█        return f"Error: Cannot divide by zero"
 42│    else:
 43│        result = x / y
 44│        return str(result)
 45│
...⋮...
```
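As a rough illustration of the report format shown above, the sketch below parses `flake8`-style output and prints the flagged lines with a `█` marker. It is a deliberate simplification: it shows a fixed window of neighboring lines rather than the enclosing functions and classes that aider finds through the AST, and the names `parse_flake8` and `render_context` are hypothetical.

```python
import re


def parse_flake8(output):
    """Return {line_number: message} from `file:line:col: message` lines."""
    errors = {}
    for line in output.splitlines():
        match = re.match(r"^[^:]+:(\d+):\d+: (.*)$", line)
        if match:
            errors[int(match.group(1))] = match.group(2)
    return errors


def render_context(source, error_lines, padding=2):
    """List source lines near each error, marking the error lines with █."""
    lines = source.splitlines()
    show = set()
    for num in error_lines:
        first = max(1, num - padding)
        last = min(len(lines), num + padding)
        show.update(range(first, last + 1))
    out, previous = [], 0
    for num in sorted(show):
        if num != previous + 1:
            out.append("...⋮...")  # some lines were skipped before this one
        marker = "█" if num in error_lines else "│"
        out.append(f"{num:3d}{marker}{lines[num - 1]}")
        previous = num
    if previous < len(lines):
        out.append("...⋮...")  # the listing stops before the end of the file
    return "\n".join(out)
```

Because the code is shown next to each error, the LLM can locate the problem by its text rather than by counting line numbers.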
## Basic linters for most popular languages
Aider comes batteries-included with built-in linters for
[most popular programming languages](https://github.com/paul-gauthier/grep-ast/blob/main/grep_ast/parsers.py).
This provides wide support for linting without requiring
users to manually install a linter and configure it to work with aider.
Aider's built-in language-agnostic linter uses tree-sitter to parse
the AST of each file.
When tree-sitter encounters a syntax error or other fatal issue
parsing a source file, it inserts an AST node with type `ERROR`.
Aider simply uses these `ERROR` nodes to identify all the lines
with syntax errors or other types of fatal error, and displays
them in the LLM-friendly format described above.
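A minimal sketch of that idea is to walk the parse tree and collect the line of every `ERROR` node. The `Node` class here is a hypothetical stand-in that mirrors the `type`, `start_point` and `children` attributes of a tree-sitter node, so the example runs without a parser installed; it is not aider's actual linter code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for a tree-sitter node (hypothetical, for illustration)."""
    type: str
    start_point: tuple  # (row, column), both zero-based
    children: list = field(default_factory=list)


def error_lines(root):
    """Return sorted one-based line numbers of all ERROR nodes under `root`."""
    found = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node.type == "ERROR":
            found.add(node.start_point[0] + 1)  # rows are zero-based
        stack.extend(node.children)
    return sorted(found)
```

The resulting line numbers are all that is needed to drive a report that shows each bad line in context.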
## Configuring your preferred linters
You can optionally configure aider to use
your preferred linters with the `--lint-cmd` switch.
```
# To lint javascript with jslint
aider --lint-cmd javascript:jslint
# To lint python with flake8 using some specific args:
aider --lint-cmd "python:flake8 --select=E9,F821,F823..."
```
You can provide multiple `--lint-cmd` switches
to set linters for various languages.
You can also durably set linters in your `.aider.conf.yml` file.


@ -0,0 +1,454 @@
---
title: How aider scored SOTA 26.3% on SWE Bench Lite
excerpt: Aider achieved this result mainly through its existing features that focus on static code analysis, reliable LLM code editing, and pragmatic UX for AI pair programming.
highlight_image: /assets/swe_bench_lite.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# How aider scored SOTA 26.3% on SWE Bench Lite
[Aider scored 26.3%](https://github.com/swe-bench/experiments/pull/7)
on the
[SWE Bench Lite benchmark](https://www.swebench.com),
achieving a state-of-the-art result.
The previous top leaderboard entry was 20.3%
from Amazon Q Developer Agent.
See also [aider's SOTA result on the main SWE Bench](https://aider.chat/2024/06/02/main-swe-bench.html).
[![SWE Bench Lite results](/assets/swe_bench_lite.svg)](https://aider.chat/assets/swe_bench_lite.svg)
**All of aider's results reported here are pass@1 results,
obtained without using the SWE Bench `hints_text`.**
All results in the above chart are unhinted pass@1 results.
Please see the [references](#references)
for details on the data presented in this chart.
It was corrected on 5/30/24 to reflect apples-to-apples comparisons,
using pass@1 results from AutoCodeRover
and results from OpenDevin that don't use hints.
The [official SWE Bench Lite leaderboard](https://www.swebench.com)
only accepts pass@1 results that do not use hints.
## Interactive, not agentic
Aider achieved this result mainly through its existing features that focus on static code analysis, reliable LLM code editing, and pragmatic UX for AI pair programming.
Aider intentionally has quite limited and narrow "agentic behavior"
to avoid long delays, high token costs
and the need for users to repeatedly code review incorrect solutions.
It's also worth noting that aider currently does not use
RAG, vector search, tools or give the LLM access to search the web
or unilaterally execute code.
Aider is first and foremost an interactive tool for engineers to get real work done in
real code bases using a chat interface.
Aider provides a pair programming UX where users can ask for a change
and see the edits performed in real-time.
Aider can also offer additional help like fixing lint or test errors,
but the user is always in full interactive control.
This lets them quickly steer misunderstandings back on course and
avoid wasting time and token costs.
## Benchmark methodology
For the benchmark,
aider was launched in each problem's git repository
with the problem statement
submitted as the opening chat message from "the user."
After that aider runs as normal, with the following modifications:
- Aider's suggestions were always accepted without user approval.
- A simple harness was used to retry the SWE Bench problem if aider produced code that wasn't *plausibly correct*.
Plausibly correct means that aider reported that it had successfully edited the repo
without causing syntax errors or breaking any *pre-existing* tests.
- If the solution isn't plausible, the harness launches aider to try again from scratch,
alternating between using aider with GPT-4o and Opus.
- If no plausible solution is found after six tries, the harness picks the solution
with the fewest edit/lint/test problems.
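The retry logic described in this list can be sketched as follows. This is an illustration under stated assumptions, not the code from the real harness: `run_aider` is a hypothetical callable that runs one attempt with the given model, and `Attempt` is a hypothetical record of the three outcomes that define a plausible solution.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Attempt:
    model: str
    edit_ok: bool  # all edits applied cleanly
    lint_ok: bool  # no outstanding lint errors
    test_ok: bool  # pre-existing tests still pass

    @property
    def problems(self):
        return [self.edit_ok, self.lint_ok, self.test_ok].count(False)

    @property
    def plausible(self):
        return self.problems == 0


def solve(run_aider, models=("gpt-4o", "opus"), max_tries=6):
    """Alternate models until a plausible solution appears, else pick the best."""
    attempts = []
    for _, model in zip(range(max_tries), cycle(models)):
        attempt = run_aider(model)  # each attempt starts from scratch
        if attempt.plausible:
            return attempt
        attempts.append(attempt)
    # No plausible solution: fall back to the attempt with the fewest problems
    return min(attempts, key=lambda a: a.problems)
```

Note that nothing in this loop looks at the held-out acceptance tests; plausibility is judged only from the edit, lint and pre-existing test outcomes.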
It's important to be clear that
*aider and the benchmark harness
only had access to the pre-existing tests in each problem's repo*.
The held out "acceptance tests" were *only* used
after benchmarking to compute statistics on which problems aider
correctly resolved.
The [full harness to run aider on SWE Bench Lite is available on GitHub](https://github.com/paul-gauthier/aider-swe-bench).
The benchmarking process was similar to how a developer might use aider to
resolve a GitHub issue:
- They could launch aider in their repo with the command below, which
tells aider they want to accept every suggestion
and to use pytest to run tests.
- `aider --yes --test-cmd pytest`
- They could start the chat by pasting in the URL or text of a GitHub issue.
Aider will pull in the URL's content and then try to solve the issue.
- If aider doesn't produce code that lints and tests clean, the user might decide to revert the changes and try again, maybe using aider with a different LLM this time.
[Aider is tightly integrated with git](https://aider.chat/docs/git.html),
so it's always easy to revert AI changes that don't pan out.
Outside a benchmark setting, it's probably
unwise or at least highly inefficient
to let *any* AI agent run unsupervised on your code base.
The reason aider is intended to be used interactively
is so that the user can participate and direct aider's work and approve suggestions.
This way the user can offer immediate feedback or corrections if their initial
instructions turn out to be ambiguous,
or if the AI starts going down a wrong path.
## Aider with GPT-4o alone was SOTA
Running the benchmark harness
only using aider with GPT-4o to find plausible solutions
achieved a score of 25.0%.
This result was itself state-of-the-art, before being surpassed by the main
result being reported here
that used aider with both GPT-4o & Opus.
As noted below, a single attempt using Aider with GPT-4o tied
the current top entry on the leaderboard.
## Aider with GPT-4o & Opus
The benchmark harness alternated between running aider with GPT-4o and Opus.
The harness proceeded in a fixed order, always starting with GPT-4o and
then alternating with Opus until a plausible solution was found for each
problem.
The table below breaks down the plausible solutions that
were found for the 300 problems.
It also provides details on the 79 that were ultimately
verified as correctly resolving their issue.
Some noteworthy observations:
- *Just the first attempt* of Aider with GPT-4o resolved 20.3% of the problems, which ties the Amazon Q Developer Agent currently atop the official leaderboard.
- Including the second attempt, Aider with GPT-4o and Opus scored 23.6% on the benchmark.
These first two attempts obtained ~86% of all plausible and ~90% of all resolved solutions.
- A long tail of solutions continued to be found using both models including one correctly resolved solution on the final, sixth attempt of that problem.
| Attempt | Agent |Number&nbsp;of<br>plausible<br>solutions|Percent&nbsp;of<br>plausible<br>solutions| Number&nbsp;of<br/>correctly<br>resolved<br>solutions | Percent&nbsp;of<br>correctly<br>resolved<br>solutions | Score&nbsp;on<br>SWE&nbsp;Bench<br>Lite |
|:--------:|------------|---------:|---------:|----:|---:|--:|
| 1 | Aider with GPT-4o | 208 | 69.3% | 61 | 77.2% | 20.3% |
| 2 | Aider with Opus | 49 | 16.3% | 10 | 12.7% | 3.3% |
| 3 | Aider with GPT-4o | 20 | 6.7% | 3 | 3.8% | 1.0% |
| 4 | Aider with Opus | 9 | 3.0% | 2 | 2.5% | 0.7% |
| 5 | Aider with GPT-4o | 11 | 3.7% | 2 | 2.5% | 0.7% |
| 6 | Aider with Opus | 3 | 1.0% | 1 | 1.3% | 0.3% |
| **Total** | | **300** | **100%** | **79** | **100%** | **26.3%** |
If we break down the solutions solely by model,
we can see that aider with GPT-4o outperforms Opus.
This isn't a fair and direct comparison, because GPT-4o always took the first
turn and therefore got first crack at all the "easiest" problems.
Aider with Opus only ever saw problems that GPT-4o failed to
find plausible solutions for on its first try.
Aider with GPT-4o was producing higher quality plausible solutions,
with a greater chance of going on to be accepted as resolving the issue.
Again, this is biased by the turn ordering.
But other anecdotal evidence from earlier runs of the benchmark
also supports the observation that aider with GPT-4o is significantly stronger than Opus
for this benchmark.
| Agent | Number&nbsp;of<br>plausible<br>solutions | Number&nbsp;of<br>correctly<br>resolved<br>solutions | Percent&nbsp;of<br>plausible<br>which<br>correctly<br>resolved |
|------------|---------:|---------:|---:|
| Aider with GPT-4o | 239 | 66 |27.6% |
| Aider with Opus | 61 | 13 |21.3% |
| **Total** | **300** | **79** |**26.3%** |
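The per-model figures follow directly from the per-attempt table. The short script below, written only as a worked check of the arithmetic, regroups the attempt rows by model and recomputes the percentages.

```python
# (attempt, model, plausible, resolved) copied from the per-attempt table
attempts = [
    (1, "GPT-4o", 208, 61),
    (2, "Opus", 49, 10),
    (3, "GPT-4o", 20, 3),
    (4, "Opus", 9, 2),
    (5, "GPT-4o", 11, 2),
    (6, "Opus", 3, 1),
]

# Sum plausible and resolved counts for each model
totals = {}
for _, model, plausible, resolved in attempts:
    p, r = totals.get(model, (0, 0))
    totals[model] = (p + plausible, r + resolved)

for model, (plausible, resolved) in totals.items():
    print(f"{model}: {resolved}/{plausible} = {100 * resolved / plausible:.1f}%")

# Overall score is resolved problems out of all 300 benchmark problems
overall = 100 * sum(r for _, _, _, r in attempts) / 300
print(f"Overall score: {overall:.1f}%")
```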
## Repository map, not RAG
The crucial first step in solving a SWE Bench problem is figuring out
which parts of the repo are relevant and which files need to be edited.
Most coding agents use some combination of RAG, vector search
and providing the LLM with
tools to interactively explore the code base.
Aider instead uses a
[repository map](https://aider.chat/2023/10/22/repomap.html)
to help the LLM understand the
layout, code structure, and content of a git repo.
The repo map is created through static analysis of the code's
abstract syntax tree and call graph
to provide a compact and powerful summary of the entire code base.
The map is constantly
tailored to show
repo context that is relevant to the current state of the chat conversation.
This is done by performing a graph optimization on the code's call graph.
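The text above does not say which graph optimization is used, so the sketch below assumes a personalized PageRank-style ranking purely for illustration. The `edges` input (which file references which) and the function name `rank_files` are hypothetical; the point is only to show how files already in the chat could bias which other files rank as most relevant.

```python
def rank_files(edges, chat_files, damping=0.85, iterations=50):
    """Rank files with a personalized PageRank over a reference graph.

    `edges` maps each file to the files it references (hypothetical input).
    """
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    # Random jumps land on the chat files, biasing the ranking toward them
    seeds = [n for n in sorted(nodes) if n in chat_files] or sorted(nodes)
    jump = {n: (1 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(jump)
    for _ in range(iterations):
        new = {n: (1 - damping) * jump[n] for n in nodes}
        for node in nodes:
            # Files that reference nothing pass their weight back to the seeds
            targets = edges.get(node) or seeds
            share = damping * rank[node] / len(targets)
            for target in targets:
                new[target] += share
        rank = new
    return sorted(nodes, key=lambda n: (-rank[n], n))
```

In this toy model, a file that is referenced by the files under discussion rises in the ranking, while a file that nothing relevant points to sinks to the bottom.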
When the user asks for a change to their code, the LLM can use the repo map
to decide which files to edit.
The LLM simply returns a normal text response explaining which files
it needs to edit and why.
Aider notices when the LLM mentions filenames from the repo,
and asks the user if they should be added to the chat.
Adding a file to the chat allows the LLM to see the full contents
of the file and edit it.
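Detecting filename mentions can be approximated with simple text matching. The sketch below is a guess at the general approach, not aider's real logic: `mentioned_files` is a hypothetical helper that only matches full repo-relative paths appearing as separate words in the reply.

```python
import re


def mentioned_files(reply, repo_files, in_chat=()):
    """Return repo files named in `reply` that are not already in the chat."""
    # Split on whitespace and common punctuation, keeping dots and slashes
    words = set(re.split(r"[\s,;:!?\"'`()]+", reply))
    # Drop a trailing period so "app.py." at the end of a sentence matches
    words = {w.rstrip(".") for w in words}
    return sorted(f for f in repo_files if f in words and f not in in_chat)
```

Any files returned this way would then be offered to the user for approval before being added to the chat.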
<div class="chat-transcript" markdown="1">
#### Please add a new /factorial/N endpoint.
To add a new /factorial/N endpoint, the most likely file that needs to be edited is app.py.
Please add app.py to the chat so I can proceed with the changes.
> app.py
> Add these files to the chat? yes
</div>
This is a convenient and natural workflow for interactive chat,
and it worked well for the SWE Bench problems.
Aider successfully identified the correct file to edit
in 70.3% of the benchmark tasks.
We can determine which file needs to be edited using the "gold" patch
which is associated with each SWE Bench task.
This patch was created by a human developer
to solve the issue, and therefore reveals a file which can
be edited to solve the problem.
Of course aider is not able to see or use the gold patch
or the file names it contains in any way.
This information was only used to compute
statistics outside the benchmarking process.
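One way such a statistic could be computed after the fact is to read the file names out of the gold patch and compare them with the files aider edited. The helpers below are a sketch under that assumption, with hypothetical names, and they assume the patch is a standard git-style unified diff.

```python
def files_in_patch(patch):
    """Return the set of file paths modified by a git-style unified diff."""
    files = set()
    for line in patch.splitlines():
        if line.startswith("+++ b/"):
            files.add(line[len("+++ b/"):].strip())
    return files


def found_gold_file(edited_files, gold_patch):
    """True if at least one edited file also appears in the gold patch."""
    return bool(set(edited_files) & files_in_patch(gold_patch))
```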
## Reliable code editing
Once files have been selected for editing,
the next step is of course to edit the source code to fix the problem.
Aider goes to great lengths to ensure that LLMs can not just write code,
but reliably *edit* code.
Aider has a collection of prompting strategies and code editing backends which have
been honed through
[extensive benchmarking](https://aider.chat/docs/leaderboards/).
These foundational capabilities help ensure that aider can
properly integrate code from LLMs into an existing code base and source files.
The repository map helps here too, making sure that the LLM
can see relevant classes, functions and variables from the entire repo.
This helps ensure that the project's existing APIs and conventions are
respected and utilized when new code is added.
Regardless, there are still cases where aider may be unable to cleanly
complete the edits specified by the LLM.
This is usually because the LLM has failed to conform to the editing
instructions in its system prompt.
When aider completes, it returns an editing outcome that indicates
whether it was able to successfully apply all edits.
The benchmark harness uses this editing status as
one criterion to determine if aider has
created a plausible solution.
## Linting and fixing
Another key criterion for a plausible solution is that it passes basic
linting, which means that the code has no syntax
or other fatal errors.
[Aider lints code](https://aider.chat/2024/05/22/linting.html)
after every LLM edit and offers to automatically fix
any problems.
Aider ships with built-in linters based on tree-sitter
which work with most popular programming languages.
Aider shows linting errors to the LLM in a novel format,
using the abstract syntax tree to display relevant code context for each
error.
This context helps LLMs understand the problem and
make the correct changes to resolve it.
<div class="chat-transcript" markdown="1">
```
app.py:23:36: F821 undefined name 'num'
app.py:
...⋮...
6│class LongNum:
...⋮...
19│ def expound(self, threshold):
20│ number = self.basis
21│ while number < threshold:
22│ number *= self.factor
23█ return num
24│
25│
...⋮...
```
> Attempt to fix lint errors? yes
</div>
In the benchmark, these linting suggestions are always accepted.
At completion,
aider reports a linting outcome that
indicates if it was able to produce
code without any outstanding linting errors.
The benchmark harness uses this status as
one of the criteria to determine if aider has
created a plausible solution.
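As a rough illustration of what "no syntax or other fatal errors" means for Python code, a minimal check can be built on the built-in `compile()` function. This is a sketch, not aider's linter: it only catches syntax errors, so it would not flag the undefined name shown in the transcript above, which needs a tool such as flake8.

```python
def find_syntax_error(source, filename="<string>"):
    """Return a 'file:line:col: message' string, or None if the code compiles."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return f"{filename}:{err.lineno}:{err.offset}: {err.msg}"
    return None
```

A harness could treat a `None` result as "lint clean" for this narrow check and anything else as an outstanding linting error.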
## Testing and fixing
The final criterion for a plausible solution is that
all tests must be passing.
Aider can be configured with the command to run tests for a repo,
and will automatically attempt to fix any test failures.
A user working on a Python project might configure testing
by launching aider like this:
```
aider --test-cmd pytest
```
For the benchmark, aider is configured with a test command that will run the
tests that already exist in each problem's repository.
SWE Bench problems are based on repositories from large open
source projects with extensive existing test suites.
This means that
testing will fail if aider has broken any of these
pre-existing tests or if any new
tests that it created aren't passing.
As with editing and linting, aider reports a testing outcome
that indicates if it completed with any outstanding failing tests.
The benchmark harness uses this status when deciding if aider
has produced a plausible solution.
To be clear, *aider cannot run or even see the held out "acceptance tests"* that
are used to judge if a proposed solution correctly
resolves the problem.
Those tests are only run outside of aider and the benchmark harness,
to compute the final benchmark statistics.
## Finding a plausible solution
Each time aider executes, it reports
the outcome of the editing, linting, and testing
steps.
Each of these steps may complete successfully or
return a status that indicates that there were outstanding
problems that remain unresolved.
The benchmark harness uses these outcomes to determine if
aider has produced a plausible
solution to the current SWE Bench task.
A plausible solution is one where aider
returns saying that it
edited the repo with no outstanding
edit, lint, or test errors.
In this case, aider's changes are recorded
as the SWE Bench `model_patch` to be evaluated later with the
acceptance tests.
If the solution is not plausible, another
instance of aider is launched again from scratch on the same problem.
The harness alternates launching aider with GPT-4o and Opus to solve the problem,
and gives each model three attempts -- for a total of six attempts.
As soon as a plausible solution is found, it is accepted and the
harness moves on to the next SWE Bench instance.
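The retry logic described above can be sketched as a short loop. This is an illustration under stated assumptions, not the harness code: `run_aider` is a hypothetical callable that launches aider from scratch with the given model and returns an `Outcome`, and the model names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    model: str
    edit_ok: bool
    lint_ok: bool
    test_ok: bool

    @property
    def plausible(self):
        # Plausible means no outstanding edit, lint, or test errors.
        return self.edit_ok and self.lint_ok and self.test_ok


def find_solution(run_aider, models=("gpt-4o", "opus"), tries_per_model=3):
    """Alternate models until a plausible outcome appears.

    Returns (plausible_outcome_or_None, all_attempts).
    """
    attempts = []
    for _ in range(tries_per_model):
        for model in models:
            outcome = run_aider(model)  # hypothetical: fresh aider run
            attempts.append(outcome)
            if outcome.plausible:
                return outcome, attempts
    return None, attempts
```

With two models and three tries each, the loop makes at most six attempts and stops early on the first plausible one.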
It's worth noting that repositories may have lint or test errors
present before aider even starts to edit them.
Whether unresolved errors were caused by aider or were pre-existing,
there will be instances where
no plausible solution is
found after six tries.
If all six attempts fail to produce a plausible solution,
then the "best" solution available is selected as the
`model_patch`.
Which of the non-plausible solutions to use is determined
by ignoring the testing outcome
and prioritizing solutions in the following order:
- Pick a solution where editing and linting were completed successfully.
- Pick a solution where editing was at least partially successful and linting succeeded.
- Pick a solution where editing was successful.
- Pick a solution where editing was at least partially successful.
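The priority order above can be expressed as a list of predicates that are tried in turn. The sketch below assumes each attempt is a dict with hypothetical keys `"edit"` (`"ok"`, `"partial"` or `"failed"`) and `"lint_ok"`. Letting earlier attempts win ties is also an assumption, since the text does not say how ties are broken.

```python
def pick_best(attempts):
    """Pick a fallback from non-plausible attempts, ignoring test outcomes."""
    priorities = [
        # 1. Editing and linting both completed successfully.
        lambda a: a["edit"] == "ok" and a["lint_ok"],
        # 2. Editing at least partially successful and linting succeeded.
        lambda a: a["edit"] in ("ok", "partial") and a["lint_ok"],
        # 3. Editing was successful.
        lambda a: a["edit"] == "ok",
        # 4. Editing was at least partially successful.
        lambda a: a["edit"] in ("ok", "partial"),
    ]
    for matches in priorities:
        for attempt in attempts:  # assumption: earlier attempts win ties
            if matches(attempt):
                return attempt
    return None
```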
## Computing the benchmark score
The benchmark harness produced a plausible solution for each of the 300
SWE Bench Lite instances and saved it as the `model_patch`.
A separate evaluation script was used to
test each of these solutions with the full test suite,
including the held out acceptance tests.
For this final acceptance testing, any edits that aider made to tests
are discarded.
This ensures that the correct,
unmodified test suite is used for acceptance testing.
The evaluation script compares the test results
with results from testing
the "gold" patch that was developed by a human to correctly solve the issue.
If they match, the candidate solution has correctly resolved the issue.
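In code, the comparison amounts to checking that every test behaves the same way for the candidate as it did for the gold patch. The sketch below assumes test results are available as a mapping from test name to `"pass"` or `"fail"`. That data format is hypothetical, and the real SWE Bench evaluation scripts differ in detail.

```python
def resolves_issue(candidate_results, gold_results):
    """True if every test in the gold run has the same outcome for the candidate."""
    return all(
        candidate_results.get(test) == outcome
        for test, outcome in gold_results.items()
    )
```

Note that a test which fails for the gold patch is expected to fail for the candidate too, so a failing test does not by itself disqualify a solution.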
These acceptance tests are only ever run outside of aider
and the benchmark harness, and only to compute the number of
correctly resolved instances.
They are never run, used, or even visible during aider's attempts to solve the problems.
Aider correctly resolved 79 out of 300 SWE Bench Lite instances, or 26.3%.
## Acknowledgments
Much thanks to the team behind the
[SWE Bench](https://www.swebench.com)
family of AI coding benchmarks.
Also thanks to Albert Örwall who has
[dockerized the SWE Bench evaluation scripts](https://github.com/aorwall/SWE-bench-docker)
making it faster, easier, and more reliable to run the acceptance tests.
## References
All of aider's results reported here are pass@1 results,
obtained without using the SWE Bench `hints_text`.
The "aider agent" internally makes multiple "attempts" at solving the problem,
but it picks and returns one single candidate solution.
Only that one candidate solution is evaluated with the acceptance tests
and contributes to the benchmark score.
Thus it is a pass@1 result.
This is in contrast to a pass@N result for N>1, where N attempts are made
and all N solutions are evaluated by the acceptance tests.
If *any* of the N solutions pass, that counts as a pass@N success.
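The difference between the two metrics is easy to see in code. The sketch below is illustrative only; the input shapes are assumptions chosen for clarity.

```python
def pass_at_1(selected_ok):
    """selected_ok: one bool per problem, for the single submitted solution."""
    return sum(selected_ok) / len(selected_ok)


def pass_at_n(all_ok):
    """all_ok: one list of bools per problem, one bool per evaluated attempt."""
    return sum(any(attempts) for attempts in all_ok) / len(all_ok)
```

Under pass@1 only the one submitted solution per problem is judged, while pass@N credits a problem if any of its N evaluated solutions passes, so the two numbers are not directly comparable.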
Below are the references for the other pass@1 unhinted SWE-Bench results
displayed in the graph at the beginning of this article.
- [20.3% Amazon Q Developer Agent (v20240430-dev)](https://www.swebench.com)
- [19.0% AutoCodeRover](https://www.swebench.com/)
- [18.0% SWE-Agent + GPT-4](https://www.swebench.com)
- [16.7% OpenDevin](https://github.com/OpenDevin/OpenDevin/issues/2149)
- [11.7% SWE-Agent + Opus](https://www.swebench.com)
Note, the graph was corrected on 5/30/24 as follows.
The graph now contains AutoCodeRover's average pass@1 results.
Previously it displayed pass@3 results, which are
not comparable
to the pass@1 results for aider being reported here.
The [AutoCodeRover GitHub page](https://github.com/nus-apr/auto-code-rover)
features pass@3 results
without being clearly labeled.
The graph now contains the best OpenDevin results obtained without using
the SWE Bench `hints_text` to provide hints to the agent.
The previous graph contained their hinted result,
which is not comparable
to the unhinted aider results being reported here.
[OpenDevin reported hinted results](https://x.com/gneubig/status/1791498953709752405)
without noting that hints were used.


@ -0,0 +1,70 @@
---
title: Aider has written 7% of its own code
excerpt: Aider has written 7% of its own code, via 600+ commits that inserted 4.8K and deleted 1.5K lines of code.
highlight_image: /assets/self-assembly.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Aider has written 7% of its own code
[![self assembly](/assets/self-assembly.jpg)](https://aider.chat/assets/self-assembly.jpg)
The
[aider git repo](https://github.com/paul-gauthier/aider)
currently contains about 4K commits and 14K lines of code.
Aider made 15% of the commits, inserting 4.8K and deleting 1.5K lines of code.
About 7% of the code now in the repo is attributable to an aider commit
using `git blame`.
This number is probably a significant undercount, because periodic reformatting
by `black` is likely obscuring aider's authorship of many lines.
Here's the breakdown of the code aider wrote in the current code base
according to `git blame`.
| File | Lines | Percent |
|---|---:|---:|
|aider/args.py| 6 of 449 | 1.3% |
|aider/coders/base_coder.py| 37 of 1354 | 2.7% |
|aider/coders/editblock_coder.py| 14 of 507 | 2.8% |
|aider/coders/editblock_func_coder.py| 6 of 141 | 4.3% |
|aider/coders/udiff_coder.py| 2 of 421 | 0.5% |
|aider/coders/wholefile_coder.py| 5 of 146 | 3.4% |
|aider/coders/wholefile_func_coder.py| 4 of 134 | 3.0% |
|aider/commands.py| 67 of 703 | 9.5% |
|aider/diffs.py| 15 of 129 | 11.6% |
|aider/gui.py| 2 of 533 | 0.4% |
|aider/history.py| 19 of 124 | 15.3% |
|aider/io.py| 55 of 368 | 14.9% |
|aider/linter.py| 30 of 240 | 12.5% |
|aider/main.py| 30 of 466 | 6.4% |
|aider/mdstream.py| 3 of 122 | 2.5% |
|aider/models.py| 22 of 549 | 4.0% |
|aider/repo.py| 19 of 266 | 7.1% |
|aider/repomap.py| 17 of 518 | 3.3% |
|aider/scrape.py| 12 of 199 | 6.0% |
|aider/versioncheck.py| 10 of 37 | 27.0% |
|aider/voice.py| 9 of 104 | 8.7% |
|benchmark/benchmark.py| 33 of 730 | 4.5% |
|benchmark/over_time.py| 32 of 60 | 53.3% |
|benchmark/swe_bench_lite.py| 40 of 71 | 56.3% |
|scripts/blame.py| 55 of 212 | 25.9% |
|scripts/versionbump.py| 96 of 123 | 78.0% |
|setup.py| 11 of 47 | 23.4% |
|tests/test_coder.py| 48 of 612 | 7.8% |
|tests/test_commands.py| 135 of 588 | 23.0% |
|tests/test_editblock.py| 23 of 403 | 5.7% |
|tests/test_io.py| 30 of 65 | 46.2% |
|tests/test_main.py| 13 of 239 | 5.4% |
|tests/test_models.py| 6 of 28 | 21.4% |
|tests/test_repo.py| 2 of 296 | 0.7% |
|tests/test_repomap.py| 70 of 217 | 32.3% |
|tests/test_udiff.py| 7 of 119 | 5.9% |
|tests/test_wholefile.py| 37 of 321 | 11.5% |
| **Total** | **1022 of 14219** | 7.2% |
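
Per-file counts like these could be gathered by parsing the output of `git blame --line-porcelain`, which repeats an `author` header for every line of the file. The sketch below is not the script used for this article. In particular, recognizing aider's lines by an `(aider)` marker in the author name is an assumption made for illustration.

```python
from collections import Counter


def lines_per_author(porcelain_text):
    """Count lines per author in `git blame --line-porcelain` output."""
    counts = Counter()
    for line in porcelain_text.splitlines():
        # Header lines look like "author <name>"; file content starts with a tab.
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


def aider_share(counts, marker="(aider)"):
    """Return (aider_lines, total_lines, percent) for authors containing marker."""
    total = sum(counts.values())
    aider = sum(n for name, n in counts.items() if marker in name)
    return aider, total, round(100 * aider / total, 1)
```

Applied to the totals in the table, 1022 of 14219 lines works out to 7.2%.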


@ -0,0 +1,267 @@
---
title: Aider is SOTA for both SWE Bench and SWE Bench Lite
excerpt: Aider sets SOTA for the main SWE Bench, after recently setting SOTA for the Lite version.
highlight_image: /assets/swe_bench.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Aider is SOTA for both SWE Bench and SWE Bench Lite
Aider scored 18.9%
on the main
[SWE Bench benchmark](https://www.swebench.com),
achieving a state-of-the-art result.
The current top leaderboard entry is 13.8%
from Amazon Q Developer Agent.
The best result reported elsewhere seems to be
[13.9% from Devin](https://www.cognition.ai/post/swe-bench-technical-report).
This result on the main SWE Bench builds on
[aider's recent SOTA result on the easier SWE Bench Lite](https://aider.chat/2024/05/22/swe-bench-lite.html).
[![SWE Bench results](/assets/swe_bench.svg)](https://aider.chat/assets/swe_bench.svg)
**All of aider's results reported here are pass@1 results,
obtained without using the SWE Bench `hints_text`.**
Aider was benchmarked on the same
[570 randomly selected SWE Bench problems](https://github.com/CognitionAI/devin-swebench-results/tree/main/output_diffs)
that were used in the
[Devin evaluation](https://www.cognition.ai/post/swe-bench-technical-report).
See the [references](#references)
for more details on the data presented in this chart.
## Interactive, not agentic
Aider achieved this result mainly through its existing features that focus on static
code analysis, reliable LLM code editing, and pragmatic UX for automatically
fixing linting and testing errors.
Aider intentionally has quite limited and narrow "agentic behavior"
to avoid long delays, high token costs
and the need for users to repeatedly code review incorrect solutions.
It's also worth noting that aider currently does not use
RAG, vector search, tools or give the LLM access to search the web
or unilaterally execute code.
Aider is first and foremost an interactive tool for engineers to get real work done in
real code bases using a chat interface.
Aider provides a pair programming UX where users can ask for a change
and see code edits performed in real-time.
Aider can also offer additional help like fixing lint or test errors,
but the user is always in full interactive control.
This allows them to quickly steer misunderstandings back on course and
avoid wasting time and token costs.
## Benchmark methodology
Benchmarking was conducted as follows:
- Aider with GPT-4o was launched in each problem's git repository
with the problem statement
submitted as the opening chat message from "the user".
- After that aider ran as normal, except all of aider's
suggestions were always accepted without user approval.
- A [simple harness](https://github.com/paul-gauthier/aider-swe-bench#the-aider-agent) was used to retry the SWE Bench problem if aider produced code that wasn't *plausibly correct*.
Plausibly correct means that aider reported that it had successfully edited the repo
without causing syntax errors or breaking any *pre-existing* tests.
- If the solution from aider with GPT-4o wasn't plausible, the harness launched aider to try again from scratch using Claude 3 Opus.
- If no plausible solution was found after those two tries, the harness picked the "most plausible" solution with the fewest edit/lint/test problems.
It's important to be clear that
*aider and the benchmark harness
only had access to the pre-existing tests in each problem's repo*.
The held out "acceptance tests" were *only* used
after benchmarking to compute statistics on which problems aider
correctly resolved.
This is the same approach
that was used for
[aider's recent SOTA result on SWE Bench Lite](https://aider.chat/2024/05/22/swe-bench-lite.html).
For the Lite benchmark,
aider alternated between GPT-4o and Opus for up to six total attempts.
To manage the cost of running the main SWE Bench benchmark,
aider was limited to two total attempts:
one with GPT-4o and one with Opus.
For a detailed discussion of the benchmark
methodology, see the
[article about aider's SWE Bench Lite results](https://aider.chat/2024/05/22/swe-bench-lite.html).
Also, the
[aider SWE Bench repository on GitHub](https://github.com/paul-gauthier/aider-swe-bench)
contains the harness and statistics code used for the benchmarks.
The benchmarking process was similar to how a developer might use aider to
resolve a GitHub issue:
- They could launch aider in their repo with the command below, which
tells aider they want to accept every suggestion
and to use pytest to run tests.
- `aider --yes --test-cmd pytest`
- They could start the chat by pasting in the URL or text of a GitHub issue.
Aider will pull in the URL's content and then try and resolve the issue.
- If aider doesn't produce code that lints and tests clean, the user might decide to
[use git to revert the changes](https://aider.chat/docs/git.html),
and try again with `aider --opus`.
## Aider with GPT-4o alone was SOTA
Using aider with GPT-4o to make a single attempt at resolving each problem
achieved a score of 17.0%.
This was itself a state-of-the-art result, before being surpassed by the main
result being reported here
that used aider with both GPT-4o & Opus.
## Aider with GPT-4o & Opus
The benchmark harness started by using aider with GPT-4o to try
and resolve each problem.
For problems where this didn't produce a plausible solution,
the harness tried again using aider with Opus.
So at most, two attempts were made for each problem.
The table below breaks down the proposed solutions that
were found from each attempt at the 570 problems.
A proposed solution is either:
- A plausible solution where
aider reported no outstanding errors from editing, linting and testing.
- Or, the "most plausible" solution generated by either attempt, with the
[fewest outstanding editing, linting or testing errors](https://aider.chat/2024/05/22/swe-bench-lite.html#finding-a-plausible-solution).
The table also provides details on the 108 solutions that were ultimately
verified as correctly resolving their issue.
| Attempt | Agent |Number&nbsp;of<br>proposed<br>solutions|Percent&nbsp;of<br>proposed<br>solutions| Number&nbsp;of<br>correctly<br>resolved<br>solutions | Percent&nbsp;of<br>correctly<br>resolved<br>solutions | Score&nbsp;on<br>SWE&nbsp;Bench |
|:--------:|------------|---------:|---------:|----:|---:|--:|
| 1 | Aider with GPT-4o | 419 | 73.5% | 87 | 80.6% | 15.3% |
| 2 | Aider with Opus | 151 | 26.5% | 21 | 19.4% | 3.7% |
| **Total** | | **570** | **100%** | **108** | **100%** | **18.9%** |
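
The percentages in the table follow directly from the raw counts. The short calculation below reproduces them; the variable names are just labels for the table columns.

```python
proposed = {"gpt-4o": 419, "opus": 151}
resolved = {"gpt-4o": 87, "opus": 21}
total_problems = sum(proposed.values())  # 570
total_resolved = sum(resolved.values())  # 108


def pct(part, whole):
    return round(100 * part / whole, 1)


rows = {
    agent: (
        pct(proposed[agent], total_problems),  # percent of proposed solutions
        pct(resolved[agent], total_resolved),  # percent of correctly resolved solutions
        pct(resolved[agent], total_problems),  # contribution to the benchmark score
    )
    for agent in proposed
}
overall_score = pct(total_resolved, total_problems)  # 108 / 570
```

Each agent's contribution to the score is its resolved count divided by all 570 problems, which is why the two contributions sum to the overall 18.9%.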
## Non-plausible but correct solutions?
A solution doesn't actually have to be plausible in order to correctly resolve the issue.
Recall that plausible is simply defined as aider
reporting that it successfully completed all file edits,
repaired and resolved any linting errors
and resolved any test failures.
But there are many reasons why aider might fail to do those things
and yet still produce a solution that will pass
acceptance testing:
- There may have been pre-existing failing tests in the repo,
before aider even started working on the SWE Bench problem.
Aider may not have resolved such issues, and yet they may not be
relevant to the acceptance testing.
The SWE Bench acceptance testing just confirms that tests pass or fail
in the same pattern as the "gold patch" developed by a human to resolve the
problem.
Some tests may fail during acceptance testing,
and that's ok as long as they failed for the gold
patch too.
- There may have been pre-existing linting problems in the repo.
If lingering linting issues affected code paths that are not well tested,
they may not impact acceptance testing.
- Aider may have reported file editing errors because it thought the LLM
specified edits that it wasn't able to successfully apply.
This can only happen when the LLM specified edits in
a way that doesn't comply with the editing instructions in the system prompt.
Given that the LLM isn't complying with the system prompt,
it may have become confused and
asked for redundant or otherwise irrelevant edits.
Such outstanding edit errors might not be fatal for acceptance testing.
- Etc.
Keeping all this in mind, we can understand why
GPT-4o accounts for 15.3% of the benchmark score in the table above,
but benchmarking with just one attempt of aider with GPT-4o scored 17.0%.
When an Opus attempt is allowed after GPT-4o,
it may propose some *incorrect* solutions which
are "more plausible" than some of GPT-4o's non-plausible solutions.
These more plausible, incorrect solutions can
eclipse some of
the earlier non-plausible correct solutions that GPT-4o generated.
This is why GPT-4o's score in the table
showing the combined GPT-4o & Opus results (15.3%)
is lower than the result from just one try using aider with GPT-4o (17.0%).
For these reasons, adding additional attempts is not guaranteed to monotonically
increase the number of resolved problems.
New solutions may resolve some new problems but they may also
eclipse and discard some of the previous non-plausible correct solutions.
Luckily, the net effect of additional attempts
usually increases or at least maintains the
number of resolved problems.
This was the case for all the attempts made in both this main SWE Bench result and the
earlier Lite result.
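A toy example makes the eclipsing effect concrete. It assumes the harness keeps the attempt with the fewest outstanding edit/lint/test problems. The dict keys and values are invented for illustration.

```python
def select(attempts):
    """Pick the attempt with the fewest outstanding edit/lint/test problems."""
    return min(attempts, key=lambda a: a["problems"])


# Non-plausible (one outstanding problem) but actually correct.
gpt4o = {"model": "gpt-4o", "problems": 1, "correct": True}
# Plausible (no outstanding problems) but actually incorrect.
opus = {"model": "opus", "problems": 0, "correct": False}

one_attempt = select([gpt4o])
two_attempts = select([gpt4o, opus])
```

With only the GPT-4o attempt, the correct solution is submitted. Once the Opus attempt is added, the harness prefers it because it looks more plausible, and the correct solution is discarded.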
## Computing the benchmark score
The benchmark harness produced one proposed solution for each of
the 570 SWE Bench problems.
A separate evaluation script was used to
test each of these solutions with the full test suite,
including the held out acceptance tests.
For this final acceptance testing, any edits that aider made to tests
were discarded.
This ensured that the correct,
unmodified test suite was used for acceptance testing.
The evaluation script compared each proposed solution's test results
with results from testing
the "gold" patch that was developed by a human to correctly resolve the issue.
If they matched, the proposed solution correctly resolved the issue.
These acceptance tests were only ever run outside of aider
and the benchmark harness, and only to compute statistics about the
correctly resolved instances.
They were never run, used, or even visible during aider's attempts to resolve the problems.
Aider correctly resolved 108 out of 570 SWE Bench instances that were benchmarked,
or 18.9%.
## Acknowledgments
Much thanks to the team behind the
[SWE Bench](https://www.swebench.com)
family of AI coding benchmarks.
Also thanks to Albert Örwall who has
[dockerized the SWE Bench evaluation scripts](https://github.com/aorwall/SWE-bench-docker)
making it faster, easier, and more reliable to run the acceptance tests.
## References
All of aider's results reported here are pass@1 results,
obtained without using the SWE Bench `hints_text`.
The "aider agent" internally makes multiple "attempts" at solving the problem,
but it picks and returns one single candidate solution.
Only that one candidate solution is evaluated with the acceptance tests
and contributes to the benchmark score.
Thus it is a pass@1 result.
This is in contrast to a pass@N result for N>1, where N attempts are made
and all N solutions are evaluated by the acceptance tests.
If *any* of the N solutions pass, that counts as a pass@N success.
Below are the references for the other pass@1 unhinted SWE-Bench results
displayed in the graph at the beginning of this article.
- [13.9% Devin, benchmarked on 570 instances.](https://www.cognition.ai/post/swe-bench-technical-report)
- [13.8% Amazon Q Developer Agent, benchmarked on 2,294 instances.](https://www.swebench.com)
- [12.5% SWE-Agent + GPT-4, benchmarked on 2,294 instances.](https://www.swebench.com)
- [10.6% AutoCodeRover, benchmarked on 2,294 instances.](https://arxiv.org/pdf/2404.05427v2)
- [10.5% SWE-Agent + Opus, benchmarked on 2,294 instances.](https://www.swebench.com)
The graph contains average pass@1 results for AutoCodeRover.
The [AutoCodeRover GitHub page](https://github.com/nus-apr/auto-code-rover)
features their pass@3 results
without being clearly labeled.
Table 2 of their
[paper](https://arxiv.org/pdf/2404.05427v2)
reports an `ACR-avg` result of 10.59% which is an average pass@1 result.


@ -0,0 +1,126 @@
---
title: Sonnet is the opposite of lazy
excerpt: Claude 3.5 Sonnet can easily write more good code than fits in one 4k token API response.
highlight_image: /assets/sonnet-not-lazy.jpg
nav_exclude: true
---
[![sonnet is the opposite of lazy](/assets/sonnet-not-lazy.jpg)](https://aider.chat/assets/sonnet-not-lazy.jpg)
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Sonnet is the opposite of lazy
Claude 3.5 Sonnet represents a step change
in AI coding.
It is incredibly industrious, diligent and hard working.
Unexpectedly,
this presented a challenge:
Sonnet
was often writing so much code that
it was hitting the 4k output token limit,
truncating its coding in mid-stream.
Aider now works
around this 4k limit and allows Sonnet to produce
as much code as it wants.
The result is surprisingly powerful.
Sonnet's score on
[aider's refactoring benchmark](https://aider.chat/docs/leaderboards/#code-refactoring-leaderboard)
jumped from 55.1% up to 64.0%.
This moved Sonnet into second place, ahead of GPT-4o and
behind only Opus.
Users who tested Sonnet with a preview of
[aider's latest release](https://aider.chat/HISTORY.html#aider-v0410)
were thrilled:
- *Works like a charm. It is a monster. It refactors files of any size like it is nothing. The continue trick with Sonnet is truly the holy grail. Aider beats [other tools] hands down. I'm going to cancel both subscriptions.* -- [Emasoft](https://github.com/paul-gauthier/aider/issues/705#issuecomment-2200338971)
- *Thanks heaps for this feature - it's a real game changer. I can be more ambitious when asking Claude for larger features.* -- [cngarrison](https://github.com/paul-gauthier/aider/issues/705#issuecomment-2196026656)
- *Fantastic...! It's such an improvement not being constrained by output token length issues. [I refactored] a single JavaScript file into seven smaller files using a single Aider request.* -- [John Galt](https://discord.com/channels/1131200896827654144/1253492379336441907/1256250487934554143)
## Hitting the 4k token output limit
All LLMs have various token limits, the most familiar being their
context window size.
But they also have a limit on how many tokens they can output
in response to a single request.
Sonnet and the majority of other
models are limited to returning 4k tokens.
Sonnet's amazing work ethic caused it to
regularly hit this 4k output token
limit for a few reasons:
1. Sonnet is capable of outputting a very large amount of correct,
complete new code in one response.
2. Similarly, Sonnet can specify long sequences of edits in one go,
like changing a majority of lines while refactoring a large file.
3. Sonnet tends to quote large chunks of a
file when performing SEARCH & REPLACE edits.
Beyond token limits, this is very wasteful.
## Good problems
Problems (1) and (2) are "good problems"
in the sense that Sonnet is
able to write more high quality code than any other model!
We just don't want it to be interrupted prematurely
by the 4k output limit.
Aider now allows Sonnet to return code in multiple 4k token
responses.
Aider seamlessly combines them so that Sonnet can return arbitrarily
long responses.
This gets all the upsides of Sonnet's prolific coding skills,
without being constrained by the 4k output token limit.
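One way to picture this is a loop that keeps asking the model to continue whenever a reply is cut off by the output limit. The sketch below is an assumption about the general shape of such a loop, not aider's implementation. `send` is a hypothetical callable that takes the chat messages plus the text received so far and returns `(new_text, finish_reason)`.

```python
def collect_full_response(send, messages):
    """Call `send` repeatedly until the model stops for a reason other than length.

    finish_reason is assumed to be "length" when the output limit
    truncated the reply, and something else (e.g. "stop") otherwise.
    """
    text = ""
    while True:
        chunk, finish_reason = send(messages, text)
        text += chunk
        if finish_reason != "length":
            return text
```

Passing the accumulated text back on each call lets the model pick up exactly where the previous response was truncated.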
## Wasting tokens
Problem (3) is more complicated, as Sonnet isn't just
being stopped early -- it's actually wasting a lot
of tokens, time and money.
Faced with a few small changes spread far apart in
a source file,
Sonnet would often prefer to do one giant SEARCH/REPLACE
operation of almost the entire file.
It would be far faster and less expensive to instead
do a few surgical edits.
Aider now prompts Sonnet to discourage these long-winded
SEARCH/REPLACE operations
and promotes much more concise edits.
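To see why small edits are cheaper, consider a simplified search-and-replace applier. This is a sketch for illustration, not aider's edit format or code: each edit is a `(search, replace)` pair, and requiring the search text to occur exactly once is an assumption that keeps the example unambiguous.

```python
def apply_edits(source, edits):
    """Apply (search, replace) pairs; each search text must occur exactly once."""
    for search, replace in edits:
        if source.count(search) != 1:
            raise ValueError(f"search text must match exactly once: {search!r}")
        source = source.replace(search, replace)
    return source
```

Two short pairs that each quote a single line cost a handful of output tokens, whereas one pair that quotes nearly the whole file forces the model to emit every unchanged line twice.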
## Aider with Sonnet
[The latest release of aider](https://aider.chat/HISTORY.html#aider-v0410)
has specialized support for Claude 3.5 Sonnet:
- Aider allows Sonnet to produce as much code as it wants,
by automatically and seamlessly spreading the response
out over a sequence of 4k token API responses.
- Aider carefully prompts Sonnet to be concise when proposing
code edits.
This reduces Sonnet's tendency to waste time, tokens and money
returning large chunks of unchanging code.
- Aider now uses Claude 3.5 Sonnet by default if the `ANTHROPIC_API_KEY` is set in the environment.
See
[aider's install instructions](https://aider.chat/docs/install.html)
for more details, but
you can get started quickly with aider and Sonnet like this:
```
$ pip install aider-chat
$ export ANTHROPIC_API_KEY=<key> # Mac/Linux
$ setx ANTHROPIC_API_KEY <key> # Windows
$ aider
```


@ -0,0 +1,102 @@
.btn {
display: inline-block;
width: auto;
}
.btn + .btn {
margin-top: 0;
margin-left: 0.75rem;
}
.post {
background: #fff;
box-shadow: 0 2px 5px rgba(0,0,0,0.1);
margin-bottom: 2em;
padding: 1em;
border-radius: 4px;
}
.post-date {
color: #777;
font-size: 0.85em;
margin-bottom: 1em;
display: block;
}
.post-highlight {
max-width: 20em; /* Assuming the base font-size is 16px, 20em is equivalent to 320px */
margin-right: 1em;
margin-left: 1em;
}
@media (max-width: 768px) {
.post-highlight {
max-width: 30em; /* Larger size on mobile */
}
}
.post-content {
display: flex;
align-items: flex-start;
flex: 1;
}
.chat-transcript {
font-family: 'Courier New', Courier, monospace;
background-color: #000;
color: #0f0;
padding: 1em 1em 1em 1em;
border-radius: 5px;
margin-top: 50px;
}
.chat-transcript blockquote {
padding: 0;
margin-left: 0;
color: #819198;
border-left: none
}
.chat-transcript blockquote>:first-child {
margin-top: 0
}
.chat-transcript blockquote>:last-child {
margin-bottom: 0
}
.chat-transcript li,
.chat-transcript p {
color: #00FFFF;
}
.chat-transcript h1 {
display: none;
}
.chat-transcript h4 {
color: #32FF32;
border-top: 1px solid #32FF32;
padding-top: 10px;
text-transform: none;
}
.chat-transcript h4::before {
content: '> ';
}
.chat-transcript blockquote p {
color: #ffffff;
}
.chat-transcript::before {
content: url('data:image/svg+xml;utf8,<svg xmlns="http://www.w3.org/2000/svg" width="90" height="25"><circle cx="7" cy="15" r="7" fill="%23f92672"/><circle cx="27" cy="15" r="7" fill="%23f4bf75"/><circle cx="47" cy="15" r="7" fill="%23a6e22e"/></svg>');
display: block;
margin-bottom: 0;
position: relative;
top: -0.5em;
}
.chat-transcript,
div.highlighter-rouge pre.highlight, div.highlighter-rouge code {
line-height: 1.1;
}


@ -0,0 +1,9 @@
<?xml version="1.0" encoding="utf-8"?>
<browserconfig>
<msapplication>
<tile>
<square150x150logo src="/assets/icons/mstile-150x150.png"/>
<TileColor>#da532c</TileColor>
</tile>
</msapplication>
</browserconfig>


@ -0,0 +1,32 @@
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN"
"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg version="1.0" xmlns="http://www.w3.org/2000/svg"
width="436.000000pt" height="436.000000pt" viewBox="0 0 436.000000 436.000000"
preserveAspectRatio="xMidYMid meet">
<metadata>
Created by potrace 1.14, written by Peter Selinger 2001-2017
</metadata>
<g transform="translate(0.000000,436.000000) scale(0.100000,-0.100000)"
fill="#000000" stroke="none">
<path d="M0 2180 l0 -2180 2180 0 2180 0 0 2180 0 2180 -2180 0 -2180 0 0
-2180z m2705 1818 c20 -20 28 -121 30 -398 l2 -305 216 -5 c118 -3 218 -8 222
-12 3 -3 10 -46 15 -95 5 -48 16 -126 25 -172 17 -86 17 -81 -17 -233 -14 -67
-13 -365 2 -438 21 -100 22 -159 5 -247 -24 -122 -24 -363 1 -458 23 -88 23
-213 1 -330 -9 -49 -17 -109 -17 -132 l0 -43 203 0 c111 0 208 -4 216 -9 10
-6 18 -51 27 -148 8 -76 16 -152 20 -168 7 -39 -23 -361 -37 -387 -10 -18 -21
-19 -214 -16 -135 2 -208 7 -215 14 -22 22 -33 301 -21 501 6 102 8 189 5 194
-8 13 -417 12 -431 -2 -12 -12 -8 -146 8 -261 8 -55 8 -95 1 -140 -6 -35 -14
-99 -17 -143 -9 -123 -14 -141 -41 -154 -18 -8 -217 -11 -679 -11 l-653 0 -11
33 c-31 97 -43 336 -27 533 5 56 6 113 2 128 l-6 26 -194 0 c-211 0 -252 4
-261 28 -12 33 -17 392 -6 522 15 186 -2 174 260 180 115 3 213 8 217 12 4 4
1 52 -5 105 -7 54 -17 130 -22 168 -7 56 -5 91 11 171 10 55 22 130 26 166 4
36 10 72 15 79 7 12 128 15 665 19 l658 5 8 30 c5 18 4 72 -3 130 -12 115 -7
346 11 454 10 61 10 75 -1 82 -8 5 -300 9 -650 9 l-636 0 -27 25 c-18 16 -26
34 -26 57 0 18 -5 87 -10 153 -10 128 5 449 22 472 5 7 26 13 46 15 78 6 1281
3 1287 -4z"/>
<path d="M1360 1833 c0 -5 -1 -164 -3 -356 l-2 -347 625 -1 c704 -1 708 -1
722 7 5 4 7 20 4 38 -29 141 -32 491 -6 595 9 38 8 45 -7 57 -15 11 -139 13
-675 14 -362 0 -658 -3 -658 -7z"/>
</g>
</svg>

View file

@ -0,0 +1,19 @@
{
"name": "",
"short_name": "",
"icons": [
{
"src": "/assets/icons/android-chrome-192x192.png",
"sizes": "192x192",
"type": "image/png"
},
{
"src": "/assets/icons/android-chrome-384x384.png",
"sizes": "384x384",
"type": "image/png"
}
],
"theme_color": "#ffffff",
"background_color": "#ffffff",
"display": "standalone"
}

View file

@ -0,0 +1,242 @@
##########################################################
# Sample .aider.conf.yml
# This file lists *all* the valid configuration entries.
# Place in your home dir, or at the root of your git repo.
##########################################################
##########
# options:
## show this help message and exit
#help:
#######
# Main:
## Specify the OpenAI API key
#openai-api-key:
## Specify the Anthropic API key
#anthropic-api-key:
## Specify the model to use for the main chat
#model:
## Use claude-3-opus-20240229 model for the main chat
#opus: false
## Use claude-3-5-sonnet-20240620 model for the main chat
#sonnet: false
## Use gpt-4-0613 model for the main chat
#4: false
## Use gpt-4o model for the main chat
#4o: false
## Use gpt-4-1106-preview model for the main chat
#4-turbo: false
## Use gpt-3.5-turbo model for the main chat
#35turbo: false
#################
# Model Settings:
## List known models which match the (partial) MODEL name
#models:
## Specify the api base url
#openai-api-base:
## Specify the api_type
#openai-api-type:
## Specify the api_version
#openai-api-version:
## Specify the deployment_id
#openai-api-deployment-id:
## Specify the OpenAI organization ID
#openai-organization-id:
## Specify a file with aider model settings for unknown models
#model-settings-file: .aider.model.settings.yml
## Specify a file with context window and costs for unknown models
#model-metadata-file: .aider.model.metadata.json
## Verify the SSL cert when connecting to models (default: True)
#verify-ssl: true
## Specify what edit format the LLM should use (default depends on model)
#edit-format:
## Specify the model to use for commit messages and chat history summarization (default depends on --model)
#weak-model:
## Only work with models that have meta-data available (default: True)
#show-model-warnings: true
## Max number of tokens to use for repo map, use 0 to disable (default: 1024)
#map-tokens: 1024
## Maximum number of tokens to use for chat history. If not specified, uses the model's max_chat_history_tokens.
#max-chat-history-tokens:
## Specify the .env file to load (default: .env in git root)
#env-file: .env
################
# History Files:
## Specify the chat input history file (default: .aider.input.history)
#input-history-file: .aider.input.history
## Specify the chat history file (default: .aider.chat.history.md)
#chat-history-file: .aider.chat.history.md
## Restore the previous chat history messages (default: False)
#restore-chat-history: false
## Log the conversation with the LLM to this file (for example, .aider.llm.history)
#llm-history-file:
##################
# Output Settings:
## Use colors suitable for a dark terminal background (default: False)
#dark-mode: false
## Use colors suitable for a light terminal background (default: False)
#light-mode: false
## Enable/disable pretty, colorized output (default: True)
#pretty: true
## Enable/disable streaming responses (default: True)
#stream: true
## Set the color for user input (default: #00cc00)
#user-input-color: "#00cc00"
## Set the color for tool output (default: None)
#tool-output-color:
## Set the color for tool error messages (default: red)
#tool-error-color: "#FF2222"
## Set the color for assistant output (default: #0088ff)
#assistant-output-color: "#0088ff"
## Set the markdown code theme (default: default, other options include monokai, solarized-dark, solarized-light)
#code-theme: default
## Show diffs when committing changes (default: False)
#show-diffs: false
###############
# Git Settings:
## Enable/disable looking for a git repo (default: True)
#git: true
## Enable/disable adding .aider* to .gitignore (default: True)
#gitignore: true
## Specify the aider ignore file (default: .aiderignore in git root)
#aiderignore: .aiderignore
## Enable/disable auto commit of LLM changes (default: True)
#auto-commits: true
## Enable/disable commits when repo is found dirty (default: True)
#dirty-commits: true
## Attribute aider code changes in the git author name (default: True)
#attribute-author: true
## Attribute aider commits in the git committer name (default: True)
#attribute-committer: true
## Prefix commit messages with 'aider: ' (default: False)
#attribute-commit-message: false
## Perform a dry run without modifying files (default: False)
#dry-run: false
########################
# Fixing and committing:
## Commit all pending changes with a suitable commit message, then exit
#commit: false
## Lint and fix provided files, or dirty files if none provided
#lint: false
## Specify lint commands to run for different languages, eg: "python: flake8 --select=..." (can be used multiple times)
#lint-cmd:
## Enable/disable automatic linting after changes (default: True)
#auto-lint: true
## Specify command to run tests
#test-cmd:
## Enable/disable automatic testing after changes (default: False)
#auto-test: false
## Run tests and fix problems found
#test: false
#################
# Other Settings:
## Use VI editing mode in the terminal (default: False)
#vim: false
## Specify the language for voice using ISO 639-1 code (default: auto)
#voice-language: en
## Show the version number and exit
#version:
## Check for updates and return status in the exit code
#check-update: false
## Skips checking for the update when the program runs
#skip-check-update: false
## Apply the changes from the given file instead of running the chat (debug)
#apply:
## Always say yes to every confirmation
#yes: false
## Enable verbose output
#verbose: false
## Print the repo map and exit (debug)
#show-repo-map: false
## Print the system prompts and exit (debug)
#show-prompts: false
## Do all startup activities then exit before accepting user input (debug)
#exit: false
## Specify a single message to send the LLM, process reply then exit (disables chat mode)
#message:
## Specify a file containing the message to send the LLM, process reply, then exit (disables chat mode)
#message-file:
## Specify the encoding for input and output (default: utf-8)
#encoding: utf-8
## Specify the config file (default: search for .aider.conf.yml in git root, cwd or home directory)
#config:
## Run aider in your browser
#gui: false

View file

@ -0,0 +1,244 @@
##########################################################
# Sample aider .env file.
# Place at the root of your git repo.
# Or use `aider --env-file <fname>` to specify.
##########################################################
#################
# LLM parameters:
#
# Include xxx_API_KEY parameters and other params needed for your LLMs.
# See https://aider.chat/docs/llms.html for details.
## OpenAI
#OPENAI_API_KEY=
## Anthropic
#ANTHROPIC_API_KEY=
##...
#######
# Main:
## Specify the OpenAI API key
#OPENAI_API_KEY=
## Specify the Anthropic API key
#ANTHROPIC_API_KEY=
## Specify the model to use for the main chat
#AIDER_MODEL=
## Use claude-3-opus-20240229 model for the main chat
#AIDER_OPUS=
## Use claude-3-5-sonnet-20240620 model for the main chat
#AIDER_SONNET=
## Use gpt-4-0613 model for the main chat
#AIDER_4=
## Use gpt-4o model for the main chat
#AIDER_4O=
## Use gpt-4-1106-preview model for the main chat
#AIDER_4_TURBO=
## Use gpt-3.5-turbo model for the main chat
#AIDER_35TURBO=
#################
# Model Settings:
## List known models which match the (partial) MODEL name
#AIDER_MODELS=
## Specify the api base url
#OPENAI_API_BASE=
## Specify the api_type
#OPENAI_API_TYPE=
## Specify the api_version
#OPENAI_API_VERSION=
## Specify the deployment_id
#OPENAI_API_DEPLOYMENT_ID=
## Specify the OpenAI organization ID
#OPENAI_ORGANIZATION_ID=
## Specify a file with aider model settings for unknown models
#AIDER_MODEL_SETTINGS_FILE=.aider.model.settings.yml
## Specify a file with context window and costs for unknown models
#AIDER_MODEL_METADATA_FILE=.aider.model.metadata.json
## Verify the SSL cert when connecting to models (default: True)
#AIDER_VERIFY_SSL=true
## Specify what edit format the LLM should use (default depends on model)
#AIDER_EDIT_FORMAT=
## Specify the model to use for commit messages and chat history summarization (default depends on --model)
#AIDER_WEAK_MODEL=
## Only work with models that have meta-data available (default: True)
#AIDER_SHOW_MODEL_WARNINGS=true
## Max number of tokens to use for repo map, use 0 to disable (default: 1024)
#AIDER_MAP_TOKENS=1024
## Maximum number of tokens to use for chat history. If not specified, uses the model's max_chat_history_tokens.
#AIDER_MAX_CHAT_HISTORY_TOKENS=
## Specify the .env file to load (default: .env in git root)
#AIDER_ENV_FILE=.env
################
# History Files:
## Specify the chat input history file (default: .aider.input.history)
#AIDER_INPUT_HISTORY_FILE=.aider.input.history
## Specify the chat history file (default: .aider.chat.history.md)
#AIDER_CHAT_HISTORY_FILE=.aider.chat.history.md
## Restore the previous chat history messages (default: False)
#AIDER_RESTORE_CHAT_HISTORY=false
## Log the conversation with the LLM to this file (for example, .aider.llm.history)
#AIDER_LLM_HISTORY_FILE=
##################
# Output Settings:
## Use colors suitable for a dark terminal background (default: False)
#AIDER_DARK_MODE=false
## Use colors suitable for a light terminal background (default: False)
#AIDER_LIGHT_MODE=false
## Enable/disable pretty, colorized output (default: True)
#AIDER_PRETTY=true
## Enable/disable streaming responses (default: True)
#AIDER_STREAM=true
## Set the color for user input (default: #00cc00)
#AIDER_USER_INPUT_COLOR=#00cc00
## Set the color for tool output (default: None)
#AIDER_TOOL_OUTPUT_COLOR=
## Set the color for tool error messages (default: red)
#AIDER_TOOL_ERROR_COLOR=#FF2222
## Set the color for assistant output (default: #0088ff)
#AIDER_ASSISTANT_OUTPUT_COLOR=#0088ff
## Set the markdown code theme (default: default, other options include monokai, solarized-dark, solarized-light)
#AIDER_CODE_THEME=default
## Show diffs when committing changes (default: False)
#AIDER_SHOW_DIFFS=false
###############
# Git Settings:
## Enable/disable looking for a git repo (default: True)
#AIDER_GIT=true
## Enable/disable adding .aider* to .gitignore (default: True)
#AIDER_GITIGNORE=true
## Specify the aider ignore file (default: .aiderignore in git root)
#AIDER_AIDERIGNORE=.aiderignore
## Enable/disable auto commit of LLM changes (default: True)
#AIDER_AUTO_COMMITS=true
## Enable/disable commits when repo is found dirty (default: True)
#AIDER_DIRTY_COMMITS=true
## Attribute aider code changes in the git author name (default: True)
#AIDER_ATTRIBUTE_AUTHOR=true
## Attribute aider commits in the git committer name (default: True)
#AIDER_ATTRIBUTE_COMMITTER=true
## Prefix commit messages with 'aider: ' (default: False)
#AIDER_ATTRIBUTE_COMMIT_MESSAGE=false
## Perform a dry run without modifying files (default: False)
#AIDER_DRY_RUN=false
########################
# Fixing and committing:
## Commit all pending changes with a suitable commit message, then exit
#AIDER_COMMIT=false
## Lint and fix provided files, or dirty files if none provided
#AIDER_LINT=false
## Specify lint commands to run for different languages, eg: "python: flake8 --select=..." (can be used multiple times)
#AIDER_LINT_CMD=
## Enable/disable automatic linting after changes (default: True)
#AIDER_AUTO_LINT=true
## Specify command to run tests
#AIDER_TEST_CMD=
## Enable/disable automatic testing after changes (default: False)
#AIDER_AUTO_TEST=false
## Run tests and fix problems found
#AIDER_TEST=false
#################
# Other Settings:
## Use VI editing mode in the terminal (default: False)
#AIDER_VIM=false
## Specify the language for voice using ISO 639-1 code (default: auto)
#AIDER_VOICE_LANGUAGE=en
## Check for updates and return status in the exit code
#AIDER_CHECK_UPDATE=false
## Skips checking for the update when the program runs
#AIDER_SKIP_CHECK_UPDATE=false
## Apply the changes from the given file instead of running the chat (debug)
#AIDER_APPLY=
## Always say yes to every confirmation
#AIDER_YES=
## Enable verbose output
#AIDER_VERBOSE=false
## Print the repo map and exit (debug)
#AIDER_SHOW_REPO_MAP=false
## Print the system prompts and exit (debug)
#AIDER_SHOW_PROMPTS=false
## Do all startup activities then exit before accepting user input (debug)
#AIDER_EXIT=false
## Specify a single message to send the LLM, process reply then exit (disables chat mode)
#AIDER_MESSAGE=
## Specify a file containing the message to send the LLM, process reply, then exit (disables chat mode)
#AIDER_MESSAGE_FILE=
## Specify the encoding for input and output (default: utf-8)
#AIDER_ENCODING=utf-8
## Run aider in your browser
#AIDER_GUI=false
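
Comparing the two sample files above suggests a simple naming pattern between YAML option names and environment variable names. The following is a minimal sketch of that pattern, inferred only from these samples and not taken from aider's source; `env_var_for` is a hypothetical helper name.

```python
def env_var_for(option: str) -> str:
    """Guess the environment variable name for a config option.

    Assumption: the rule is inferred from the sample files only.
    """
    name = option.upper().replace("-", "_")
    # In the samples, provider settings keep their own prefix
    # (for example OPENAI_API_KEY, ANTHROPIC_API_KEY).
    if name.startswith(("OPENAI_", "ANTHROPIC_")):
        return name
    # Everything else appears with an AIDER_ prefix.
    return "AIDER_" + name


for option in ["map-tokens", "4-turbo", "openai-api-key"]:
    print(option, "->", env_var_for(option))
```

Treat the output as a guess to check against the sample `.env` file, since options may exist that do not follow this pattern.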

View file

@ -0,0 +1,37 @@
---
title: Aider blog
layout: default
nav_order: 1000
---
<h1>Aider blog</h1>
<div class="posts">
{% for post in site.posts %}
{% unless post.draft %}
<article class="post">
<h2><a href="{{ site.baseurl }}{{ post.url }}">{{ post.title }}</a></h2>
<div class="entry">
<div class="post-content">
{% if post.excerpt %}
{{ post.excerpt }}
{% else %}
{{ post.content | strip_html | truncatewords: 100 }}...
{% endif %}
{% if post.highlight_image %}
<a href="{{ site.baseurl }}{{ post.url }}">
<div class="post-highlight">
<img src="{{ site.baseurl }}{{ post.highlight_image }}" alt="Highlight Image">
</div>
</a>
{% endif %}
</div>
</div>
<p class="post-date">
{{ post.date | date: "%b %-d, %Y" | upcase }}
</p>
</article>
{% endunless %}
{% endfor %}
</div>

View file

@ -0,0 +1,45 @@
---
title: The January GPT-4 Turbo is lazier than the last version
excerpt: The new `gpt-4-0125-preview` model is quantitatively lazier at coding than previous GPT-4 versions, according to a new "laziness" benchmark.
highlight_image: /assets/benchmarks-0125.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# The January GPT-4 Turbo is lazier than the last version
[![benchmark results](/assets/benchmarks-0125.svg)](https://aider.chat/assets/benchmarks-0125.svg)
[OpenAI just released a new version of GPT-4 Turbo](https://openai.com/blog/new-embedding-models-and-api-updates).
This new model is intended to reduce the "laziness" that has been widely observed with the previous `gpt-4-1106-preview` model:
> Today, we are releasing an updated GPT-4 Turbo preview model, gpt-4-0125-preview. This model completes tasks like code generation more thoroughly than the previous preview model and is intended to reduce cases of “laziness” where the model doesn’t complete a task.
With that in mind, I've been benchmarking the new model using
aider's existing
[lazy coding benchmark](https://aider.chat/docs/unified-diffs.html).
## Benchmark results
Overall,
the new `gpt-4-0125-preview` model seems lazier
than the November `gpt-4-1106-preview` model:
- It gets worse benchmark scores when using the [unified diffs](https://aider.chat/docs/unified-diffs.html) code editing format.
- Using aider's older SEARCH/REPLACE block editing format, the new January model outperforms the older November model. But it still performs worse than both models using unified diffs.
## Related reports
This is one in a series of reports
that use the aider benchmarking suite to assess and compare the code
editing capabilities of OpenAI's GPT models.
You can review the other reports
for additional information:
- [GPT code editing benchmarks](https://aider.chat/docs/benchmarks.html) evaluates the March and June versions of GPT-3.5 and GPT-4.
- [Code editing benchmarks for OpenAI's "1106" models](https://aider.chat/docs/benchmarks-1106.html).
- [Aider's lazy coding benchmark](https://aider.chat/docs/unified-diffs.html).

View file

@ -0,0 +1,92 @@
---
title: Code editing benchmarks for OpenAI's "1106" models
excerpt: A quantitative comparison of the code editing capabilities of the new GPT-3.5 and GPT-4 versions that were released in Nov 2023.
highlight_image: /assets/benchmarks-1106.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Code editing benchmarks for OpenAI's "1106" models
[![benchmark results](/assets/benchmarks-1106.svg)](https://aider.chat/assets/benchmarks-1106.svg)
[![benchmark results](/assets/benchmarks-speed-1106.svg)](https://aider.chat/assets/benchmarks-speed-1106.svg)
[OpenAI just released new versions of GPT-3.5 and GPT-4](https://openai.com/blog/new-models-and-developer-products-announced-at-devday),
and there's a lot
of interest in their ability to code compared to the previous versions.
With that in mind, I've been benchmarking the new models.
[Aider](https://github.com/paul-gauthier/aider)
is an open source command line chat tool that lets you work with GPT to edit
code in your local git repo.
To do this, aider needs to be able to reliably recognize when GPT wants to edit
your source code,
determine which files it wants to modify
and accurately apply the changes it's trying to make.
Doing a good job on this "code editing" task requires a good LLM, good prompting and
a good tool driving the interactions with the LLM.
Aider relies on a
[code editing benchmark](https://aider.chat/docs/benchmarks.html)
to quantitatively evaluate
performance
whenever one of these things changes.
For example,
whenever I change aider's prompting or the backend which drives LLM conversations,
I run the benchmark to make sure these changes produce improvements (not regressions).
The benchmark uses aider to try and complete
[133 Exercism Python coding exercises](https://github.com/exercism/python).
For each exercise, Exercism provides a starting python file with stubs for the needed functions,
a natural language description of the problem to solve
and a test suite to evaluate whether the coder has correctly solved the problem.
The benchmark gives aider two tries to complete the task:
1. On the first try, aider gives GPT the stub code file to edit and the natural language instructions that describe the problem. This reflects how you code with aider. You add your source code files to the chat and ask for changes, which are automatically applied.
2. If the test suite fails after the first try, aider gives GPT the test error output and asks it to fix the code. Aider supports this sort of interaction using a command like `/run pytest` to run and share pytest results in the chat with GPT. You can `/run` whatever tests/linters/etc make sense for your language/framework/situation.
## Benchmark results
### gpt-4-1106-preview
For now, I have only benchmarked the GPT-4 models using the `diff` edit method.
This is the edit format that aider uses by default with gpt-4.
- The new `gpt-4-1106-preview` model seems **2-2.5X faster** than the June GPT-4 model.
- **It seems better at producing correct code on the first try**. It gets
53% of the coding exercises correct, without needing to see errors from the test suite. Previous models only get 46-47% of the exercises correct on the first try.
- The new model seems to perform similarly
(~65%) to the old models (63-64%) after their second chance to correct bugs by reviewing test suite error output.
### gpt-3.5-turbo-1106
I benchmarked the GPT-3.5 models with both the `whole` and `diff` edit format.
None of the gpt-3.5 models seem able to effectively use the `diff` edit format, including the newest November (1106) model.
The comments below only focus on comparing the `whole` edit format results:
- The new `gpt-3.5-turbo-1106` model is completing the benchmark **3-4X faster** than the earlier GPT-3.5 models.
- The 42% success rate after the first try is comparable to the previous June (0613) model. The new November and previous June models are both worse than the original March (0301) model's 50% result on the first try.
- The new model's 56% success rate after the second try seems comparable to the original March model, and somewhat better than the June model's 50% score.
## Related reports
This is one in a series of reports
that use the aider benchmarking suite to assess and compare the code
editing capabilities of OpenAI's GPT models.
You can review the other reports
for additional information:
- [GPT code editing benchmarks](https://aider.chat/docs/benchmarks.html) evaluates the March and June versions of GPT-3.5 and GPT-4.
- [Code editing speed benchmarks for OpenAI's "1106" models](https://aider.chat/2023/11/06/benchmarks-speed-1106.html) compares the performance of the new GPT models.
## Updates
Last updated 11/14/23.
OpenAI has relaxed rate limits so these results are no longer considered preliminary.

View file

@ -0,0 +1,59 @@
---
title: Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106
excerpt: This report provides a detailed comparison of the speed of GPT-4 Turbo and gpt-3.5-turbo-1106 models based on the aider benchmarking suite.
canonical_url: https://aider.chat/2023/11/06/benchmarks-speed-1106.html
highlight_image: /assets/benchmarks-speed-1106.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106
<p class="post-date">{{ page.date | date: "%b %-d, %Y" }}</p>
[![benchmark results](/assets/benchmarks-speed-1106.svg)](https://aider.chat/assets/benchmarks-speed-1106.svg)
[OpenAI just released new versions of GPT-3.5 and GPT-4](https://openai.com/blog/new-models-and-developer-products-announced-at-devday),
and there's a lot
of interest in their capabilities and performance.
With that in mind, I've been benchmarking the new models.
[Aider](https://github.com/paul-gauthier/aider)
is an open source command line chat tool that lets you work with GPT to edit
code in your local git repo.
Aider relies on a
[code editing benchmark](https://aider.chat/docs/benchmarks.html)
to quantitatively evaluate
performance.
This is the latest in a series of reports
that use the aider benchmarking suite to assess and compare the code
editing capabilities of OpenAI's GPT models. You can review previous
reports to get more background on aider's benchmark suite:
- [GPT code editing benchmarks](https://aider.chat/docs/benchmarks.html) evaluates the March and June versions of GPT-3.5 and GPT-4.
- [Code editing skill benchmarks for OpenAI's "1106" models](https://aider.chat/docs/benchmarks-1106.html) compares the older models to the November (1106) models.
## Speed
This report compares the **speed** of the various GPT models.
Aider's benchmark measures the response time of the OpenAI chat completion
endpoint each time it asks GPT to solve a programming exercise in the benchmark
suite. These results measure only the time spent waiting for OpenAI to
respond to the prompt.
So they are measuring
how fast these models can
generate responses which primarily consist of source code.
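
As a rough illustration of this kind of measurement, the sketch below times only the wait for a reply. It is not aider's benchmark code: `timed_call` and `fake_model` are hypothetical names, and `fake_model` stands in for a real chat completion request.

```python
import time
from typing import Callable, List, Tuple


def timed_call(send_prompt: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Return the reply and the seconds spent waiting for it."""
    start = time.perf_counter()
    reply = send_prompt(prompt)
    elapsed = time.perf_counter() - start
    return reply, elapsed


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for an API request; sleeps to simulate latency.
    time.sleep(0.01)
    return "def hello():\n    return 'hi'\n"


durations: List[float] = []
for prompt in ["exercise 1", "exercise 2"]:
    _, seconds = timed_call(fake_model, prompt)
    durations.append(seconds)

print(f"mean response time: {sum(durations) / len(durations):.3f}s")
```

Timing around the call alone excludes time spent applying edits or running tests, which matches the "time spent waiting" framing above.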
Some observations:
- **GPT-3.5 got 6-11x faster.** The `gpt-3.5-turbo-1106` model is 6-11x faster than the June (0613) version which has been the default `gpt-3.5-turbo` model.
- **GPT-4 Turbo is 2-2.5x faster.** The new `gpt-4-1106-preview` model is 2-2.5x faster than the June (0613) version which has been the default `gpt-4` model.
- The old March (0301) version of GPT-3.5 is actually faster than the June (0613) version. This was a surprising discovery.
## Updates
Last updated 11/14/23.
OpenAI has relaxed rate limits so these results are no longer considered preliminary.

View file

@ -0,0 +1,368 @@
---
title: GPT code editing benchmarks
excerpt: Benchmarking GPT-3.5 and GPT-4 code editing skill using a new code editing benchmark suite based on the Exercism python exercises.
highlight_image: /assets/benchmarks.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# GPT code editing benchmarks
[![benchmark results](/assets/benchmarks.svg)](https://aider.chat/assets/benchmarks.svg)
Aider is an open source command line chat tool that lets you work with GPT to edit
code in your local git repo.
To do this, aider needs to be able to reliably recognize when GPT wants to edit local files,
determine which files it wants to modify and what changes to save.
Such automated
code editing hinges on using the system prompt
to tell GPT how to structure code edits in its responses.
Aider currently asks GPT to use simple text based "edit formats", but
[OpenAI's new function calling
API](https://openai.com/blog/function-calling-and-other-api-updates)
looks like a promising way to create more structured edit formats.
After implementing a couple of function based edit formats,
I wanted
to measure the potential benefits
of switching aider to use them by default.
With this in mind, I developed a
benchmark based on the [Exercism
python](https://github.com/exercism/python) coding exercises.
This
benchmark evaluates how effectively aider and GPT can translate a
natural language coding request into executable code saved into
files that pass unit tests.
It provides an end-to-end evaluation of not just
GPT's coding ability, but also its capacity to *edit existing code*
and *format those code edits* so that aider can save the
edits to the local source files.
I ran the benchmark
on all the ChatGPT models (except `gpt-4-32k`), using a variety of edit formats.
The results were interesting:
- **Plain text edit formats worked best.** Asking GPT to return an updated copy of the whole file in a standard markdown fenced code block proved to be the most reliable and effective edit format across all GPT-3.5 and GPT-4 models. The results for this `whole` edit format are shown in solid blue in the graph.
- **Function calls performed worse.** Using the new functions API for edits performed worse than the above whole file method, for all the models. GPT-3.5 especially produced inferior code and frequently mangled this output format. This was surprising, as the functions API was introduced to enhance the reliability of structured outputs. The results for these `...-func` edit methods are shown as patterned bars in the graph (both green and blue).
- **The new June GPT-3.5 models did a bit worse than the old March model.** The performance of the new June (`0613`) versions of GPT-3.5 appears to be a bit worse than the March (`0301`) version. This is visible if you look at the "first attempt" markers on the first three solid blue bars and also by comparing the first three solid green `diff` bars.
- **GPT-4 does better than GPT-3.5,** as expected.
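
To make the `whole` edit format from the first bullet concrete, here is a minimal sketch of pulling an updated file out of a reply that contains one markdown fenced code block. This is a simplification under stated assumptions, not aider's parser: `extract_whole_file` is a hypothetical name, and real replies may name the file or contain several blocks.

```python
import re
from typing import Optional

FENCE = "`" * 3  # built this way to avoid a literal fence inside this example

# Opening fence, optional language tag, then everything up to the next fence.
BLOCK_RE = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_whole_file(reply: str) -> Optional[str]:
    """Return the contents of the first fenced block, or None if there is none."""
    match = BLOCK_RE.search(reply)
    return match.group(1) if match else None


reply = (
    "Here is the updated file:\n\n"
    f"{FENCE}python\ndef add(a, b):\n    return a + b\n{FENCE}\n"
)
print(extract_whole_file(reply))
```

Because the model only has to emit ordinary code inside a fence, there is little formatting for it to get wrong, which is consistent with the reliability noted above.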
The quantitative benchmark results agree with my intuitions
about prompting GPT for complex tasks like coding. It's beneficial to
minimize the "cognitive overhead" of formatting the response, allowing
GPT to concentrate on the coding task at hand.
As a thought experiment, imagine a slack conversation with a junior developer where
you ask them to write the code to add some new feature to your app.
They're going to type the response back to you by hand in the chat.
Should they type out the
code and wrap it in a normal markdown code block?
Or should they type up a properly escaped and
syntactically correct json data structure
that contains the text of the new code?
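
The difference between the two options can be seen by rendering the same snippet both ways. This is only an illustration: the `path` and `content` field names are hypothetical, not the schema of any of the function based edit formats.

```python
import json

code = 'def greet(name):\n    print(f"Hello, {name}!")\n'

# Option 1: the code in a normal markdown code block, typed as-is.
fence = "`" * 3  # built this way to avoid a literal fence inside this example
as_markdown = f"{fence}python\n{code}{fence}"

# Option 2: the same code as a JSON string, where every newline and
# double quote must be escaped correctly.
as_json = json.dumps({"path": "greet.py", "content": code})

print(as_markdown)
print(as_json)

# The JSON form round-trips only if all of the escaping is right.
assert json.loads(as_json)["content"] == code
```

In the JSON form the code is no longer readable line by line, and a single missed escape makes the whole structure unparseable.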
Using more complex output formats with GPT seems to cause two issues:
- It makes GPT write worse code. Keeping the output format simple seems to allow GPT to devote more attention to the actual coding task.
- It reduces GPT's adherence to the output format, making it more challenging for tools like aider to accurately identify and apply the edits GPT is attempting to make.
I was expecting to start using function call based edits in aider for both GPT-3.5 and GPT-4.
But given these benchmark results, I won't be adopting the functions API
at this time.
I certainly plan to benchmark functions again with future versions of the models.
More details on the benchmark, edit formats and results are discussed below.
## The benchmark
The benchmark uses
[133 practice exercises from the Exercism python repository](https://github.com/exercism/python/tree/main/exercises/practice).
These
exercises were designed to help individuals learn Python and hone
their coding skills.
Each exercise includes:
- [Instructions](https://github.com/exercism/python/blob/main/exercises/practice/anagram/.docs/instructions.md), provided in markdown files.
- [Stub python code](https://github.com/exercism/python/blob/main/exercises/practice/anagram/anagram.py) in an *implementation file*, specifying the functions or classes that need to be implemented.
- [Unit tests](https://github.com/exercism/python/blob/main/exercises/practice/anagram/anagram_test.py) in a separate python file.
The goal is for GPT to read the instructions, implement the provided function/class skeletons
and pass all the unit tests. The benchmark measures what percentage of
the 133 exercises are completed successfully, causing all the associated unit tests to pass.
To start each exercise, aider sends GPT
the initial contents of the implementation file,
the Exercism instructions
and a final instruction:
```
Use the above instructions to modify the supplied files: <implementation file>
Keep and implement the existing function or class stubs, they will be called from unit tests.
Only use standard python libraries, don't suggest installing any packages.
```
Aider updates the implementation file based on GPT's reply and runs
the unit tests. If all tests pass, the exercise is considered
complete. If some tests fail, aider sends GPT a second message with
the test error output. It only sends the first 50 lines of test errors
to try to avoid exceeding the context window of the smaller models. Aider
also includes this final instruction:
```
See the testing errors above.
The tests are correct.
Fix the code in <implementation file> to resolve the errors.
```
Requiring GPT to fix its first implementation in response to test failures
is another way in which this benchmark stresses code editing skill.
This second chance is also important because it
gives GPT the opportunity to adjust if the
instructions were imprecise with respect to the
specific requirements of the unit tests.
Many of the exercises have multiple paragraphs of instructions,
and most human coders would likely fail some tests on their
first try.
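The two-attempt flow described above can be sketched roughly as follows. This is an illustration only, not the benchmark harness itself: `ask_model` and `run_tests` are hypothetical callables standing in for "send a prompt to GPT and apply its edits" and "run the unit tests", and `run_exercise` is a name invented for this sketch.

```python
from typing import Callable, Tuple

MAX_ERROR_LINES = 50  # only the first 50 lines of test errors are sent


def run_exercise(
    ask_model: Callable[[str], None],
    run_tests: Callable[[], Tuple[bool, str]],
    first_prompt: str,
    impl_file: str,
) -> Tuple[bool, bool]:
    """Return (passed_first_try, passed_overall) for one exercise.

    ask_model(prompt) is a hypothetical callable that sends the prompt to the
    model and applies its edits to the implementation file.
    run_tests() is a hypothetical callable returning (all_passed, error_output).
    """
    ask_model(first_prompt)
    passed, errors = run_tests()
    if passed:
        return True, True

    # Second attempt: truncate the error output, then add the final instruction.
    truncated = "\n".join(errors.splitlines()[:MAX_ERROR_LINES])
    retry_prompt = (
        f"{truncated}\n\n"
        "See the testing errors above.\n"
        "The tests are correct.\n"
        f"Fix the code in {impl_file} to resolve the errors.\n"
    )
    ask_model(retry_prompt)
    passed, _ = run_tests()
    return False, passed
```

The first element of the returned pair corresponds to the horizontal mark on each bar in the graph, and the second to the full bar height.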
The bars in the graph show the percent of exercises that were completed by
each model and edit format combination. The full bar height represents
the final outcome following both coding attempts.
Each bar also has a horizontal mark that shows
the intermediate performance after the first coding attempt,
without the benefit of the second try that includes the test error output.
It's worth noting that GPT never gets to see the source code of the
unit tests during the benchmark. It only sees the error output from
failed tests. Of course, all of this code was probably part of its
original training data!
In summary, passing an exercise means GPT was able to:
- Write the required code (possibly after reviewing test error output),
- Correctly package all of the code edits into the edit format so that aider can process and save it to the implementation file.
Conversely, failing an exercise only requires a breakdown in one of
those steps. In practice, GPT fails at different steps in different
exercises. Sometimes it simply writes the wrong code. Other times, it
fails to format the code edits in a way that conforms to the edit
format, resulting in the code not being saved correctly.
It's worth keeping in mind that changing the edit format often affects
both aspects of GPT's performance.
Complex edit formats often lead GPT to write worse code *and* make it less
successful at formatting the edits correctly.
## Edit formats
I benchmarked 4 different edit formats, described below.
Each description includes a sample response that GPT might provide to a user who
requests:
"Change the print from hello to goodbye."
### whole
The
[whole](https://github.com/paul-gauthier/aider/blob/main/aider/coders/wholefile_prompts.py)
format asks GPT to return an updated copy of the entire file, including any changes.
The file should be
formatted with normal markdown triple-backtick fences, inlined with the rest of its response text.
This format is very similar to how ChatGPT returns code snippets during normal chats, except with the addition of a filename right before the opening triple-backticks.
````
Here is the updated copy of your file demo.py:
demo.py
```python
def main():
    print("goodbye")
```
````
### diff
The [diff](https://github.com/paul-gauthier/aider/blob/main/aider/coders/editblock_prompts.py)
format also asks GPT to return edits as part of the normal response text,
in a simple diff format.
Each edit is a fenced code block that
specifies the filename and a chunk of ORIGINAL and UPDATED code.
GPT provides some original lines from the file and then a new updated set of lines.
````
Here are the changes you requested to demo.py:
```python
demo.py
<<<<<<< ORIGINAL
    print("hello")
=======
    print("goodbye")
>>>>>>> UPDATED
```
````
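To make the format concrete, here is a minimal sketch of how a tool might apply such a block to the text of a file. It assumes the markers appear exactly as in the sample above and uses a simple search and replace. The names `EDIT_BLOCK` and `apply_edit_blocks` are invented for this example and are not taken from aider's code.

```python
import re

# Matches one ORIGINAL/UPDATED block as shown in the sample response.
EDIT_BLOCK = re.compile(
    r"^<<<<<<< ORIGINAL\n(?P<original>.*?)^=======\n(?P<updated>.*?)^>>>>>>> UPDATED$",
    re.DOTALL | re.MULTILINE,
)


def apply_edit_blocks(file_text: str, reply: str) -> str:
    """Apply every ORIGINAL/UPDATED block found in reply to file_text."""
    for match in EDIT_BLOCK.finditer(reply):
        original = match.group("original")
        updated = match.group("updated")
        if original not in file_text:
            # GPT quoted lines that do not exist in the file.
            raise ValueError("ORIGINAL lines not found in file")
        # Replace only the first occurrence of the original chunk.
        file_text = file_text.replace(original, updated, 1)
    return file_text
```

An edit can only be applied if the ORIGINAL lines match the file exactly, which is one of the ways a reply can fail even when the new code itself is fine.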
### whole-func
The [whole-func](https://github.com/paul-gauthier/aider/blob/main/aider/coders/wholefile_func_coder.py)
format requests updated copies of whole files to be returned using the function call API.
```
{
    "explanation": "Changed hello to goodbye.",
    "files": [
        {
            "path": "demo.py",
            "content": "def main():\n    print(\"goodbye\")\n"
        }
    ]
}
```
### diff-func
The
[diff-func](https://github.com/paul-gauthier/aider/blob/main/aider/coders/editblock_func_coder.py)
format requests a list of
original/updated style edits to be returned using the function call API.
```
{
    "explanation": "Changed hello to goodbye.",
    "edits": [
        {
            "path": "demo.py",
            "original_lines": [
                "    print(\"hello\")"
            ],
            "updated_lines": [
                "    print(\"goodbye\")"
            ]
        }
    ]
}
```
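As an illustration of what consuming this format could involve, the sketch below parses the function call arguments and applies the edits for one file. It is an assumption-laden example rather than aider's implementation: `apply_diff_func` is a made-up name, and joining the line lists with newlines is simply the most direct reading of the sample above.

```python
import json


def apply_diff_func(file_text: str, arguments: str, path: str) -> str:
    """Apply the edits for `path` from a diff-func style arguments string."""
    # json.loads is strict, so malformed JSON from the model raises here.
    payload = json.loads(arguments)
    for edit in payload["edits"]:
        if edit["path"] != path:
            continue
        original = "\n".join(edit["original_lines"])
        updated = "\n".join(edit["updated_lines"])
        if original not in file_text:
            raise ValueError(f"original_lines not found in {path}")
        # Replace only the first occurrence of the original lines.
        file_text = file_text.replace(original, updated, 1)
    return file_text
```

Compared with the plain `diff` format, the model has to get two things right at once: valid JSON with correct escaping, and original lines that match the file exactly.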
## GPT-3.5's performance
### The `0613` models seem worse?
The GPT-3.5 benchmark results have me fairly convinced that the new
`gpt-3.5-turbo-0613` and `gpt-3.5-turbo-16k-0613` models
are a bit worse at code editing than
the older `gpt-3.5-turbo-0301` model.
This is visible in the "first attempt"
portion of each result, before GPT gets a second chance to edit the code.
Look at the horizontal white line in the middle of the first three blue bars.
Performance with the `whole` edit format was 46% for the
older `0301` model and only 39% for the newer `0613` models.
But also note how much the solid green `diff` bars
degrade between the `0301` and `0613` GPT-3.5 models.
They drop from 30% down to about 19%.
I saw other signs of this degraded performance
in earlier versions of the
benchmark as well.
### Pathological use of `diff`
When GPT-3.5 is able to correctly generate the `diff` edit format,
it often uses it in a pathological manner. It places the *entire*
original source file in the ORIGINAL block and the entire updated file
in the UPDATED block. This is strictly worse than just using the
`whole` edit format, as GPT is sending two full copies of the file.
### Hallucinated function calls
When GPT-3.5 uses the functions API
it is prone to ignoring the JSON Schema that specifies valid functions.
It often returns a completely novel and semantically
invalid `function_call` fragment with `"name": "python"`.
The `arguments` attribute is supposed to be a set of key/value pairs
with the arguments to the function specified in the `name` field.
Instead, GPT-3.5 frequently just stuffs an entire python
file into that field.
```
"function_call": {
    "name": "python",
    "arguments": "def main():\n    print(\"hello\")\n"
},
```
It seems like it might be getting confused by fine-tuning that was
done for the ChatGPT code interpreter plugin?
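A tool can at least detect this failure mode before trying to use the reply. The check below is a minimal sketch under stated assumptions: `ALLOWED_FUNCTIONS` and the function name `write_file` are hypothetical placeholders for whatever functions were actually declared in the request, and `check_function_call` is a name invented here.

```python
import json

# Hypothetical function name; use the names declared in your own schema.
ALLOWED_FUNCTIONS = {"write_file"}


def check_function_call(function_call: dict) -> dict:
    """Return the parsed arguments, or raise ValueError for a malformed call."""
    name = function_call.get("name")
    if name not in ALLOWED_FUNCTIONS:
        raise ValueError(f"unknown function name: {name!r}")
    try:
        arguments = json.loads(function_call.get("arguments") or "")
    except json.JSONDecodeError as err:
        raise ValueError("arguments is not valid JSON") from err
    if not isinstance(arguments, dict):
        raise ValueError("arguments must be a JSON object of key/value pairs")
    return arguments
```

The hallucinated reply shown above fails the first check because `python` is not a declared function, and its `arguments` string would fail the JSON check as well.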
## Randomness
The benchmark attempts to be deterministic, always sending identical
requests for each exercise on repeated runs.
As part of this effort,
when sending test error output to GPT,
it removes the wall-clock timing information that
is normally included by the `unittest` module.
The benchmark harness also logs SHA hashes of
all the OpenAI API requests and replies.
This makes it possible to
detect randomness or nondeterminism
in the benchmarking process.
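Both ideas are easy to sketch. The example below is not the harness code: the regular expression assumes `unittest`'s usual "Ran N tests in 0.002s" summary line, SHA-256 is an arbitrary choice since the text only says "SHA hashes", and the names `strip_timing` and `fingerprint` are invented for illustration.

```python
import hashlib
import json
import re

# unittest prints a summary line such as "Ran 3 tests in 0.002s".
TIMING = re.compile(r"^(Ran \d+ tests?) in [\d.]+s$", re.MULTILINE)


def strip_timing(test_output: str) -> str:
    """Drop the wall-clock duration from unittest's summary line."""
    return TIMING.sub(r"\1", test_output)


def fingerprint(payload: dict) -> str:
    """Hash a request or reply so identical payloads get identical digests."""
    # Sorting keys keeps the digest independent of dict ordering.
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

If two runs log the same request hash but different reply hashes, the variation came from the API rather than from the benchmark.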
It turns out that the OpenAI chat APIs are not deterministic, even at
`temperature=0`. The same request will produce multiple
distinct responses, usually fewer than 5-10 variations. This suggests
that OpenAI may be load balancing their API across a number of
slightly different instances of the model?
For certain exercises, some of these variable responses pass the unit tests while
other variants do not. Results for exercises like this, which are
"on the bubble",
are therefore a bit random, depending on which variant OpenAI returns.
Given that, it would be ideal to run all 133 exercises many times for each
model/edit-format combination and report an average performance.
This would average away the effect of the API variance.
It would also significantly increase the cost of this sort of benchmarking.
So I didn't do that.
Benchmarking against 133 exercises already provides some robustness, since
we are measuring the performance across many exercises.
But to get a sense of how much the API variance impacts the benchmark outcomes,
I ran all 133 exercises 10 times each
against `gpt-3.5-turbo-0613` with the `whole` edit format.
You'll see one set of error bars in the graph, which show
the range of results from those 10 runs.
The OpenAI API randomness doesn't seem to
cause a large variance in the overall benchmark results.
## Conclusions
Based on these benchmark results, aider will continue to use
the `whole` edit format for GPT-3.5, and `diff` for GPT-4.
GPT-4 gets comparable results with the `whole` and `diff` edit formats,
but using `whole` significantly increases costs and latency compared to `diff`.
The latency of streaming back the entire updated copy of each edited file
is a real challenge with the `whole` format.
The GPT-3.5 models are quite responsive, and can
stream back entire files at reasonable speed.
Aider displays a progress bar and
live diffs of the files as they stream in,
which helps pass the time.
The GPT-4 models are much slower, and waiting for even small files
to be completely "retyped" on each request is probably unacceptable.

@ -0,0 +1,57 @@
---
title: Aider in your browser
highlight_image: /assets/browser.jpg
parent: Usage
nav_order: 800
description: Aider can run in your browser, not just on the command line.
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Aider in your browser
<div class="video-container">
<video controls loop poster="/assets/browser.jpg">
<source src="/assets/aider-browser-social.mp4" type="video/mp4">
<a href="/assets/aider-browser-social.mp4">Aider browser UI demo video</a>
</video>
</div>
<style>
.video-container {
position: relative;
padding-bottom: 101.89%; /* 1080 / 1060 = 1.0189 */
height: 0;
overflow: hidden;
}
.video-container video {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
}
</style>
Use aider's new experimental browser UI to collaborate with LLMs
to edit code in your local git repo.
Aider will directly edit the code in your local source files,
and [git commit the changes](https://aider.chat/docs/git.html)
with sensible commit messages.
You can start a new project or work with an existing git repo.
Aider works well with GPT-3.5, GPT-4, GPT-4 Turbo with Vision,
and Claude 3 Opus.
It also supports [connecting to almost any LLM](https://aider.chat/docs/llms.html).
Use the `--browser` switch to launch the browser version of aider:
```
pip install aider-chat
export OPENAI_API_KEY=<key> # Mac/Linux
setx OPENAI_API_KEY <key> # Windows
aider --browser
```

@ -0,0 +1,79 @@
---
parent: Usage
nav_order: 50
description: Control aider with in-chat commands like /add, /model, etc.
---
# In-chat commands
Aider supports commands from within the chat, which all start with `/`.
<!--[[[cog
from aider.commands import get_help_md
cog.out(get_help_md())
]]]-->
- **/add** Add files to the chat so GPT can edit them or review them in detail
- **/clear** Clear the chat history
- **/commit** Commit edits to the repo made outside the chat (commit message optional)
- **/diff** Display the diff of the last aider commit
- **/drop** Remove files from the chat session to free up context space
- **/exit** Exit the application
- **/git** Run a git command
- **/help** Show help about all commands
- **/lint** Lint and fix provided files or in-chat files if none provided
- **/ls** List all known files and indicate which are included in the chat session
- **/model** Switch to a new LLM
- **/models** Search the list of available models
- **/quit** Exit the application
- **/run** Run a shell command and optionally add the output to the chat (alias: !)
- **/test** Run a shell command and add the output to the chat on non-zero exit code
- **/tokens** Report on the number of tokens used by the current chat context
- **/undo** Undo the last git commit if it was done by aider
- **/voice** Record and transcribe voice input
- **/web** Use headless selenium to scrape a webpage and add the content to the chat
<!--[[[end]]]-->
# Entering multi-line chat messages
{% include multi-line.md %}
# Keybindings
The interactive prompt is built with [prompt-toolkit](https://github.com/prompt-toolkit/python-prompt-toolkit) which provides emacs and vi keybindings.
## Emacs
- `Ctrl-A` : Move cursor to the start of the line.
- `Ctrl-B` : Move cursor back one character.
- `Ctrl-D` : Delete the character under the cursor.
- `Ctrl-E` : Move cursor to the end of the line.
- `Ctrl-F` : Move cursor forward one character.
- `Ctrl-K` : Delete from the cursor to the end of the line.
- `Ctrl-L` : Clear the screen.
- `Ctrl-N` : Move down to the next history entry.
- `Ctrl-P` : Move up to the previous history entry.
- `Ctrl-R` : Reverse search in command history.
## Vi
To use vi/vim keybindings, run aider with the `--vim` switch.
- `Esc` : Switch to command mode.
- `i` : Switch to insert mode.
- `a` : Move cursor one character to the right and switch to insert mode.
- `A` : Move cursor to the end of the line and switch to insert mode.
- `I` : Move cursor to the beginning of the line and switch to insert mode.
- `h` : Move cursor one character to the left.
- `j` : Move cursor down one line.
- `k` : Move cursor up one line.
- `l` : Move cursor one character to the right.
- `w` : Move cursor forward one word.
- `b` : Move cursor backward one word.
- `0` : Move cursor to the beginning of the line.
- `$` : Move cursor to the end of the line.
- `x` : Delete the character under the cursor.
- `dd` : Delete the current line.
- `u` : Undo the last change.
- `Ctrl-R` : Redo the last undone change.

@ -0,0 +1,41 @@
---
nav_order: 55
has_children: true
description: Information on all of aider's settings and how to use them.
---
# Configuration
Aider has many options which can be set with
command line switches.
Most options can also be set in an `.aider.conf.yml` file
which can be placed in your home directory or at the root of
your git repo.
Options can also be set with environment variables like `AIDER_xxx`,
either in your shell or in a `.env` file.
Here are 4 equivalent ways of setting an option.
With a command line switch:
```
$ aider --dark-mode
```
Using a `.aider.conf.yml` file:
```yaml
dark-mode: true
```
By setting an environment variable:
```
export AIDER_DARK_MODE=true
```
Using an `.env` file:
```
AIDER_DARK_MODE=true
```
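The naming pattern in these examples can be expressed as a small helper. This is only a sketch of the convention visible above (`--dark-mode` becomes `AIDER_DARK_MODE`), and `env_var_for` is a name invented for this example. Some options do not follow the pattern: the API key options, for instance, use variables such as `OPENAI_API_KEY`.

```python
def env_var_for(option: str) -> str:
    """Map a switch such as '--dark-mode' to its AIDER_xxx variable name."""
    # Strip leading dashes, swap the remaining dashes for underscores, uppercase.
    return "AIDER_" + option.lstrip("-").replace("-", "_").upper()
```

For example, `env_var_for("--dark-mode")` returns `"AIDER_DARK_MODE"`.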

@ -0,0 +1,86 @@
---
parent: Configuration
nav_order: 950
description: Configuring advanced settings for LLMs.
---
# Advanced model settings
## Context window size and token costs
In most cases, you can safely ignore aider's warning about unknown context
window size and model costs.
But, you can register context window limits and costs for models that aren't known
to aider. Create a `.aider.model.metadata.json` file in one of these locations:
- Your home directory.
- The root of your git repo.
- The current directory where you launch aider.
- Or specify a file with the `--model-metadata-file <filename>` switch.
If the files above exist, they will be loaded in that order.
Files loaded last will take priority.
The json file should be a dictionary with an entry for each model, as follows:
```
{
"deepseek-chat": {
"max_tokens": 4096,
"max_input_tokens": 32000,
"max_output_tokens": 4096,
"input_cost_per_token": 0.00000014,
"output_cost_per_token": 0.00000028,
"litellm_provider": "deepseek",
"mode": "chat"
}
}
```
See
[litellm's model_prices_and_context_window.json file](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json) for more examples.
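The loading order and the cost fields can be illustrated with a short sketch. It is not aider's loader: `load_model_metadata` and `estimate_cost` are hypothetical names, and the sketch assumes that a later file replaces a model's whole entry rather than merging individual fields. The cost is in whatever currency the per-token prices are quoted in.

```python
import json
from pathlib import Path
from typing import Iterable


def load_model_metadata(paths: Iterable[Path]) -> dict:
    """Merge metadata files in order; entries from later files win."""
    merged: dict = {}
    for path in paths:
        if path.exists():
            # A model listed in a later file replaces the earlier entry.
            merged.update(json.loads(path.read_text()))
    return merged


def estimate_cost(entry: dict, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from the per-token prices."""
    return (
        input_tokens * entry["input_cost_per_token"]
        + output_tokens * entry["output_cost_per_token"]
    )
```

A typical call would pass the candidate paths in the order listed above, for example the home directory file first and the file in the current directory last.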
## Model settings
Aider has a number of settings that control how it works with
different models.
These model settings are pre-configured for most popular models.
But it can sometimes be helpful to override them or add settings for
a model that aider doesn't know about.
To do that,
create a `.aider.model.settings.yml` file in one of these locations:
- Your home directory.
- The root of your git repo.
- The current directory where you launch aider.
- Or specify a file with the `--model-settings-file <filename>` switch.
If the files above exist, they will be loaded in that order.
Files loaded last will take priority.
The yaml file should be a list of dictionary objects, one for each model, as follows:
```
- name: "gpt-3.5-turbo"
edit_format: "whole"
weak_model_name: "gpt-3.5-turbo"
use_repo_map: false
send_undo_reply: false
accepts_images: false
lazy: false
reminder_as_sys_msg: true
examples_as_sys_msg: false
- name: "gpt-4-turbo-2024-04-09"
edit_format: "udiff"
weak_model_name: "gpt-3.5-turbo"
use_repo_map: true
send_undo_reply: true
accepts_images: true
lazy: true
reminder_as_sys_msg: true
examples_as_sys_msg: false
```

@ -0,0 +1,272 @@
---
parent: Configuration
nav_order: 15
description: How to configure aider with a yaml config file.
---
# YAML config file
Most of aider's options can be set in an `.aider.conf.yml` file,
which can be placed in your home directory or at the root of
your git repo.
{% include special-keys.md %}
Below is a sample of the file, which you
can also
[download from GitHub](https://github.com/paul-gauthier/aider/blob/main/website/assets/sample.aider.conf.yml).
<!--[[[cog
from aider.args import get_sample_yaml
from pathlib import Path
text=get_sample_yaml()
Path("website/assets/sample.aider.conf.yml").write_text(text)
cog.outl("```")
cog.out(text)
cog.outl("```")
]]]-->
```
##########################################################
# Sample .aider.conf.yml
# This file lists *all* the valid configuration entries.
# Place in your home dir, or at the root of your git repo.
##########################################################
##########
# options:
## show this help message and exit
#help:
#######
# Main:
## Specify the OpenAI API key
#openai-api-key:
## Specify the Anthropic API key
#anthropic-api-key:
## Specify the model to use for the main chat
#model:
## Use claude-3-opus-20240229 model for the main chat
#opus: false
## Use claude-3-5-sonnet-20240620 model for the main chat
#sonnet: false
## Use gpt-4-0613 model for the main chat
#4: false
## Use gpt-4o model for the main chat
#4o: false
## Use gpt-4-1106-preview model for the main chat
#4-turbo: false
## Use gpt-3.5-turbo model for the main chat
#35turbo: false
#################
# Model Settings:
## List known models which match the (partial) MODEL name
#models:
## Specify the api base url
#openai-api-base:
## Specify the api_type
#openai-api-type:
## Specify the api_version
#openai-api-version:
## Specify the deployment_id
#openai-api-deployment-id:
## Specify the OpenAI organization ID
#openai-organization-id:
## Specify a file with aider model settings for unknown models
#model-settings-file: .aider.model.settings.yml
## Specify a file with context window and costs for unknown models
#model-metadata-file: .aider.model.metadata.json
## Verify the SSL cert when connecting to models (default: True)
#verify-ssl: true
## Specify what edit format the LLM should use (default depends on model)
#edit-format:
## Specify the model to use for commit messages and chat history summarization (default depends on --model)
#weak-model:
## Only work with models that have meta-data available (default: True)
#show-model-warnings: true
## Max number of tokens to use for repo map, use 0 to disable (default: 1024)
#map-tokens: 1024
## Maximum number of tokens to use for chat history. If not specified, uses the model's max_chat_history_tokens.
#max-chat-history-tokens:
## Specify the .env file to load (default: .env in git root)
#env-file: .env
################
# History Files:
## Specify the chat input history file (default: .aider.input.history)
#input-history-file: .aider.input.history
## Specify the chat history file (default: .aider.chat.history.md)
#chat-history-file: .aider.chat.history.md
## Restore the previous chat history messages (default: False)
#restore-chat-history: false
## Log the conversation with the LLM to this file (for example, .aider.llm.history)
#llm-history-file:
##################
# Output Settings:
## Use colors suitable for a dark terminal background (default: False)
#dark-mode: false
## Use colors suitable for a light terminal background (default: False)
#light-mode: false
## Enable/disable pretty, colorized output (default: True)
#pretty: true
## Enable/disable streaming responses (default: True)
#stream: true
## Set the color for user input (default: #00cc00)
#user-input-color: #00cc00
## Set the color for tool output (default: None)
#tool-output-color:
## Set the color for tool error messages (default: red)
#tool-error-color: #FF2222
## Set the color for assistant output (default: #0088ff)
#assistant-output-color: #0088ff
## Set the markdown code theme (default: default, other options include monokai, solarized-dark, solarized-light)
#code-theme: default
## Show diffs when committing changes (default: False)
#show-diffs: false
###############
# Git Settings:
## Enable/disable looking for a git repo (default: True)
#git: true
## Enable/disable adding .aider* to .gitignore (default: True)
#gitignore: true
## Specify the aider ignore file (default: .aiderignore in git root)
#aiderignore: .aiderignore
## Enable/disable auto commit of LLM changes (default: True)
#auto-commits: true
## Enable/disable commits when repo is found dirty (default: True)
#dirty-commits: true
## Attribute aider code changes in the git author name (default: True)
#attribute-author: true
## Attribute aider commits in the git committer name (default: True)
#attribute-committer: true
## Prefix commit messages with 'aider: ' (default: False)
#attribute-commit-message: false
## Perform a dry run without modifying files (default: False)
#dry-run: false
########################
# Fixing and committing:
## Commit all pending changes with a suitable commit message, then exit
#commit: false
## Lint and fix provided files, or dirty files if none provided
#lint: false
## Specify lint commands to run for different languages, eg: "python: flake8 --select=..." (can be used multiple times)
#lint-cmd:
## Enable/disable automatic linting after changes (default: True)
#auto-lint: true
## Specify command to run tests
#test-cmd:
## Enable/disable automatic testing after changes (default: False)
#auto-test: false
## Run tests and fix problems found
#test: false
#################
# Other Settings:
## Use VI editing mode in the terminal (default: False)
#vim: false
## Specify the language for voice using ISO 639-1 code (default: auto)
#voice-language: en
## Show the version number and exit
#version:
## Check for updates and return status in the exit code
#check-update: false
## Skips checking for the update when the program runs
#skip-check-update: false
## Apply the changes from the given file instead of running the chat (debug)
#apply:
## Always say yes to every confirmation
#yes: false
## Enable verbose output
#verbose: false
## Print the repo map and exit (debug)
#show-repo-map: false
## Print the system prompts and exit (debug)
#show-prompts: false
## Do all startup activities then exit before accepting user input (debug)
#exit: false
## Specify a single message to send the LLM, process reply then exit (disables chat mode)
#message:
## Specify a file containing the message to send the LLM, process reply, then exit (disables chat mode)
#message-file:
## Specify the encoding for input and output (default: utf-8)
#encoding: utf-8
## Specify the config file (default: search for .aider.conf.yml in git root, cwd or home directory)
#config:
## Run aider in your browser
#gui: false
```
<!--[[[end]]]-->

@ -0,0 +1,281 @@
---
parent: Configuration
nav_order: 900
description: Using a .env file to store LLM API keys for aider.
---
# Config with .env
You can use a `.env` file to store API keys and other settings for the
models you use with aider.
You can also set many general aider options
in the `.env` file.
{% include special-keys.md %}
Aider will look for a `.env` file in the
root of your git repo or in the current directory.
You can give it an explicit file to load with the `--env-file <filename>` parameter.
Below is a sample `.env` file, which you
can also
[download from GitHub](https://github.com/paul-gauthier/aider/blob/main/website/assets/sample.env).
<!--[[[cog
from aider.args import get_sample_dotenv
from pathlib import Path
text=get_sample_dotenv()
Path("website/assets/sample.env").write_text(text)
cog.outl("```")
cog.out(text)
cog.outl("```")
]]]-->
```
##########################################################
# Sample aider .env file.
# Place at the root of your git repo.
# Or use `aider --env-file <fname>` to specify.
##########################################################
#################
# LLM parameters:
#
# Include xxx_API_KEY parameters and other params needed for your LLMs.
# See https://aider.chat/docs/llms.html for details.
## OpenAI
#OPENAI_API_KEY=
## Anthropic
#ANTHROPIC_API_KEY=
##...
#######
# Main:
## Specify the OpenAI API key
#OPENAI_API_KEY=
## Specify the Anthropic API key
#ANTHROPIC_API_KEY=
## Specify the model to use for the main chat
#AIDER_MODEL=
## Use claude-3-opus-20240229 model for the main chat
#AIDER_OPUS=
## Use claude-3-5-sonnet-20240620 model for the main chat
#AIDER_SONNET=
## Use gpt-4-0613 model for the main chat
#AIDER_4=
## Use gpt-4o model for the main chat
#AIDER_4O=
## Use gpt-4-1106-preview model for the main chat
#AIDER_4_TURBO=
## Use gpt-3.5-turbo model for the main chat
#AIDER_35TURBO=
#################
# Model Settings:
## List known models which match the (partial) MODEL name
#AIDER_MODELS=
## Specify the api base url
#OPENAI_API_BASE=
## Specify the api_type
#OPENAI_API_TYPE=
## Specify the api_version
#OPENAI_API_VERSION=
## Specify the deployment_id
#OPENAI_API_DEPLOYMENT_ID=
## Specify the OpenAI organization ID
#OPENAI_ORGANIZATION_ID=
## Specify a file with aider model settings for unknown models
#AIDER_MODEL_SETTINGS_FILE=.aider.model.settings.yml
## Specify a file with context window and costs for unknown models
#AIDER_MODEL_METADATA_FILE=.aider.model.metadata.json
## Verify the SSL cert when connecting to models (default: True)
#AIDER_VERIFY_SSL=true
## Specify what edit format the LLM should use (default depends on model)
#AIDER_EDIT_FORMAT=
## Specify the model to use for commit messages and chat history summarization (default depends on --model)
#AIDER_WEAK_MODEL=
## Only work with models that have meta-data available (default: True)
#AIDER_SHOW_MODEL_WARNINGS=true
## Max number of tokens to use for repo map, use 0 to disable (default: 1024)
#AIDER_MAP_TOKENS=1024
## Maximum number of tokens to use for chat history. If not specified, uses the model's max_chat_history_tokens.
#AIDER_MAX_CHAT_HISTORY_TOKENS=
## Specify the .env file to load (default: .env in git root)
#AIDER_ENV_FILE=.env
################
# History Files:
## Specify the chat input history file (default: .aider.input.history)
#AIDER_INPUT_HISTORY_FILE=.aider.input.history
## Specify the chat history file (default: .aider.chat.history.md)
#AIDER_CHAT_HISTORY_FILE=.aider.chat.history.md
## Restore the previous chat history messages (default: False)
#AIDER_RESTORE_CHAT_HISTORY=false
## Log the conversation with the LLM to this file (for example, .aider.llm.history)
#AIDER_LLM_HISTORY_FILE=
##################
# Output Settings:
## Use colors suitable for a dark terminal background (default: False)
#AIDER_DARK_MODE=false
## Use colors suitable for a light terminal background (default: False)
#AIDER_LIGHT_MODE=false
## Enable/disable pretty, colorized output (default: True)
#AIDER_PRETTY=true
## Enable/disable streaming responses (default: True)
#AIDER_STREAM=true
## Set the color for user input (default: #00cc00)
#AIDER_USER_INPUT_COLOR=#00cc00
## Set the color for tool output (default: None)
#AIDER_TOOL_OUTPUT_COLOR=
## Set the color for tool error messages (default: red)
#AIDER_TOOL_ERROR_COLOR=#FF2222
## Set the color for assistant output (default: #0088ff)
#AIDER_ASSISTANT_OUTPUT_COLOR=#0088ff
## Set the markdown code theme (default: default, other options include monokai, solarized-dark, solarized-light)
#AIDER_CODE_THEME=default
## Show diffs when committing changes (default: False)
#AIDER_SHOW_DIFFS=false
###############
# Git Settings:
## Enable/disable looking for a git repo (default: True)
#AIDER_GIT=true
## Enable/disable adding .aider* to .gitignore (default: True)
#AIDER_GITIGNORE=true
## Specify the aider ignore file (default: .aiderignore in git root)
#AIDER_AIDERIGNORE=.aiderignore
## Enable/disable auto commit of LLM changes (default: True)
#AIDER_AUTO_COMMITS=true
## Enable/disable commits when repo is found dirty (default: True)
#AIDER_DIRTY_COMMITS=true
## Attribute aider code changes in the git author name (default: True)
#AIDER_ATTRIBUTE_AUTHOR=true
## Attribute aider commits in the git committer name (default: True)
#AIDER_ATTRIBUTE_COMMITTER=true
## Prefix commit messages with 'aider: ' (default: False)
#AIDER_ATTRIBUTE_COMMIT_MESSAGE=false
## Perform a dry run without modifying files (default: False)
#AIDER_DRY_RUN=false
########################
# Fixing and committing:
## Commit all pending changes with a suitable commit message, then exit
#AIDER_COMMIT=false
## Lint and fix provided files, or dirty files if none provided
#AIDER_LINT=false
## Specify lint commands to run for different languages, eg: "python: flake8 --select=..." (can be used multiple times)
#AIDER_LINT_CMD=
## Enable/disable automatic linting after changes (default: True)
#AIDER_AUTO_LINT=true
## Specify command to run tests
#AIDER_TEST_CMD=
## Enable/disable automatic testing after changes (default: False)
#AIDER_AUTO_TEST=false
## Run tests and fix problems found
#AIDER_TEST=false
#################
# Other Settings:
## Use VI editing mode in the terminal (default: False)
#AIDER_VIM=false
## Specify the language for voice using ISO 639-1 code (default: auto)
#AIDER_VOICE_LANGUAGE=en
## Check for updates and return status in the exit code
#AIDER_CHECK_UPDATE=false
## Skips checking for the update when the program runs
#AIDER_SKIP_CHECK_UPDATE=false
## Apply the changes from the given file instead of running the chat (debug)
#AIDER_APPLY=
## Always say yes to every confirmation
#AIDER_YES=
## Enable verbose output
#AIDER_VERBOSE=false
## Print the repo map and exit (debug)
#AIDER_SHOW_REPO_MAP=false
## Print the system prompts and exit (debug)
#AIDER_SHOW_PROMPTS=false
## Do all startup activities then exit before accepting user input (debug)
#AIDER_EXIT=false
## Specify a single message to send the LLM, process reply then exit (disables chat mode)
#AIDER_MESSAGE=
## Specify a file containing the message to send the LLM, process reply, then exit (disables chat mode)
#AIDER_MESSAGE_FILE=
## Specify the encoding for input and output (default: utf-8)
#AIDER_ENCODING=utf-8
## Run aider in your browser
#AIDER_GUI=false
```
<!--[[[end]]]-->

@ -0,0 +1,468 @@
---
parent: Configuration
nav_order: 10
description: Details about all of aider's settings.
---
# Options reference
{: .no_toc }
You can use `aider --help` to see all the available options,
or review them below.
- TOC
{:toc}
## Usage summary
<!--[[[cog
from aider.args import get_md_help
cog.out(get_md_help())
]]]-->
```
usage: aider [-h] [--openai-api-key] [--anthropic-api-key] [--model]
[--opus] [--sonnet] [--4] [--4o] [--4-turbo]
[--35turbo] [--models] [--openai-api-base]
[--openai-api-type] [--openai-api-version]
[--openai-api-deployment-id] [--openai-organization-id]
[--model-settings-file] [--model-metadata-file]
[--verify-ssl | --no-verify-ssl] [--edit-format]
[--weak-model]
[--show-model-warnings | --no-show-model-warnings]
[--map-tokens] [--max-chat-history-tokens] [--env-file]
[--input-history-file] [--chat-history-file]
[--restore-chat-history | --no-restore-chat-history]
[--llm-history-file] [--dark-mode] [--light-mode]
[--pretty | --no-pretty] [--stream | --no-stream]
[--user-input-color] [--tool-output-color]
[--tool-error-color] [--assistant-output-color]
[--code-theme] [--show-diffs] [--git | --no-git]
[--gitignore | --no-gitignore] [--aiderignore]
[--auto-commits | --no-auto-commits]
[--dirty-commits | --no-dirty-commits]
[--attribute-author | --no-attribute-author]
[--attribute-committer | --no-attribute-committer]
[--attribute-commit-message | --no-attribute-commit-message]
[--dry-run | --no-dry-run] [--commit] [--lint]
[--lint-cmd] [--auto-lint | --no-auto-lint]
[--test-cmd] [--auto-test | --no-auto-test] [--test]
[--vim] [--voice-language] [--version] [--check-update]
[--skip-check-update] [--apply] [--yes] [-v]
[--show-repo-map] [--show-prompts] [--exit] [--message]
[--message-file] [--encoding] [-c] [--gui]
```
## options:
### `--help`
show this help message and exit
Aliases:
- `-h`
- `--help`
## Main:
### `--openai-api-key OPENAI_API_KEY`
Specify the OpenAI API key
Environment variable: `OPENAI_API_KEY`
### `--anthropic-api-key ANTHROPIC_API_KEY`
Specify the Anthropic API key
Environment variable: `ANTHROPIC_API_KEY`
### `--model MODEL`
Specify the model to use for the main chat
Environment variable: `AIDER_MODEL`
### `--opus`
Use claude-3-opus-20240229 model for the main chat
Environment variable: `AIDER_OPUS`
### `--sonnet`
Use claude-3-5-sonnet-20240620 model for the main chat
Environment variable: `AIDER_SONNET`
### `--4`
Use gpt-4-0613 model for the main chat
Environment variable: `AIDER_4`
Aliases:
- `--4`
- `-4`
### `--4o`
Use gpt-4o model for the main chat
Environment variable: `AIDER_4O`
### `--4-turbo`
Use gpt-4-1106-preview model for the main chat
Environment variable: `AIDER_4_TURBO`
### `--35turbo`
Use gpt-3.5-turbo model for the main chat
Environment variable: `AIDER_35TURBO`
Aliases:
- `--35turbo`
- `--35-turbo`
- `--3`
- `-3`
## Model Settings:
### `--models MODEL`
List known models which match the (partial) MODEL name
Environment variable: `AIDER_MODELS`
### `--openai-api-base OPENAI_API_BASE`
Specify the api base url
Environment variable: `OPENAI_API_BASE`
### `--openai-api-type OPENAI_API_TYPE`
Specify the api_type
Environment variable: `OPENAI_API_TYPE`
### `--openai-api-version OPENAI_API_VERSION`
Specify the api_version
Environment variable: `OPENAI_API_VERSION`
### `--openai-api-deployment-id OPENAI_API_DEPLOYMENT_ID`
Specify the deployment_id
Environment variable: `OPENAI_API_DEPLOYMENT_ID`
### `--openai-organization-id OPENAI_ORGANIZATION_ID`
Specify the OpenAI organization ID
Environment variable: `OPENAI_ORGANIZATION_ID`
### `--model-settings-file MODEL_SETTINGS_FILE`
Specify a file with aider model settings for unknown models
Default: .aider.model.settings.yml
Environment variable: `AIDER_MODEL_SETTINGS_FILE`
### `--model-metadata-file MODEL_METADATA_FILE`
Specify a file with context window and costs for unknown models
Default: .aider.model.metadata.json
Environment variable: `AIDER_MODEL_METADATA_FILE`
### `--verify-ssl`
Verify the SSL cert when connecting to models (default: True)
Default: True
Environment variable: `AIDER_VERIFY_SSL`
Aliases:
- `--verify-ssl`
- `--no-verify-ssl`
### `--edit-format EDIT_FORMAT`
Specify what edit format the LLM should use (default depends on model)
Environment variable: `AIDER_EDIT_FORMAT`
### `--weak-model WEAK_MODEL`
Specify the model to use for commit messages and chat history summarization (default depends on --model)
Environment variable: `AIDER_WEAK_MODEL`
### `--show-model-warnings`
Only work with models that have meta-data available (default: True)
Default: True
Environment variable: `AIDER_SHOW_MODEL_WARNINGS`
Aliases:
- `--show-model-warnings`
- `--no-show-model-warnings`
### `--map-tokens VALUE`
Max number of tokens to use for repo map, use 0 to disable (default: 1024)
Default: 1024
Environment variable: `AIDER_MAP_TOKENS`
### `--max-chat-history-tokens VALUE`
Maximum number of tokens to use for chat history. If not specified, uses the model's max_chat_history_tokens.
Environment variable: `AIDER_MAX_CHAT_HISTORY_TOKENS`
### `--env-file ENV_FILE`
Specify the .env file to load (default: .env in git root)
Default: .env
Environment variable: `AIDER_ENV_FILE`
## History Files:
### `--input-history-file INPUT_HISTORY_FILE`
Specify the chat input history file (default: .aider.input.history)
Default: .aider.input.history
Environment variable: `AIDER_INPUT_HISTORY_FILE`
### `--chat-history-file CHAT_HISTORY_FILE`
Specify the chat history file (default: .aider.chat.history.md)
Default: .aider.chat.history.md
Environment variable: `AIDER_CHAT_HISTORY_FILE`
### `--restore-chat-history`
Restore the previous chat history messages (default: False)
Default: False
Environment variable: `AIDER_RESTORE_CHAT_HISTORY`
Aliases:
- `--restore-chat-history`
- `--no-restore-chat-history`
### `--llm-history-file LLM_HISTORY_FILE`
Log the conversation with the LLM to this file (for example, .aider.llm.history)
Environment variable: `AIDER_LLM_HISTORY_FILE`
## Output Settings:
### `--dark-mode`
Use colors suitable for a dark terminal background (default: False)
Default: False
Environment variable: `AIDER_DARK_MODE`
### `--light-mode`
Use colors suitable for a light terminal background (default: False)
Default: False
Environment variable: `AIDER_LIGHT_MODE`
### `--pretty`
Enable/disable pretty, colorized output (default: True)
Default: True
Environment variable: `AIDER_PRETTY`
Aliases:
- `--pretty`
- `--no-pretty`
### `--stream`
Enable/disable streaming responses (default: True)
Default: True
Environment variable: `AIDER_STREAM`
Aliases:
- `--stream`
- `--no-stream`
### `--user-input-color VALUE`
Set the color for user input (default: #00cc00)
Default: #00cc00
Environment variable: `AIDER_USER_INPUT_COLOR`
### `--tool-output-color VALUE`
Set the color for tool output (default: None)
Environment variable: `AIDER_TOOL_OUTPUT_COLOR`
### `--tool-error-color VALUE`
Set the color for tool error messages (default: red)
Default: #FF2222
Environment variable: `AIDER_TOOL_ERROR_COLOR`
### `--assistant-output-color VALUE`
Set the color for assistant output (default: #0088ff)
Default: #0088ff
Environment variable: `AIDER_ASSISTANT_OUTPUT_COLOR`
### `--code-theme VALUE`
Set the markdown code theme (default: default, other options include monokai, solarized-dark, solarized-light)
Default: default
Environment variable: `AIDER_CODE_THEME`
### `--show-diffs`
Show diffs when committing changes (default: False)
Default: False
Environment variable: `AIDER_SHOW_DIFFS`
## Git Settings:
### `--git`
Enable/disable looking for a git repo (default: True)
Default: True
Environment variable: `AIDER_GIT`
Aliases:
- `--git`
- `--no-git`
### `--gitignore`
Enable/disable adding .aider* to .gitignore (default: True)
Default: True
Environment variable: `AIDER_GITIGNORE`
Aliases:
- `--gitignore`
- `--no-gitignore`
### `--aiderignore AIDERIGNORE`
Specify the aider ignore file (default: .aiderignore in git root)
Default: .aiderignore
Environment variable: `AIDER_AIDERIGNORE`
### `--auto-commits`
Enable/disable auto commit of LLM changes (default: True)
Default: True
Environment variable: `AIDER_AUTO_COMMITS`
Aliases:
- `--auto-commits`
- `--no-auto-commits`
### `--dirty-commits`
Enable/disable commits when repo is found dirty (default: True)
Default: True
Environment variable: `AIDER_DIRTY_COMMITS`
Aliases:
- `--dirty-commits`
- `--no-dirty-commits`
### `--attribute-author`
Attribute aider code changes in the git author name (default: True)
Default: True
Environment variable: `AIDER_ATTRIBUTE_AUTHOR`
Aliases:
- `--attribute-author`
- `--no-attribute-author`
### `--attribute-committer`
Attribute aider commits in the git committer name (default: True)
Default: True
Environment variable: `AIDER_ATTRIBUTE_COMMITTER`
Aliases:
- `--attribute-committer`
- `--no-attribute-committer`
### `--attribute-commit-message`
Prefix commit messages with 'aider: ' (default: False)
Default: False
Environment variable: `AIDER_ATTRIBUTE_COMMIT_MESSAGE`
Aliases:
- `--attribute-commit-message`
- `--no-attribute-commit-message`
### `--dry-run`
Perform a dry run without modifying files (default: False)
Default: False
Environment variable: `AIDER_DRY_RUN`
Aliases:
- `--dry-run`
- `--no-dry-run`
## Fixing and committing:
### `--commit`
Commit all pending changes with a suitable commit message, then exit
Default: False
Environment variable: `AIDER_COMMIT`
### `--lint`
Lint and fix provided files, or dirty files if none provided
Default: False
Environment variable: `AIDER_LINT`
### `--lint-cmd`
Specify lint commands to run for different languages, eg: "python: flake8 --select=..." (can be used multiple times)
Default: []
Environment variable: `AIDER_LINT_CMD`
### `--auto-lint`
Enable/disable automatic linting after changes (default: True)
Default: True
Environment variable: `AIDER_AUTO_LINT`
Aliases:
- `--auto-lint`
- `--no-auto-lint`
### `--test-cmd VALUE`
Specify command to run tests
Default: []
Environment variable: `AIDER_TEST_CMD`
### `--auto-test`
Enable/disable automatic testing after changes (default: False)
Default: False
Environment variable: `AIDER_AUTO_TEST`
Aliases:
- `--auto-test`
- `--no-auto-test`
### `--test`
Run tests and fix problems found
Default: False
Environment variable: `AIDER_TEST`
## Other Settings:
### `--vim`
Use VI editing mode in the terminal (default: False)
Default: False
Environment variable: `AIDER_VIM`
### `--voice-language VOICE_LANGUAGE`
Specify the language for voice using ISO 639-1 code (default: auto)
Default: en
Environment variable: `AIDER_VOICE_LANGUAGE`
### `--version`
Show the version number and exit
### `--check-update`
Check for updates and return status in the exit code
Default: False
Environment variable: `AIDER_CHECK_UPDATE`
### `--skip-check-update`
Skips checking for the update when the program runs
Default: False
Environment variable: `AIDER_SKIP_CHECK_UPDATE`
### `--apply FILE`
Apply the changes from the given file instead of running the chat (debug)
Environment variable: `AIDER_APPLY`
### `--yes`
Always say yes to every confirmation
Environment variable: `AIDER_YES`
### `--verbose`
Enable verbose output
Default: False
Environment variable: `AIDER_VERBOSE`
Aliases:
- `-v`
- `--verbose`
### `--show-repo-map`
Print the repo map and exit (debug)
Default: False
Environment variable: `AIDER_SHOW_REPO_MAP`
### `--show-prompts`
Print the system prompts and exit (debug)
Default: False
Environment variable: `AIDER_SHOW_PROMPTS`
### `--exit`
Do all startup activities then exit before accepting user input (debug)
Default: False
Environment variable: `AIDER_EXIT`
### `--message COMMAND`
Specify a single message to send the LLM, process reply then exit (disables chat mode)
Environment variable: `AIDER_MESSAGE`
Aliases:
- `--message COMMAND`
- `--msg COMMAND`
- `-m COMMAND`
### `--message-file MESSAGE_FILE`
Specify a file containing the message to send the LLM, process reply, then exit (disables chat mode)
Environment variable: `AIDER_MESSAGE_FILE`
Aliases:
- `--message-file MESSAGE_FILE`
- `-f MESSAGE_FILE`
### `--encoding VALUE`
Specify the encoding for input and output (default: utf-8)
Default: utf-8
Environment variable: `AIDER_ENCODING`
### `--config CONFIG_FILE`
Specify the config file (default: search for .aider.conf.yml in git root, cwd or home directory)
Aliases:
- `-c CONFIG_FILE`
- `--config CONFIG_FILE`
### `--gui`
Run aider in your browser
Default: False
Environment variable: `AIDER_GUI`
Aliases:
- `--gui`
- `--browser`
<!--[[[end]]]-->
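
The environment variable names listed above follow a visible pattern: most are the long option name, upper-cased, with dashes turned into underscores and an `AIDER_` prefix, while the OpenAI and Anthropic credential and endpoint options keep their provider-style names. The sketch below only encodes that observation from the listing. The helper `env_var_for` is hypothetical and is not part of aider, and options such as `--help`, `--version` and `--config` list no environment variable at all.

```python
def env_var_for(option: str) -> str:
    """Guess the environment variable for a long option.

    Hypothetical helper: it mirrors the naming pattern visible in the
    reference above and is not taken from aider's source.
    """
    name = option.lstrip("-").replace("-", "_").upper()
    # Provider options (--openai-*, --anthropic-*) keep their own names
    # in the listing; every other option shown carries an AIDER_ prefix.
    if name.startswith(("OPENAI_", "ANTHROPIC_")):
        return name
    return "AIDER_" + name


print(env_var_for("--map-tokens"))      # AIDER_MAP_TOKENS
print(env_var_for("--openai-api-key"))  # OPENAI_API_KEY
```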

View file

@ -0,0 +1,87 @@
---
parent: Usage
nav_order: 800
description: Tell aider to follow your coding conventions when it works on your code.
---
# Specifying coding conventions
Sometimes you want GPT to be aware of certain coding guidelines,
like whether to provide type hints, which libraries or packages
to prefer, etc.
The easiest way to do that with aider is to simply create
a small markdown file and include it in the chat.
For example, say we want our python code to:
```
- Prefer httpx over requests for making http requests.
- Use types everywhere possible.
```
We would simply create a file like `CONVENTIONS.md` with those lines
and then we can add it to the aider chat, along with the file(s)
that we want to edit:
<div class="chat-transcript" markdown="1">
> $ aider CONVENTIONS.md useragent.py
>
> Aider v0.24.2-dev
> Added CONVENTIONS.md to the chat.
> Added useragent.py to the chat.
#### write a function that uses httpbin to check the user agent on the request. return the ua.
To fulfill your request, I will:
1. Add an import for `httpx`.
2. Write a function named `check_user_agent` that makes a GET request to `httpbin.org/user-agent` and returns the user agent string.
Here is the diff for the `useragent.py` file:
```diff
--- useragent.py
+++ useragent.py
@@ ... @@
+import httpx
+
+def check_user_agent() -> str:
+    response = httpx.get('https://httpbin.org/user-agent')
+    return response.json()['user-agent']
```
This function uses `httpx` to make a GET request to `httpbin.org/user-agent`, which returns a JSON response containing the user agent of the HTTP client. The function then extracts the user agent string from the JSON response and returns it.
> Applied edit to useragent.py
</div>
GPT correctly used `httpx` and provided type hints.
See below for the transcript of this same
coding task conducted *without*
using `CONVENTIONS.md`.
In this case, GPT used `requests` and skipped types,
which is perhaps more typical in small python scripts.
<div class="chat-transcript" markdown="1">
> $ aider useragent.py
>
> Aider v0.24.2-dev
> Added useragent.py to the chat.
#### write a function that uses httpbin to check the user agent on the request. return the ua.
```diff
--- /dev/null
+++ useragent.py
@@ ... @@
+import requests
+
+def check_user_agent():
+    response = requests.get('https://httpbin.org/user-agent')
+    user_agent = response.json().get('user-agent')
+    return user_agent
```
> Applied edit to useragent.py
</div>

250
aider/website/docs/ctags.md Normal file
View file

@ -0,0 +1,250 @@
---
title: Improving GPT-4's codebase understanding with ctags
excerpt: Using ctags to build a "repository map" to increase GPT-4's ability to understand a large code base.
highlight_image: /assets/robot-flowchart.png
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Improving GPT-4's codebase understanding with ctags
![robot flowchart](/assets/robot-flowchart.png)
## Updated
Aider no longer uses ctags to build a repo map.
Please see the newer article about
[using tree-sitter to build a better repo map](https://aider.chat/docs/repomap.html).
-------
GPT-4 is extremely useful for "self-contained" coding tasks,
like generating brand new code or modifying a pure function
that has no dependencies.
But it's difficult to use GPT-4 to modify or extend
a large, complex pre-existing codebase.
To modify such code, GPT needs to understand the dependencies and APIs
which interconnect its subsystems.
Somehow we need to provide this "code context" to GPT
when we ask it to accomplish a coding task. Specifically, we need to:
- Help GPT understand the overall codebase, so that it
can decipher the meaning of code with complex dependencies and generate
new code that respects and utilizes existing abstractions.
- Convey all of this "code context" to GPT in an
efficient manner that fits within the 8k-token context window.
To address these issues, `aider` now
sends GPT a **concise map of your whole git repository**
that includes
all declared variables and functions with call signatures.
This *repo map* is built automatically using `ctags`, which
extracts symbol definitions from source files. Historically,
ctags were generated and indexed by IDEs and editors to
help humans search and navigate large codebases.
Instead, we're going to use ctags to help GPT better comprehend, navigate
and edit code in larger repos.
To get a sense of how effective this can be, this
[chat transcript](https://aider.chat/examples/add-test.html)
shows GPT-4 creating a black box test case, **without being given
access to the source code of the function being tested or any of the
other code in the repo.**
Using only the meta-data in the repo map, GPT is able to figure out how to
call the method to be tested, as well as how to instantiate multiple
class objects that are required to prepare for the test.
To code with GPT-4 using the techniques discussed here:
- Install [aider](https://aider.chat/docs/install.html).
- Install universal ctags.
- Run `aider` inside your repo, and it should say "Repo-map: universal-ctags using 1024 tokens".
## The problem: code context
GPT-4 is great at "self contained" coding tasks, like writing or
modifying a pure function with no external dependencies.
GPT can easily handle requests like "write a
Fibonacci function" or "rewrite the loop using list
comprehensions", because they require no context beyond the code
being discussed.
Most real code is not pure and self-contained; it is intertwined with
and depends on code from many different files in a repo.
If you ask GPT to "switch all the print statements in class Foo to
use the BarLog logging system", it needs to see the code in the Foo class
with the prints, and it also needs to understand the project's BarLog
subsystem.
A simple solution is to **send the entire codebase** to GPT along with
each change request. Now GPT has all the context! But this won't work
for even moderately
sized repos, because they won't fit into the 8k-token context window.
A better approach is to be selective,
and **hand pick which files to send**.
For the example above, you could send the file that
contains the Foo class
and the file that contains the BarLog logging subsystem.
This works pretty well, and is supported by `aider` -- you
can manually specify which files to "add to the chat" you are having with GPT.
But it's not ideal to have to manually identify the right
set of files to add to the chat.
And sending whole files is a bulky way to send code context,
wasting the precious 8k context window.
GPT doesn't need to see the entire implementation of BarLog;
it just needs to understand it well enough to use it.
You may quickly run out of context window if you
send many files worth of code just to convey context.
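
To make the budget arithmetic concrete, here is a rough sketch of how quickly whole files use up the window. It assumes an 8192-token window for the "8k" figure and about four characters per token, which is only a rule of thumb and not a real tokenizer. The names `estimate_tokens` and `remaining_budget` are illustrative and are not part of aider.

```python
CONTEXT_WINDOW = 8192  # assumed size of the "8k" context window
CHARS_PER_TOKEN = 4    # rough rule of thumb, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


def remaining_budget(file_texts, window=CONTEXT_WINDOW):
    """Tokens left in the window after sending every file in full."""
    used = sum(estimate_tokens(text) for text in file_texts)
    return window - used


# Two source files of 20,000 and 16,000 characters.
files = ["x" * 20_000, "y" * 16_000]
print(remaining_budget(files))  # 8192 - 5000 - 4000 = -808
```

Under these assumptions, two mid-sized files already overflow the window before any instructions or replies are counted.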
## Using a repo map to provide context
The latest version of `aider` sends a **repo map** to GPT along with
each change request. The map contains a list of all the files in the
repo, along with the symbols which are defined in each file. Callables
like functions and methods also include their signatures.
Here's a
sample of the map of the aider repo, just showing the maps of
[main.py](https://github.com/paul-gauthier/aider/blob/main/aider/main.py)
and
[io.py](https://github.com/paul-gauthier/aider/blob/main/aider/io.py)
:
```
aider/
   ...
   main.py:
      function
         main (args=None, input=None, output=None)
      variable
         status
   ...
   io.py:
      class
         FileContentCompleter
         InputOutput
      FileContentCompleter
         member
            __init__ (self, fnames, commands)
            get_completions (self, document, complete_event)
      InputOutput
         member
            __init__ (self, pretty, yes, input_history_file=None, chat_history_file=None, input=None, output=None)
            ai_output (self, content)
            append_chat_history (self, text, linebreak=False, blockquote=False)
            confirm_ask (self, question, default="y")
            get_input (self, fnames, commands)
            prompt_ask (self, question, default=None)
            tool (self, *messages, log_only=False)
            tool_error (self, message)
   ...
```
Mapping out the repo like this provides some benefits:
- GPT can see variables, classes, methods and function signatures from everywhere in the repo. This alone may give it enough context to solve many tasks. For example, it can probably figure out how to use the API exported from a module just based on the details shown in the map.
- If it needs to see more code, GPT can use the map to figure out by itself which files it needs to look at. GPT will then ask to see these specific files, and `aider` will automatically add them to the chat context (with user approval).
Of course, for large repositories even just the map might be too large
for the context window. However, this mapping approach opens up the
ability to collaborate with GPT-4 on larger codebases than previous
methods. It also reduces the need to manually curate which files to
add to the chat context, empowering GPT to autonomously identify
relevant files for the task at hand.
## Using ctags to make the map
Under the hood, `aider` uses
[universal ctags](https://github.com/universal-ctags/ctags)
to build the
map. Universal ctags can scan source code written in many
languages, and extract data about all the symbols defined in each
file.
Historically, ctags were generated and indexed by IDEs or code editors
to make it easier for a human to search and navigate a
codebase, find the implementation of functions, etc.
Instead, we're going to use ctags to help GPT navigate and understand the codebase.
Here is the type of output you get when you run ctags on source code. Specifically,
this is the
`ctags --fields=+S --output-format=json` output for the `main.py` file mapped above:
```json
{
  "_type": "tag",
  "name": "main",
  "path": "aider/main.py",
  "pattern": "/^def main(args=None, input=None, output=None):$/",
  "kind": "function",
  "signature": "(args=None, input=None, output=None)"
}
{
  "_type": "tag",
  "name": "status",
  "path": "aider/main.py",
  "pattern": "/^ status = main()$/",
  "kind": "variable"
}
```
The repo map is built using this type of `ctags` data,
but formatted into the space
efficient hierarchical tree format shown earlier.
This is a format that GPT can easily understand
and which conveys the map data using a
minimal number of tokens.
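
As an illustration of that conversion, the sketch below groups ctags JSON records by file and by kind, then prints them as an indented outline. It is a minimal sketch and not aider's actual implementation. The function name `render_map` and the three-space indent are assumptions made for the example, and it handles only the `name`, `path`, `kind` and `signature` fields shown above.

```python
import json
from collections import defaultdict


def render_map(ctags_json_lines):
    """Group ctags tag records by file and kind, then render an outline."""
    by_file = defaultdict(lambda: defaultdict(list))
    for line in ctags_json_lines:
        tag = json.loads(line)
        if tag.get("_type") != "tag":
            continue  # skip any non-tag records
        entry = tag["name"]
        if "signature" in tag:
            entry += " " + tag["signature"]
        by_file[tag["path"]][tag["kind"]].append(entry)

    out = []
    for path in sorted(by_file):
        out.append(f"{path}:")
        for kind in sorted(by_file[path]):
            out.append(f"   {kind}")
            for entry in sorted(by_file[path][kind]):
                out.append(f"      {entry}")
    return "\n".join(out)


sample = [
    '{"_type": "tag", "name": "main", "path": "aider/main.py", '
    '"kind": "function", "signature": "(args=None, input=None, output=None)"}',
    '{"_type": "tag", "name": "status", "path": "aider/main.py", "kind": "variable"}',
]
print(render_map(sample))
```

Dropping the repeated JSON keys and the `pattern` field is where most of the token savings in this sketch come from.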
## Example chat transcript
This
[chat transcript](https://aider.chat/examples/add-test.html)
shows GPT-4 creating a black box test case, **without being given
access to the source code of the function being tested or any of the
other code in the repo.** Instead, GPT is operating solely off
the repo map.
Using only the meta-data in the map, GPT is able to figure out how to call the method to be tested, as well as how to instantiate multiple class objects that are required to prepare for the test.
GPT makes one reasonable mistake writing the first version of the test, but is
able to quickly fix the issue after being shown the `pytest` error output.
## Future work
Just as "send the whole codebase to GPT with every request"
is not an efficient solution to this problem,
there are probably better approaches than
"send the whole repo map with every request".
Sending an appropriate subset of the repo map would help `aider` work
better with even larger repositories which have large maps.
Some possible approaches to reducing the amount of map data are:
- Distill the global map, to prioritize important symbols and discard "internal" or otherwise less globally relevant identifiers. Possibly enlist `gpt-3.5-turbo` to perform this distillation in a flexible and language agnostic way.
- Provide a mechanism for GPT to start with a distilled subset of the global map, and let it ask to see more detail about subtrees or keywords that it feels are relevant to the current coding task.
- Attempt to analyze the natural language coding task given by the user and predict which subset of the repo map is relevant. Possibly by analysis of prior coding chats within the specific repo. Work on certain files or types of features may require certain somewhat predictable context from elsewhere in the repo. Vector and keyword search against the chat history, repo map or codebase may help here.
One key goal is to prefer solutions which are language agnostic or
which can be easily deployed against most popular code languages.
The `ctags` solution has this benefit, since it comes pre-built
with support for most popular languages.
I suspect that Language Server Protocol might be an even
better tool than `ctags` for this problem.
But it is more cumbersome to deploy for a broad
array of languages.
Users would need to stand up an LSP server for their
specific language(s) of interest.
## Try it out
To use this experimental repo map feature:
- Install [aider](https://aider.chat/docs/install.html).
- Install ctags.
- Run `aider` inside your repo, and it should say "Repo-map: universal-ctags using 1024 tokens".

104
aider/website/docs/faq.md Normal file
View file

@ -0,0 +1,104 @@
---
nav_order: 90
description: Frequently asked questions about aider.
---
# FAQ
{: .no_toc }
- TOC
{:toc}
## How can I add ALL the files to the chat?
People regularly ask about how to add **many or all of their repo's files** to the chat.
This is probably not a good idea and will likely do more harm than good.
The best approach is to think about which files need to be changed to accomplish
the task you are working on. Just add those files to the chat.
Usually when people want to add "all the files" it's because they think it
will give the LLM helpful context about the overall code base.
Aider will automatically give the LLM a bunch of additional context about
the rest of your git repo.
It does this by analyzing your entire codebase in light of the
current chat to build a compact
[repository map](https://aider.chat/2023/10/22/repomap.html).
Adding a bunch of files that are mostly irrelevant to the
task at hand will often distract or confuse the LLM.
The LLM will give worse coding results, and sometimes even fail to correctly edit files.
Adding extra files will also increase the token costs on your OpenAI invoice.
Again, it's usually best to just add the files to the chat that will need to be modified.
If you still wish to add lots of files to the chat, you can:
- Use a wildcard when you launch aider: `aider src/*.py`
- Use a wildcard with the in-chat `/add` command: `/add src/*.py`
- Give the `/add` command a directory name and it will recursively add every file under that dir: `/add src`
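
The difference between a wildcard and a directory name can be sketched with Python's `pathlib`. This only illustrates the matching behavior described above and is not how aider resolves paths. The helper names and the sample file layout are made up for the example, and aider may apply additional filtering that is not modeled here.

```python
import tempfile
from pathlib import Path


def wildcard_matches(root: Path, pattern: str):
    """Files matched by a wildcard such as 'src/*.py' (one level only)."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


def directory_matches(root: Path, dirname: str):
    """Every file under a directory, found recursively."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in (root / dirname).rglob("*")
        if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["src/app.py", "src/util.py", "src/data/load.py", "src/data/schema.sql"]:
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")

    print(wildcard_matches(root, "src/*.py"))
    # ['src/app.py', 'src/util.py']
    print(directory_matches(root, "src"))
    # ['src/app.py', 'src/data/load.py', 'src/data/schema.sql', 'src/util.py']
```

The directory form pulls in nested and non-Python files as well, which is one way a chat ends up with far more files than the task needs.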
## How can I run aider locally from source code?
To run the project locally, follow these steps:
```
# Clone the repository:
git clone git@github.com:paul-gauthier/aider.git
# Navigate to the project directory:
cd aider
# Install the dependencies listed in the `requirements.txt` file:
pip install -r requirements.txt
# Run the local version of Aider:
python -m aider.main
```
## Can I run aider in Google Colab?
User [imabutahersiddik](https://github.com/imabutahersiddik)
has provided this
[Colab notebook](https://colab.research.google.com/drive/1J9XynhrCqekPL5PR6olHP6eE--rnnjS9?usp=sharing).
## Can I change the system prompts that aider uses?
Aider is set up to support different system prompts and edit formats
in a modular way. If you look in the `aider/coders` subdirectory, you'll
see there's a base coder with base prompts, and then there are
a number of
different specific coder implementations.
If you're thinking about experimenting with system prompts
this document about
[benchmarking GPT-3.5 and GPT-4 on code editing](https://aider.chat/docs/benchmarks.html)
might be useful background.
While it's not well documented how to add new coder subsystems, you may be able
to modify an existing implementation or use it as a template to add another.
To get started, try looking at and modifying these files.
The wholefile coder is currently used by GPT-3.5 by default. You can manually select it with `--edit-format whole`.
- wholefile_coder.py
- wholefile_prompts.py
The editblock coder is currently used by GPT-4o by default. You can manually select it with `--edit-format diff`.
- editblock_coder.py
- editblock_prompts.py
The universal diff coder is currently used by GPT-4 Turbo by default. You can manually select it with `--edit-format udiff`.
- udiff_coder.py
- udiff_prompts.py
When experimenting with coder backends, it helps to run aider with `--verbose --no-pretty` so you can see
all the raw information being sent to/from the LLM in the conversation.
You can also refer to the
[instructions for installing a development version of aider](https://aider.chat/docs/install/optional.html#install-the-development-version-of-aider).

49
aider/website/docs/git.md Normal file
View file

@ -0,0 +1,49 @@
---
parent: More info
nav_order: 800
description: Aider is tightly integrated with git.
---
# Git integration
Aider works best with code that is part of a git repo.
Aider is tightly integrated with git, which makes it easy to:
- Use git to undo any aider changes that you don't like
- Go back in the git history to review the changes that aider made to your code
- Manage a series of aider's changes on a git branch
Aider specifically uses git in these ways:
- It asks to create a git repo if you launch it in a directory without one.
- Whenever aider edits a file, it commits those changes with a descriptive commit message. This makes it easy to undo or review aider's changes.
- Aider takes special care before editing files that already have uncommitted changes (dirty files). Aider will first commit any preexisting changes with a descriptive commit message.
This keeps your edits separate from aider's edits, and makes sure you never lose your work if aider makes an inappropriate change.
## In-chat commands
Aider also allows you to use in-chat commands to `/diff` or `/undo` the last change.
To do more complex management of your git history, you can use raw `git` commands,
either by using `/git` within the chat, or with standard git tools outside of aider.
## Disabling git integration
While it is not recommended, you can disable aider's use of git in a few ways:
- `--no-auto-commits` will stop aider from git committing each of its changes.
- `--no-dirty-commits` will stop aider from committing dirty files before applying its edits.
- `--no-git` will completely stop aider from using git on your files. You should ensure you are keeping sensible backups of the files you are working with.
## Commit attribution
Aider marks commits that it either authored or committed.
- If aider authored the changes in a commit, the commit will have "(aider)" appended to the git author and git committer name metadata.
- If aider simply committed changes (found in dirty files), the commit will have "(aider)" appended to the git committer name metadata.
You can use `--no-attribute-author` and `--no-attribute-committer` to disable
modification of the git author and committer name fields.
Additionally, you can use `--attribute-commit-message` to prefix commit messages with 'aider: '.
This option is disabled by default, but can be useful for easily identifying commits made by aider.
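
The attribution rules above can be summarized in a small sketch. It is not aider's code: the function names are hypothetical, and the exact spacing of the "(aider)" suffix is an assumption. The keyword arguments mirror the `--attribute-author`, `--attribute-committer` and `--attribute-commit-message` options and their documented defaults.

```python
def attributed_names(name, aider_authored, attribute_author=True, attribute_committer=True):
    """Return (author_name, committer_name) for a commit made by aider."""
    suffix = " (aider)"  # assumed spacing; the docs only say "(aider)" is appended
    # The author name is marked only when aider wrote the changes itself.
    author = name + suffix if aider_authored and attribute_author else name
    # The committer name is marked for every commit aider makes.
    committer = name + suffix if attribute_committer else name
    return author, committer


def commit_message(message, attribute_commit_message=False):
    """Optionally prefix the commit message with 'aider: '."""
    return "aider: " + message if attribute_commit_message else message


print(attributed_names("Jane Doe", aider_authored=True))
# ('Jane Doe (aider)', 'Jane Doe (aider)')
print(attributed_names("Jane Doe", aider_authored=False))
# ('Jane Doe', 'Jane Doe (aider)')
print(commit_message("Add user agent check", attribute_commit_message=True))
# aider: Add user agent check
```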

Some files were not shown because too many files have changed in this diff