Mirror of https://github.com/Aider-AI/aider.git (synced 2025-05-24 22:34:59 +00:00)
Added claude post
Commit 7fbd9e2be4 (parent a1939f50e4)
4 changed files with 2123 additions and 7 deletions
75
_posts/2024-03-08-claude-3.md
Normal file
@@ -0,0 +1,75 @@
---
title: Claude 3 beats all OpenAI models on Aider code editing benchmark
excerpt: Claude 3 Opus outperforms all of OpenAI's models on Aider's code editing benchmark, making it the best available model for pair programming with AI.
highlight_image: /assets/2024-03-07-claude-3.svg
---
# Claude 3 beats GPT-4 on Aider code editing benchmark

[](https://aider.chat/assets/2024-03-07-claude-3.svg)

[Anthropic just released their new Claude 3 models]()
with evals showing better performance on coding tasks.
With that in mind, I've been benchmarking the new models
using Aider's code editing benchmark suite.
Claude 3 Opus outperforms all of OpenAI's models,
making it the best available model for pair programming with AI.

Aider currently supports Claude 3 Opus via
[OpenRouter](https://aider.chat/docs/faq.html#accessing-other-llms-with-openrouter):

```
# Install Aider
pip install aider-chat

# Set up OpenRouter access
export OPENAI_API_KEY=<your-openrouter-key>
export OPENAI_API_BASE=https://openrouter.ai/api/v1

# Run aider with Claude 3 Opus using the diff editing format
aider --model anthropic/claude-3-opus --edit-format diff
```

## Aider's code editing benchmark

[Aider](https://github.com/paul-gauthier/aider)
is an open source command line chat tool that lets you
pair program with AI on code in your local git repo.

Aider relies on a
[code editing benchmark](https://aider.chat/docs/benchmarks.html)
to quantitatively evaluate how well
an LLM can make changes to existing code.
The benchmark uses aider to try to complete
[133 Exercism Python coding exercises](https://github.com/exercism/python).
For each exercise,
Exercism provides a starting Python file with stubs for the needed functions,
a natural language description of the problem to solve
and a test suite to evaluate whether the coder has correctly solved the problem.

The LLM gets two tries to solve each problem:

1. On the first try, it gets the initial stub code and the English description of the coding task. If the tests all pass, we are done.
2. If the tests fail, aider sends the LLM the failing test output and gives it a second try to complete the task.

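The two-try procedure above can be sketched roughly as follows. This is a minimal illustration and not aider's actual benchmark harness: `run_exercise`, `ask_llm` and `run_tests` are hypothetical names, and the two callables stand in for the real LLM call and test runner.

```python
from typing import Callable, Optional, Tuple


def run_exercise(
    instructions: str,
    stub_code: str,
    ask_llm: Callable[[str, str, Optional[str]], str],
    run_tests: Callable[[str], Tuple[bool, str]],
) -> Tuple[bool, int]:
    """Return (passed, tries_used) for one exercise, allowing at most two tries.

    ask_llm and run_tests are hypothetical stand-ins supplied by the caller.
    """
    # First try: the model sees only the stub code and the task description.
    code = ask_llm(instructions, stub_code, None)
    passed, test_output = run_tests(code)
    if passed:
        return True, 1

    # Second try: the model also sees the failing test output.
    code = ask_llm(instructions, code, test_output)
    passed, _ = run_tests(code)
    return passed, 2
```

Under this sketch, a first-try score counts the exercises that return `(True, 1)`, and the two-try score counts every exercise where `passed` is true.
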
## Benchmark results

### Claude 3 Opus

- The new `claude-3-opus-20240229` model got the highest score ever on this benchmark, completing 68.4% of the tasks with two tries.
- Its single-try performance was comparable to the latest GPT-4 Turbo model `gpt-4-0125-preview`, at 54.1%.

### Claude 3 Sonnet

- The new `claude-3-sonnet-20240229` model performed similarly to OpenAI's GPT-3.5 Turbo models with an overall score of 54.9% and a first-try score of 43.6%.

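As a rough sanity check, the percentages above can be converted back into approximate exercise counts. This sketch assumes each score was computed over all 133 exercises and rounded to one decimal place; the resulting counts are inferred from the percentages, not figures reported in the post, and `implied_solved` is a hypothetical helper.

```python
TOTAL_EXERCISES = 133  # number of Exercism exercises in the benchmark

# Percentages as reported in the results above.
reported = {
    "claude-3-opus-20240229, two tries": 68.4,
    "claude-3-opus-20240229, first try": 54.1,
    "claude-3-sonnet-20240229, two tries": 54.9,
    "claude-3-sonnet-20240229, first try": 43.6,
}


def implied_solved(percent: float, total: int = TOTAL_EXERCISES) -> int:
    """Nearest whole number of exercises consistent with a reported pass rate."""
    return round(percent / 100 * total)


for label, percent in reported.items():
    print(f"{label}: about {implied_solved(percent)} of {TOTAL_EXERCISES}")
```

Under those assumptions, the Opus two-try score corresponds to about 91 exercises and its first-try score to about 72, while Sonnet's scores correspond to about 73 and 58.
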
## Other observations

There are a few other things worth noting:

- Claude 3 Opus and Sonnet are both slower and more expensive than OpenAI's models. You can get almost the same coding skill faster and cheaper with OpenAI's models.
- The Claude models refused to perform a number of coding tasks and returned the error "Output blocked by content filtering policy". They refused to code up the [beer song]() program, which at least makes some sort of superficial sense. But they also refused to work in some larger open source code bases, for unclear reasons.
- The Claude APIs seem somewhat unstable, returning HTTP 5xx errors of various sorts. Aider does exponential backoff retries in these cases, but it's a sign that they may be struggling under surging demand.

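The exponential backoff retries mentioned in the last observation follow a common pattern, sketched below. This is an illustration only, not aider's actual retry code: `TransientServerError`, `call_with_backoff` and the default delays are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientServerError(Exception):
    """Hypothetical stand-in for an HTTP 5xx response from an API."""


def call_with_backoff(
    request: Callable[[], T],
    max_tries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), doubling the wait after each transient failure."""
    for attempt in range(max_tries):
        try:
            return request()
        except TransientServerError:
            if attempt == max_tries - 1:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    raise ValueError("max_tries must be at least 1")
```

Passing `sleep` as a parameter keeps the sketch easy to test without real delays. Production retry logic often also adds random jitter to the wait times.
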
@@ -682,6 +682,10 @@ class Coder:
        if self.verbose:
            print(completion)

        if not completion.choices:
            self.io.tool_error(str(completion))
            return

        show_func_err = None
        show_content_err = None
        try:
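The guard added in the hunk above reports the raw response and returns early when a completion carries no choices, presumably so that later code never indexes into an empty list. A minimal standalone sketch of that pattern follows. `first_message_content` and `report_error` are hypothetical names, the responses are built with `SimpleNamespace` purely for illustration, and it is an assumption that a filtered request arrives as a response with an empty `choices` list.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional


def first_message_content(
    completion: Any, report_error: Callable[[str], None]
) -> Optional[str]:
    """Return the first choice's text, or report the raw response if there are no choices."""
    if not completion.choices:
        # Same shape as the guard in the hunk: show the whole response and stop.
        report_error(str(completion))
        return None
    return completion.choices[0].message.content


# Hypothetical responses; real API objects have more fields.
blocked = SimpleNamespace(choices=[], error="Output blocked by content filtering policy")
normal = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="print('hi')"))]
)

errors = []
print(first_message_content(blocked, errors.append))  # the raw response is recorded in errors
print(first_message_content(normal, errors.append))
```
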
2031
assets/2024-03-07-claude-3.svg
Normal file
File diff suppressed because it is too large
Size: 56 KiB
20
docs/faq.md
@@ -74,16 +74,22 @@ which contains many benchmarking articles.

## Accessing other LLMs with OpenRouter

[OpenRouter](https://openrouter.ai) provide an interface to [many models](https://openrouter.ai/docs) which are not widely accessible, in particular gpt-4-32k and claude-2.
[OpenRouter](https://openrouter.ai) provide an interface to [many models](https://openrouter.ai/models) which are not widely accessible, in particular Claude 3 Opus.

To access the openrouter models simply
To access the OpenRouter models, simply:

- register for an account, purchase some credits and generate an api key
- set `--openai-api-base https://openrouter.ai/api/v1`
- set `--openai-api-key` to your openrouter key
- set `--model` to the model of your choice (`openai/gpt-4-32k`, `anthropic/claude-2` etc.)
```
# Install Aider
pip install aider-chat

# Set up OpenRouter access
export OPENAI_API_KEY=<your-openrouter-key>
export OPENAI_API_BASE=https://openrouter.ai/api/v1

# For example, run aider with Claude 3 Opus using the diff editing format
aider --model anthropic/claude-3-opus --edit-format diff
```

Some of the models weren't very functional and each llm has its own quirks. The anthropic models work ok, but the llama-2 ones in particular will need more work to play friendly with aider.

## Can I use aider with other LLMs, local LLMs, etc?