2025-06-30 08:20:44 +00:00 · 2024-03-08 08:09:07 -08:00 · 2024-03-08 08:09:07 -08:00 · 573a6814b2
commit 573a6814b2
parent 7fbd9e2be4
2 changed files with 12 additions and 10 deletions
--- a/_posts/2024-03-08-claude-3.md
+++ b/_posts/2024-03-08-claude-3.md
@@ -1,5 +1,5 @@
 ---
-title: Claude 3 beats all OpenAI models on Aider code editing benchmark
+title: Claude 3 beats GPT-4 on Aider code editing benchmark
 excerpt: Claude 3 Opus outperforms all of OpenAI's models on Aider's code editing benchmark, making it the best available model for pair programming with AI.
 highlight_image: /assets/2024-03-07-claude-3.svg
 ---
@@ -7,7 +7,7 @@ highlight_image: /assets/2024-03-07-claude-3.svg

 [![benchmark results](/assets/2024-03-07-claude-3.svg)](https://aider.chat/assets/2024-03-07-claude-3.svg)

-[Anthropic just release their new Claude 3 models]()
+[Anthropic just released their new Claude 3 models](https://www.anthropic.com/news/claude-3-family)
 with evals showing better performance on coding tasks.
 With that in mind, I've been benchmarking the new models
 using Aider's code editing benchmark suite.
@@ -18,12 +18,12 @@ Aider currently supports Claude 3 Opus via
 [OpenRouter](https://aider.chat/docs/faq.html#accessing-other-llms-with-openrouter):

 ```
-# Install Aider
+# Install aider
 pip install aider-chat

-# Setup openrouter access
+# Setup OpenRouter access
 export OPENAI_API_KEY=<your-openrouter-key>
-export export OPENAI_API_BASE=https://openrouter.ai/api/v1
+export OPENAI_API_BASE=https://openrouter.ai/api/v1

 # Run aider with Claude 3 Opus using the diff editing format
 aider --model anthropic/claude-3-opus --edit-format diff
@@ -56,7 +56,8 @@ The LLM gets two tries to solve each problem:
 ### Claude 3 Opus

 - The new `claude-3-opus-20240229` model got the highest score ever on this benchmark, completing 68.4% of the tasks with two tries.
- It's single-try performance was comparable to the latest GPT-4 Turbo model `gpt-4-0125-preview`, at 54.1%.
+- Its single-try performance was comparable to the latest GPT-4 Turbo model `gpt-4-0125-preview`, at 54.1%.
+- While Opus got the highest score, it was only a few points higher than the GPT-4 Turbo results. Given the extra costs of Opus and the slower response times, it remains to be seen which is the most practical model for daily coding use.

 ### Claude 3 Sonnet

@@ -67,7 +68,8 @@ The LLM gets two tries to solve each problem:
 There are a few other things worth noting:

 - Claude 3 Opus and Sonnet are both slower and more expensive than OpenAI's models. You can get almost the same coding skill faster and cheaper with OpenAI's models.
- The Claude models refused to perform a number of coding tasks and returned the error "Output blocked by content filtering policy". They refused to code up the [beer song]() program, which at makes some sort of superficial sense. But they also refused to work in some larger open source code bases, for unclear reasons.
+- Claude 3 has a 2X larger context window than the latest GPT-4 Turbo, which may be an advantage when working with larger code bases.
+- The Claude models refused to perform a number of coding tasks and returned the error "Output blocked by content filtering policy". They refused to code up the [beer song](https://exercism.org/tracks/python/exercises/beer-song) program, which makes some sort of superficial sense. But they also refused to work in some larger open source code bases, for unclear reasons.
 - The Claude APIs seem somewhat unstable, returning HTTP 5xx errors of various sorts. Aider does exponential backoff retries in these cases, but it's a sign that they may be struggling under surging demand.


--- a/docs/faq.md
+++ b/docs/faq.md
@@ -79,12 +79,12 @@ which contains many benchmarking articles.
 To access the OpenRouter models, simply:

 ```
-# Install Aider
+# Install aider
 pip install aider-chat

-# Setup openrouter access
+# Setup OpenRouter access
 export OPENAI_API_KEY=<your-openrouter-key>
-export export OPENAI_API_BASE=https://openrouter.ai/api/v1
+export OPENAI_API_BASE=https://openrouter.ai/api/v1

 # For example, run aider with Claude 3 Opus using the diff editing format
 aider --model anthropic/claude-3-opus --edit-format diff
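
The post touched by this commit mentions that aider does exponential backoff retries when the Claude API returns HTTP 5xx errors. The sketch below illustrates that general technique only; it is not aider's actual implementation. `TransientServerError`, `retry_with_backoff`, and all parameter names and default values are hypothetical, chosen for illustration.

```python
import random
import time


class TransientServerError(Exception):
    """Hypothetical stand-in for an HTTP 5xx response from an API."""


def retry_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Run call() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientServerError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay grows as base_delay * 2^attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

The `sleep` parameter is injectable so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.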