Added claude post

Paul Gauthier 2024-03-08 08:00:41 -08:00
parent a1939f50e4
commit 7fbd9e2be4
4 changed files with 2123 additions and 7 deletions


@@ -0,0 +1,75 @@
---
title: Claude 3 beats all OpenAI models on Aider code editing benchmark
excerpt: Claude 3 Opus outperforms all of OpenAI's models on Aider's code editing benchmark, making it the best available model for pair programming with AI.
highlight_image: /assets/2024-03-07-claude-3.svg
---
# Claude 3 beats GPT-4 on Aider code editing benchmark
[![benchmark results](/assets/2024-03-07-claude-3.svg)](https://aider.chat/assets/2024-03-07-claude-3.svg)
[Anthropic just released their new Claude 3 models]()
with evals showing better performance on coding tasks.
With that in mind, I've been benchmarking the new models
using Aider's code editing benchmark suite.
Claude 3 Opus outperforms all of OpenAI's models,
making it the best available model for pair programming with AI.
Aider currently supports Claude 3 Opus via
[OpenRouter](https://aider.chat/docs/faq.html#accessing-other-llms-with-openrouter):
```
# Install Aider
pip install aider-chat
# Set up OpenRouter access
export OPENAI_API_KEY=<your-openrouter-key>
export OPENAI_API_BASE=https://openrouter.ai/api/v1
# Run aider with Claude 3 Opus using the diff editing format
aider --model anthropic/claude-3-opus --edit-format diff
```
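If you want to sanity check the OpenRouter setup before launching aider, you can hit the same OpenAI-compatible endpoint directly with the OpenAI Python client. This is a minimal sketch, not part of aider; it assumes the `openai` v1.x package and the environment variable set above:

```
# Minimal sanity check of the OpenRouter setup (not part of aider).
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)
resp = client.chat.completions.create(
    model="anthropic/claude-3-opus",
    messages=[{"role": "user", "content": "Reply with OK."}],
)
print(resp.choices[0].message.content)
```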
## Aider's code editing benchmark
[Aider](https://github.com/paul-gauthier/aider)
is an open source command line chat tool that lets you
pair program with AI on code in your local git repo.
Aider relies on a
[code editing benchmark](https://aider.chat/docs/benchmarks.html)
to quantitatively evaluate how well
an LLM can make changes to existing code.
The benchmark uses aider to try to complete
[133 Exercism Python coding exercises](https://github.com/exercism/python).
For each exercise,
Exercism provides a starting Python file with stubs for the needed functions,
a natural language description of the problem to solve,
and a test suite to evaluate whether the coder has correctly solved the problem.
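For a concrete sense of the format, this is roughly the shape of one exercise, paraphrased from Exercism's `two-fer` exercise (the file names and test details here are illustrative and may differ from the real exercise):

```
# two_fer.py -- the starting stub handed to the LLM:
def two_fer(name="you"):
    pass

# two_fer_test.py -- the test suite that decides pass/fail:
import unittest
from two_fer import two_fer

class TwoFerTest(unittest.TestCase):
    def test_no_name_given(self):
        self.assertEqual(two_fer(), "One for you, one for me.")

    def test_a_name_given(self):
        self.assertEqual(two_fer("Alice"), "One for Alice, one for me.")
```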
The LLM gets two tries to solve each problem (see the sketch after this list):
1. On the first try, it gets the initial stub code and the English description of the coding task. If all the tests pass, we are done.
2. If any tests fail, aider sends the LLM the failing test output and gives it a second try to complete the task.
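Here is a simplified sketch of that two-try loop. It is illustrative rather than the benchmark's actual harness code; `run_aider` and `run_tests` are hypothetical helpers standing in for the real machinery:

```
# Illustrative two-try loop; run_aider and run_tests are hypothetical
# helpers, not aider's real benchmark harness.
def benchmark_exercise(exercise):
    # Try 1: aider gets the stub code and the exercise instructions.
    run_aider(exercise.files, prompt=exercise.instructions)
    passed, test_output = run_tests(exercise)
    if passed:
        return "passed on first try"

    # Try 2: aider gets the failing test output and edits again.
    run_aider(exercise.files, prompt=test_output)
    passed, _ = run_tests(exercise)
    return "passed on second try" if passed else "failed"
```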
## Benchmark results
### Claude 3 Opus
- The new `claude-3-opus-20240229` model got the highest score ever on this benchmark, completing 68.4% of the tasks with two tries.
- Its single-try performance was comparable to the latest GPT-4 Turbo model `gpt-4-0125-preview`, at 54.1%.
### Claude 3 Sonnet
- The new `claude-3-sonnet-20240229` model performed similarly to OpenAI's GPT-3.5 Turbo models with an overall score of 54.9% and a first-try score of 43.6%.
## Other observations
There are a few other things worth noting:
- Claude 3 Opus and Sonnet are both slower and more expensive than OpenAI's models. You can get almost the same coding skill faster and cheaper with OpenAI's models.
- The Claude models refused to perform a number of coding tasks and returned the error "Output blocked by content filtering policy". They refused to code up the [beer song]() program, which makes some sort of superficial sense. But they also refused to work in some larger open source code bases, for unclear reasons.
- The Claude APIs seem somewhat unstable, returning HTTP 5xx errors of various sorts. Aider retries with exponential backoff in these cases (a sketch of the pattern follows this list), but it's a sign that they may be struggling under surging demand.
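The retry behavior mentioned in the last point is the standard exponential backoff pattern. A minimal sketch, not aider's actual implementation (`ServerError` is a hypothetical stand-in for the HTTP 5xx errors):

```
# Illustrative exponential backoff on server errors.
import time

class ServerError(Exception):
    """Hypothetical stand-in for an HTTP 5xx response."""

def with_backoff(send_request, max_retries=5):
    delay = 0.5
    for attempt in range(max_retries):
        try:
            return send_request()
        except ServerError:
            if attempt == max_retries - 1:
                raise  # give up after the final retry
            time.sleep(delay)
            delay *= 2  # double the wait before each new attempt
```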


@@ -682,6 +682,10 @@ class Coder:
         if self.verbose:
             print(completion)
 
+        if not completion.choices:
+            self.io.tool_error(str(completion))
+            return
+
         show_func_err = None
         show_content_err = None
         try:

File diff suppressed because it is too large (image, 56 KiB)


@@ -74,16 +74,22 @@ which contains many benchmarking articles.
 
 ## Accessing other LLMs with OpenRouter
 
-[OpenRouter](https://openrouter.ai) provide an interface to [many models](https://openrouter.ai/docs) which are not widely accessible, in particular gpt-4-32k and claude-2.
+[OpenRouter](https://openrouter.ai) provides an interface to [many models](https://openrouter.ai/models) which are not widely accessible, in particular Claude 3 Opus.
 
-To access the openrouter models simply
-- register for an account, purchase some credits and generate an api key
-- set `--openai-api-base https://openrouter.ai/api/v1`
-- set `--openai-api-key` to your openrouter key
-- set `--model` to the model of your choice (`openai/gpt-4-32k`, `anthropic/claude-2` etc.)
-
-Some of the models weren't very functional and each llm has its own quirks. The anthropic models work ok, but the llama-2 ones in particular will need more work to play friendly with aider.
+To access the OpenRouter models, simply:
+
+```
+# Install Aider
+pip install aider-chat
+
+# Set up OpenRouter access
+export OPENAI_API_KEY=<your-openrouter-key>
+export OPENAI_API_BASE=https://openrouter.ai/api/v1
+
+# For example, run aider with Claude 3 Opus using the diff editing format
+aider --model anthropic/claude-3-opus --edit-format diff
+```
 
 ## Can I use aider with other LLMs, local LLMs, etc?