diff --git a/website/_data/refactor_leaderboard.yml b/website/_data/refactor_leaderboard.yml
index 11773ac39..8a4aacfda 100644
--- a/website/_data/refactor_leaderboard.yml
+++ b/website/_data/refactor_leaderboard.yml
@@ -143,25 +143,25 @@
   seconds_per_case: 67.8
   total_cost: 20.4889
-
-- dirname: 2024-06-20-16-39-18--refac-claude-3.5-sonnet-diff
+- dirname: 2024-07-01-18-30-33--refac-claude-3.5-sonnet-diff-not-lazy
   test_cases: 89
   model: claude-3.5-sonnet (diff)
   edit_format: diff
-  commit_hash: e5e07f9
-  pass_rate_1: 55.1
-  percent_cases_well_formed: 70.8
-  error_outputs: 240
-  num_malformed_responses: 54
-  num_with_malformed_responses: 26
-  user_asks: 10
+  commit_hash: 7396e38-dirty
+  pass_rate_1: 64.0
+  percent_cases_well_formed: 76.4
+  error_outputs: 176
+  num_malformed_responses: 39
+  num_with_malformed_responses: 21
+  user_asks: 11
   lazy_comments: 2
-  syntax_errors: 0
-  indentation_errors: 3
+  syntax_errors: 4
+  indentation_errors: 0
   exhausted_context_windows: 0
   test_timeouts: 0
   command: aider --model openrouter/anthropic/claude-3.5-sonnet
-  date: 2024-06-20
-  versions: 0.38.1-dev
-  seconds_per_case: 51.9
-  total_cost: 0.0000
\ No newline at end of file
+  date: 2024-07-01
+  versions: 0.40.7-dev
+  seconds_per_case: 42.8
+  total_cost: 11.5242
+
\ No newline at end of file
diff --git a/website/_posts/2024-07-01-sonnet-not-lazy.md b/website/_posts/2024-07-01-sonnet-not-lazy.md
index 8593dcacb..86fcbb3dc 100644
--- a/website/_posts/2024-07-01-sonnet-not-lazy.md
+++ b/website/_posts/2024-07-01-sonnet-not-lazy.md
@@ -1,26 +1,34 @@
 ---
 title: Sonnet is the opposite of lazy
 excerpt: Claude 3.5 Sonnet represents a step change in AI coding.
-#highlight_image: /assets/linting.jpg
-draft: true
+highlight_image: /assets/sonnet-not-lazy.jpg
 nav_exclude: true
 ---
+
+[![sonnet is the opposite of lazy](/assets/sonnet-not-lazy.jpg)](https://aider.chat/assets/sonnet-not-lazy.jpg)
+
 {% if page.date %}

{{ page.date | date: "%B %d, %Y" }}

 {% endif %}
-
 # Sonnet is the opposite of lazy
-[![sonnet is the opposite of lazy](/assets/sonnet-not-lazy.jpg)](https://aider.chat/assets/sonnet-not-lazy.jpg)
-
 Claude 3.5 Sonnet represents a step change in AI coding. It is so industrious, diligent and hard working that it has caused multiple problems for aider.
+
 It's been worth the effort to adapt aider to work well with Sonnet, because the result is surprisingly powerful.
+Sonnet's score on
+[aider's refactoring benchmark](https://aider.chat/docs/leaderboards/#code-refactoring-leaderboard)
+jumped from 55.1% up to 64.0%
+as a result of the changes discussed below.
+This moved Sonnet into second place, ahead of GPT-4o and
+behind only Opus.
+
+## Problems
 Sonnet's amazing work ethic caused a few problems:
@@ -31,7 +39,7 @@ on API responses, which truncates its coding in mid-stream.
 2. Similarly, Sonnet can specify large sequences of edits in one go, like changing a majority of lines while refactoring a large file. Again, this regularly triggered the 4k output limit
-and resulted in a failed edits.
+and resulted in failed edits.
 3. Sonnet is not shy about quoting large chunks of an existing file to perform a SEARCH & REPLACE edit across a long span of lines.
@@ -57,7 +65,7 @@ Problem (3) does cause some real downsides.
 Faced with a few small changes spread far apart in a source file, Sonnet would often prefer to do one giant SEARCH/REPLACE
-operation of the ~entire file.
+operation of almost the entire file.
 This wastes a tremendous amount of tokens, time and money -- and risks hitting the 4k output limit. It would be far faster and less expensive to instead
@@ -76,13 +84,16 @@ has specialized support for Claude 3.5 Sonnet:
 - Aider allows Sonnet to produce as much code as it wants, by automatically and seamlessly spreading the response out over a sequence of 4k token API responses.
-- Aider carefully prompts Sonnet to be concise and
-return only changing sections of code.
+- Aider carefully prompts Sonnet to be concise when proposing
+code edits.
 This reduces Sonnet's tendency to waste time, tokens and money returning large chunks of unchanging code.
-- Aider now uses `claude-3-5-sonnet-20240620` by default if `ANTHROPIC_API_KEY` is set in the environment.
+- Aider now uses Claude 3.5 Sonnet by default if the `ANTHROPIC_API_KEY` is set in the environment.
-You can use aider with Sonnet like this:
+See
+[aider's install instructions](https://aider.chat/docs/install.html)
+for more details, but
+you can get started quickly with aider and Sonnet like this:
 ```
 pip install aider-chat