From 2d91ee8dbb14165e1c2c2b81d712d2f37345acec Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Sat, 4 May 2024 17:33:25 -0700
Subject: [PATCH] copy

---
 docs/leaderboards/index.md | 6 +++---
 docs/llms.md               | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/leaderboards/index.md b/docs/leaderboards/index.md
index 5e30a19fb..0ea71cf6e 100644
--- a/docs/leaderboards/index.md
+++ b/docs/leaderboards/index.md
@@ -11,8 +11,8 @@ to successfully edit code.
 
 Aider uses two benchmarks to measure an LLM's code editing ability:
 
-- The [code editing benchmark](/docs/benchmarks.html#the-benchmark) asks the LLM to edit python source files to complete 133 small coding exercises. This benchmark measures the LLM's coding ability, but also whether it can consistently emit code edits in the format specified in the system prompt.
-- The [refactoring benchmark](https://github.com/paul-gauthier/refactor-benchmark) asks the LLM to refactor 89 large methods from large python classes. This is a more challenging benchmark, which tests the model's ability to output long chunks of code without skipping sections or making mistakes. It was developed to provoke and measure [GPT-4 Turbo's "lazy coding" habit](/2023/12/21/unified-diffs.html).
+- [Aider's code editing benchmark](/docs/benchmarks.html#the-benchmark) asks the LLM to edit python source files to complete 133 small coding exercises. This benchmark measures the LLM's coding ability, but also whether it can consistently emit code edits in the format specified in the system prompt.
+- [Aider's refactoring benchmark](https://github.com/paul-gauthier/refactor-benchmark) asks the LLM to refactor 89 large methods from large python classes. This is a more challenging benchmark, which tests the model's ability to output long chunks of code without skipping sections or making mistakes. It was developed to provoke and measure [GPT-4 Turbo's "lazy coding" habit](/2023/12/21/unified-diffs.html).
 
 The leaderboards below report the results from a number of popular LLMs,
 to help users select which models to use with aider.
@@ -173,6 +173,6 @@ since it is the easiest format for an LLM to use.
 Contributions of benchmark results are welcome!
 See the
 [benchmark README](https://github.com/paul-gauthier/aider/blob/main/benchmark/README.md)
-for information on running aider's benchmarks.
+for information on running aider's code editing benchmark.
 Submit results by opening a PR with edits to the
 [benchmark results CSV data files](https://github.com/paul-gauthier/aider/blob/main/_data/).
diff --git a/docs/llms.md b/docs/llms.md
index 4218d1862..9c377a77f 100644
--- a/docs/llms.md
+++ b/docs/llms.md
@@ -28,7 +28,7 @@ local models that provide an
 ## Use a capable model
 
 Check
-[Aider's LLM leaderboard](https://aider.chat/docs/leaderboard.html)
+[Aider's LLM leaderboards](https://aider.chat/docs/leaderboards/)
 to see which models work best with aider.
 Be aware that aider may not work well with less capable models.
 