From ea1239efefa659ef413f89cffdc82d074a63e7f6 Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Sat, 12 Apr 2025 23:40:53 -0700
Subject: [PATCH] Docs: Clarify polyglot benchmark measures edits without intervention.

---
 aider/website/docs/leaderboards/index.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/aider/website/docs/leaderboards/index.md b/aider/website/docs/leaderboards/index.md
index aeb9324bc..e6a3ed3ae 100644
--- a/aider/website/docs/leaderboards/index.md
+++ b/aider/website/docs/leaderboards/index.md
@@ -9,14 +9,14 @@ has_children: true
 # Aider LLM Leaderboards
 Aider excels with LLMs skilled at *editing* code, not just writing it.
-These benchmarks evaluate an LLM's ability to follow instructions and edit code successfully.
-The leaderboards show results for popular LLMs. Aider works best with high-scoring models, though it [can connect to almost any LLM](/docs/llms.html).
+These benchmarks evaluate an LLM's ability to follow instructions and edit code successfully without
+human intervention.
+Aider works best with high-scoring models, though it [can connect to almost any LLM](/docs/llms.html).
 ## Polyglot leaderboard
 [Aider's polyglot benchmark](https://aider.chat/2024/12/21/polyglot.html#the-polyglot-benchmark) tests LLMs on 225 challenging Exercism coding exercises across C++, Go, Java, JavaScript, Python, and Rust.
-It measures coding ability in multiple languages, integration with existing code, and successful application of changes without human intervention.
@@ -25,7 +25,7 @@ It measures coding ability in multiple languages, integration with existing code
 Model
 Percent correct
- Cost (log scale)
+ Cost (log scale)
 Command
 Edit format