2025-05-30 17:24:59 +00:00 · 2024-12-23 08:00:25 -05:00 · 2024-12-23 08:00:25 -05:00 · 87a964355b
commit 87a964355b
parent fbc3f0cef5
2 changed files with 4 additions and 4 deletions
--- a/aider/website/_data/polyglot_leaderboard.yml
+++ b/aider/website/_data/polyglot_leaderboard.yml
@@ -78,7 +78,7 @@

 - dirname: 2024-12-21-19-23-03--polyglot-o1-hard-diff
  test_cases: 224
-  model: o1-2024-12-17
+  model: o1-2024-12-17 (high)
  edit_format: diff
  commit_hash: a755079-dirty
  pass_rate_1: 23.7
--- a/benchmark/README.md
+++ b/benchmark/README.md
@@ -2,18 +2,18 @@
 # Aider benchmark harness

 Aider uses benchmarks to quantitatively measure how well it works
-various LLMs.
+with various LLMs.
 This directory holds the harness and tools needed to run the benchmarking suite.

 ## Background

 The benchmark is based on the [Exercism](https://github.com/exercism/python) coding exercises.
 This
-benchmark evaluates how effectively aider and GPT can translate a
+benchmark evaluates how effectively aider and LLMs can translate a
 natural language coding request into executable code saved into
 files that pass unit tests.
 It provides an end-to-end evaluation of not just
-GPT's coding ability, but also its capacity to *edit existing code*
+the LLM's coding ability, but also its capacity to *edit existing code*
 and *format those code edits* so that aider can save the
 edits to the local source files.