Paul Gauthier 2024-12-23 08:00:25 -05:00
parent fbc3f0cef5
commit 87a964355b
2 changed files with 4 additions and 4 deletions

@@ -78,7 +78,7 @@
 - dirname: 2024-12-21-19-23-03--polyglot-o1-hard-diff
 test_cases: 224
-model: o1-2024-12-17
+model: o1-2024-12-17 (high)
 edit_format: diff
 commit_hash: a755079-dirty
 pass_rate_1: 23.7

@@ -2,18 +2,18 @@
 # Aider benchmark harness
 Aider uses benchmarks to quantitatively measure how well it works
-various LLMs.
+with various LLMs.
 This directory holds the harness and tools needed to run the benchmarking suite.
 ## Background
 The benchmark is based on the [Exercism](https://github.com/exercism/python) coding exercises.
 This
-benchmark evaluates how effectively aider and GPT can translate a
+benchmark evaluates how effectively aider and LLMs can translate a
 natural language coding request into executable code saved into
 files that pass unit tests.
 It provides an end-to-end evaluation of not just
-GPT's coding ability, but also its capacity to *edit existing code*
+the LLM's coding ability, but also its capacity to *edit existing code*
 and *format those code edits* so that aider can save the
 edits to the local source files.