mirror of
https://github.com/Aider-AI/aider.git
synced 2025-05-30 17:24:59 +00:00
parent fbc3f0cef5
commit 87a964355b
2 changed files with 4 additions and 4 deletions
@@ -78,7 +78,7 @@
 - dirname: 2024-12-21-19-23-03--polyglot-o1-hard-diff
   test_cases: 224
-  model: o1-2024-12-17
+  model: o1-2024-12-17 (high)
   edit_format: diff
   commit_hash: a755079-dirty
   pass_rate_1: 23.7
@@ -2,18 +2,18 @@
 # Aider benchmark harness

 Aider uses benchmarks to quantitatively measure how well it works
-various LLMs.
+with various LLMs.
 This directory holds the harness and tools needed to run the benchmarking suite.

 ## Background

 The benchmark is based on the [Exercism](https://github.com/exercism/python) coding exercises.
 This
-benchmark evaluates how effectively aider and GPT can translate a
+benchmark evaluates how effectively aider and LLMs can translate a
 natural language coding request into executable code saved into
 files that pass unit tests.
 It provides an end-to-end evaluation of not just
-GPT's coding ability, but also its capacity to *edit existing code*
+the LLM's coding ability, but also its capacity to *edit existing code*
 and *format those code edits* so that aider can save the
 edits to the local source files.
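Reviewer note: the leaderboard entry in the first hunk reports `test_cases: 224` and `pass_rate_1: 23.7`. The sketch below is only an illustration of how those two fields might relate, assuming `pass_rate_1` is the percentage of `test_cases` that passed on the first attempt. That interpretation and the helper name `implied_passes` are assumptions, not something the diff states.

```python
# Values copied from the leaderboard entry in the first hunk of this diff.
entry = {
    "dirname": "2024-12-21-19-23-03--polyglot-o1-hard-diff",
    "test_cases": 224,
    "model": "o1-2024-12-17 (high)",
    "edit_format": "diff",
    "pass_rate_1": 23.7,
}


def implied_passes(entry, rate_key="pass_rate_1"):
    """Estimate the number of passing test cases.

    ASSUMPTION: the rate is a percentage of `test_cases`; the diff
    itself does not say how the harness computes it.
    """
    return round(entry["test_cases"] * entry[rate_key] / 100)


passes = implied_passes(entry)
# Recompute the rate from the estimate to check it rounds back to the reported value.
recomputed = round(100 * passes / entry["test_cases"], 1)
print(passes, recomputed)
```

Under that assumption the reported rate is consistent with a whole number of passing exercises, which is a quick sanity check to run when editing entries like this one by hand.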