Mirror of https://github.com/Aider-AI/aider.git, synced 2025-05-31 17:55:01 +00:00.

Commit 871bdc8c9a (parent 7889a91e9d): 1 changed file with 3 additions and 3 deletions.
@@ -162,7 +162,7 @@ The SWE Bench acceptance testing just confirms that tests pass or fail
 in the same pattern as the "gold patch" developed by a human to solve the
 problem.
 Some tests may fail during acceptance testing,
-and that's ok as long they failed for the gold
+and that's ok as long as they failed for the gold
 patch too.
 - There may have been pre-existing linting problems in the repo.
 If lingering linting issues affected code paths that are not well tested,
@@ -200,7 +200,7 @@ This was the case for both this main SWE Bench result and the
 earlier Lite result.
 
 The table below breaks down the benchmark outcome of each problem,
-show whether aider with GPT-4o and with Opus
+showing whether aider with GPT-4o and with Opus
 produced plausible and/or correct solutions.
 
 |Row|Aider<br>w/GPT-4o<br>solution<br>plausible?|Aider<br>w/GPT-4o<br>solution<br>resolved<br>issue?|Aider<br>w/Opus<br>solution<br>plausible?|Aider<br>w/Opus<br>solution<br>resolved<br>issue?|Number of<br>problems<br>with this<br>outcome|
@@ -304,7 +304,7 @@ Table 2 of their
 reports an `ACR-avg` result of 10.59% which is an average pass@1 result.
 
 The results presented here for aider are all pass@1, as
-the [official SWE Bench Lite leaderboard](https://www.swebench.com)
+the [official SWE Bench leaderboard](https://www.swebench.com)
 only accepts pass@1 results.
 
 
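The edited text refers to pass@1 results. As a point of reference (not code from this repo): pass@1 with a single attempt per problem is simply the fraction of problems whose one attempt resolved the issue. A minimal sketch, with a hypothetical `pass_at_1` helper:

```python
def pass_at_1(outcomes: list[bool]) -> float:
    """Fraction of problems resolved, given one attempt per problem.

    outcomes[i] is True if the single attempt on problem i resolved its issue.
    (Illustrative helper only; not part of aider or SWE Bench.)
    """
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


# Four problems, two resolved on their single attempt -> 0.5
print(pass_at_1([True, False, True, False]))  # 0.5
```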