Mirror of https://github.com/Aider-AI/aider.git (synced 2025-06-02 10:45:00 +00:00)

copy

parent 871bdc8c9a
commit 7623b8e2e6
4 changed files with 148 additions and 209 deletions
@@ -23,7 +23,7 @@ that was reported recently.
 [](https://aider.chat/assets/swe_bench.svg)

 Aider was benchmarked on 570 of the 2294 SWE Bench problems.
-These are the same
+These were the same
 [randomly selected 570 problems](https://github.com/CognitionAI/devin-swebench-results/tree/main/output_diffs) that
 [Devin used in their evaluation](https://www.cognition.ai/post/swe-bench-technical-report).
 Please see the [references](#references)
@@ -251,7 +251,7 @@ In these cases aider with Opus was unable to produce any solutions.

 ## Computing the benchmark score

-Benchmarking produced one proposed solution for each of
+The benchmark harness produced one proposed solution for each of
 the 570 SWE Bench problems.

 A separate evaluation script was used to