Paul Gauthier 2024-06-03 11:14:17 -07:00
parent 8a8f3936f4
commit 0d06364db6
2 changed files with 4 additions and 4 deletions

@@ -15,7 +15,7 @@ from Amazon Q Developer Agent.
[![SWE Bench Lite results](/assets/swe_bench_lite.svg)](https://aider.chat/assets/swe_bench_lite.svg)
-**To be clear, all of aider's results reported here are pass@1 results,
+**All of aider's results reported here are pass@1 results,
obtained without using the SWE Bench `hints_text`.**
All results in the above chart are unhinted pass@1 results.
Please see the [references](#references)
@@ -407,7 +407,7 @@ making it faster, easier, and more reliable to run the acceptance tests.
## References
-To be clear, all of aider's results reported here are pass@1 results,
+All of aider's results reported here are pass@1 results,
obtained without using the SWE Bench `hints_text`.
The "aider agent" internally makes multiple "attempts" at solving the problem,

@@ -20,7 +20,7 @@ This result on the main SWE Bench builds on
[![SWE Bench results](/assets/swe_bench.svg)](https://aider.chat/assets/swe_bench.svg)
-**To be clear, all of aider's results reported here are pass@1 results,
+**All of aider's results reported here are pass@1 results,
obtained without using the SWE Bench `hints_text`.**
Aider was benchmarked on the same
[570 randomly selected SWE Bench problems](https://github.com/CognitionAI/devin-swebench-results/tree/main/output_diffs)
@@ -231,7 +231,7 @@ making it faster, easier, and more reliable to run the acceptance tests.
## References
-To be clear, all of aider's results reported here are pass@1 results,
+All of aider's results reported here are pass@1 results,
obtained without using the SWE Bench `hints_text`.
The "aider agent" internally makes multiple "attempts" at solving the problem,