This commit is contained in:
Paul Gauthier 2024-05-23 20:57:11 -07:00
parent bd56adf16f
commit c591ecd331


@@ -128,7 +128,7 @@ These first two attempts obtained ~75% of all plausible and ~90% of all resolved
| **Total** | | **300** | **100%** | **79** | **100%** | **26.3%** |
-If we break down correct solutions purely by model,
+If we break down the solutions solely by model,
we can see that aider with GPT-4o outperforms Opus.
This isn't a fair and direct comparison, because GPT-4o always took the first
turn and therefore got first crack at all the "easiest" problems.
@@ -229,7 +229,7 @@ complete the edits specified by the LLM.
This is usually because the LLM has failed to conform to the editing
instructions in its system prompt.
When aider completes, it returns an editing outcome that indicates
-whether it was able to successfully complete all edits.
+whether it was able to successfully apply all edits.
The benchmark harness uses this editing status as
one criterion to determine if aider has
created a plausible solution.