2025-06-01 18:25:00 +00:00 · 2024-05-22 18:26:04 -07:00 · 2024-05-22 18:26:04 -07:00 · 5e13399f46
commit 5e13399f46
parent 18e3f55c4e
1 changed file with 38 additions and 36 deletions
--- a/_posts/2024-05-22-swe-bench-lite.md
+++ b/_posts/2024-05-22-swe-bench-lite.md
@@ -39,7 +39,7 @@ This lets them quickly steer misunderstandings back on course and
 avoid wasted time, code reviews and token costs.
-## Methodology
+## Benchmark methodology
 For the benchmark, 
 aider was launched in each problem's git repository
@@ -61,18 +61,18 @@ It's important to be clear that during benchmarking
 It could not see or run the held out "acceptance tests" that are used later to see if the
 SWE Bench problem was correctly resolved.
-The benchmarking process can be thought of as similar to a user:
+The benchmarking process was similar to a user employing aider like this:
 - Launching aider in their repo with something like the command below, which
 tells aider to say yes to every suggestion and use pytest to run tests.
  - `aider --yes --test-cmd pytest`
 - Pasting the text of a GitHub issue into the chat, or adding it via URL with a command in the chat like:
  - `/web https://github.com/django/django/issues/XXX`
-- If aider doesn't produce code that lints and tests clean, the user might decide to revert the changes and try again, maybe with a different LLM this time.
+- If aider doesn't produce code that lints and tests clean, the user might decide to revert the changes and try again, maybe using aider with a different LLM this time.
 [Aider is tightly integrated with git](https://aider.chat/docs/faq.html#how-does-aider-use-git),
 so it's always easy to revert AI changes that don't pan out.
-Of course, outside a benchmark setting, it's probably
+Outside a benchmark setting, it's probably
 unwise to let *any* AI agent run unsupervised on your code base.
 Aider is intended to be used as an interactive pair-programming chat,
 where the user participates to direct aider's work and approve suggestions.
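As a rough illustration of the loop described above, here is a minimal Python sketch of how a script might drive aider non-interactively on one issue and fall back to another LLM if the result fails. This is not the harness used for these benchmark results. It assumes aider's `--model` and `--message` options in addition to the `--yes` and `--test-cmd` flags mentioned above (check `aider --help` for your installed version), and the helper names `build_aider_cmd`, `git` and `attempt_issue` are hypothetical. For brevity it only checks that `pytest` passes, not that the code also lints clean.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_aider_cmd(issue_text: str, model: str) -> list[str]:
    """Build the argv for one non-interactive aider attempt (hypothetical helper)."""
    return [
        "aider",
        "--yes",                  # say yes to every suggestion
        "--test-cmd", "pytest",   # have aider run pytest after its edits
        "--model", model,         # assumption: pick the LLM for this attempt
        "--message", issue_text,  # assumption: send the issue text and exit
    ]


def git(repo: Path, *args: str) -> str:
    """Run a git command in `repo` and return its stripped stdout."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def attempt_issue(repo: Path, issue_text: str, models: list[str]) -> str | None:
    """Try each model in turn; return the first whose changes pass pytest."""
    start = git(repo, "rev-parse", "HEAD")
    for model in models:
        # aider may exit non-zero, so don't raise here; judge by the tests instead
        subprocess.run(build_aider_cmd(issue_text, model), cwd=repo)
        if subprocess.run(["pytest"], cwd=repo).returncode == 0:
            return model
        # aider commits its edits by default, so reset to the starting commit
        git(repo, "reset", "--hard", start)
    return None
```

Resetting to the recorded starting commit, rather than only discarding uncommitted changes, matters because aider's git integration commits each edit it makes.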
@@ -82,7 +82,7 @@ or if the AI starts going down a wrong path.
 ## Aider with GPT-4o alone was SOTA
-Running the entire SWE Bench Lite benchmark using aider with just GPT-4o
+Running the SWE Bench Lite benchmark using aider with just GPT-4o
 achieved a score of 25%.
 This was itself a state-of-the-art result, before being surpassed by the main
 result being reported here
@@ -203,7 +203,7 @@ respected when new code is added.
 [Aider lints code](https://aider.chat/2024/05/22/linting.html)
 after every LLM edit and offers to automatically fix
 any linting errors.
-Aider includes basic linters built with tree-sitter that supports
+Aider includes basic linters built with tree-sitter to check
 [most popular programming languages](https://github.com/paul-gauthier/grep-ast/blob/main/grep_ast/parsers.py).
 These built-in linters will detect syntax errors and other fatal problems with the code.
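To make "detect syntax errors" concrete, here is a minimal sketch of such a check for Python files only. It swaps in Python's built-in `compile()` for tree-sitter, so it is not how aider's linters are implemented, and because `compile()` stops at the first problem it reports at most one error per file. The `find_syntax_errors` name and the flake8-style `E999` output line are assumptions for illustration.

```python
def find_syntax_errors(source: str, filename: str = "app.py") -> list[str]:
    """Return flake8-style reports for fatal syntax errors (Python only).

    Uses compile() instead of tree-sitter, so only the first error is found.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:  # also catches IndentationError, a subclass
        return [f"{filename}:{err.lineno}:{err.offset}: E999 SyntaxError: {err.msg}"]
    return []
```

A tree-sitter based linter has the advantage of working across many languages with one approach, and it can keep parsing past the first error, which a single `compile()` call cannot.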
@@ -220,36 +220,38 @@ make the correct changes to resolve it.
 <div class="chat-transcript" markdown="1">
-> app.py:23:36: F821 undefined name 'num'  
-> app.py:41:16: F541 f-string is missing placeholders  
->   
-> app.py:  
-> ...⋮...  
->   6│class LongNum:  
->   7│    def __init__(self, num):  
->   8│        """  
->   9│        Initialize the number.  
->  10│        """  
-> ...⋮...  
->  19│    def __str__(self):  
->  20│        """  
->  21│        Render the number as a string.  
->  22│        """  
->  23█        return str(num)  
->  24│  
->  25│  
->  26│@app.route('/subtract/<int:x>/<int:y>')  
-> ...⋮...  
->  38│@app.route('/divide/<int:x>/<int:y>')  
->  39│def divide(x, y):  
->  40│    if y == 0:  
->  41█        return f"Error: Cannot divide by zero"  
->  42│    else:  
->  43│        result = x / y  
->  44│        return str(result)  
->  45│  
-> ...⋮...  
->   
+```
+app.py:23:36: F821 undefined name 'num'  
+app.py:41:16: F541 f-string is missing placeholders  
+  
+app.py:  
+...⋮...  
+  6│class LongNum:  
+  7│    def __init__(self, num):  
+  8│        """  
+  9│        Initialize the number.  
+ 10│        """  
+...⋮...  
+ 19│    def __str__(self):  
+ 20│        """  
+ 21│        Render the number as a string.  
+ 22│        """  
+ 23█        return str(num)  
+ 24│  
+ 25│  
+ 26│@app.route('/subtract/<int:x>/<int:y>')  
+...⋮...  
+ 38│@app.route('/divide/<int:x>/<int:y>')  
+ 39│def divide(x, y):  
+ 40│    if y == 0:  
+ 41█        return f"Error: Cannot divide by zero"  
+ 42│    else:  
+ 43│        result = x / y  
+ 44│        return str(result)  
+ 45│  
+...⋮...  
 ```  
 > Attempt to fix lint errors? yes
 </div>
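The two lint errors in the transcript point at real bugs: `__str__` refers to `num`, which is only a parameter of `__init__`, and the f-string on line 41 interpolates nothing. Below is a sketch of one plausible fix, not necessarily the edit aider made. It assumes the elided lines of `__init__` store the value as `self.num`, and it drops the Flask `@app.route` decorators so the excerpt runs standalone.

```python
class LongNum:
    def __init__(self, num):
        """
        Initialize the number.
        """
        self.num = num  # assumption: the elided lines store the value like this

    def __str__(self):
        """
        Render the number as a string.
        """
        # F821 fix: `num` is not in scope here, so read the attribute instead
        return str(self.num)


def divide(x, y):
    if y == 0:
        # F541 fix: a plain string, since there is nothing to interpolate
        return "Error: Cannot divide by zero"
    else:
        result = x / y
        return str(result)
```

The F541 warning is harmless at runtime, but F821 would raise a `NameError` the first time a `LongNum` is converted to a string, which is why linting after every edit catches problems before the tests even run.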