
2025-06-01 18:25:00 +00:00 · 2023-12-17 17:17:34 -08:00 · 2023-12-17 17:17:34 -08:00 · 330b4140c7
commit 330b4140c7
parent 33c337f407
1 changed file with 13 additions and 16 deletions
--- a/docs/unified-diffs.md
+++ b/docs/unified-diffs.md
@@ -346,30 +346,27 @@ referencing old code like
 Based on this observation, I set out to build a benchmark based on refactoring
 a non-trivial amount of code found in fairly large source files.
 To do this, I used python's `ast` module to analyze the
-[Django repository](https://github.com/django/django).
+[Django repository](https://github.com/django/django) to:
-The goal was to search the Django repository to:
 - Find source files that contain class methods which are non-trivial, having more than 100 AST nodes in their implementation.
-- Focus on methods that are part of a larger class. We want to find methods which are less than half the code present in their containing class.
+- Focus on methods that are only part of a larger class, which has at least twice as much code as the method.
-- Find methods that do not make any use of their `self` parameter. This means they can be trivially refactored out of the class and turned into a stand-alone top-level function.
+- Find methods that don't use their `self` parameter, so they can be trivially refactored out of the class.
 We can then turn each of these source files into a task for the benchmark,
-using instructions like:
+where we ask GPT to:
 > Refactor the `_set_csrf_cookie` method in the `CsrfViewMiddleware` class to be a stand alone, top level function.
 > Name the new function `_set_csrf_cookie`, exactly the same name as the existing method.
 > Update any existing `self._set_csrf_cookie` calls to work with the new `_set_csrf_cookie` function.
-A [simple python AST scanning script]() found 39 of these source files in the Django repository
+A [simple python AST scanning script](https://github.com/paul-gauthier/aider/blob/main/benchmark/refactor_tools.py)
-and packaged them up as benchmark tasks using
+found 39 of these source files
-the same format as Exercism exercises.
+and packaged them up as benchmark tasks.
-
+Each task has a test
-The tool also created a unit test for each task
+which uses the `ast` module to check that the refactor
-which again uses the `ast` module to check that the refactor
 was performed roughly correctly:
-- The updated source file must parse as correct python, without `SyntaxError` or `IndentationError` exceptions. This is a powerful check that will surface any mechanical errors made when attempting to edit the source code.
+- The updated source file must parse as valid python, to surface misapplied edits which corrupt the file.
 - The target method must now exist as a top-level function in the file.
 - This new top-level function must contain approximately the same number of AST nodes as the original class method. This ensures that GPT didn't elide code and replace it with comments.
 - The original class must still be present in the file, and it must be smaller by about the number of AST nodes of the method which was removed. This helps confirm that the method was removed from the class, without other significant modifications.
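The benchmark's real scanner is the linked `refactor_tools.py`, which is not reproduced here. As a rough illustration only, this hypothetical sketch shows how the selection criteria and the per-task check described in this hunk could be expressed with Python's `ast` module. It assumes that "amount of code" means the number of nodes yielded by `ast.walk`, that only top-level classes are scanned, that a method's first parameter is literally named `self`, and that "approximately" means within a relative `tolerance`. The names `count_nodes`, `find_candidates`, `check_refactor` and `tolerance` are illustrative, not taken from aider.

```python
import ast
from typing import List, Optional, Tuple, Type


def count_nodes(node: ast.AST) -> int:
    # Size of a subtree, measured as the number of nodes ast.walk visits.
    return sum(1 for _ in ast.walk(node))


def _find(nodes: List[ast.stmt], kind: Type[ast.AST], name: str) -> Optional[ast.AST]:
    # First statement of the given node type with the given name, else None.
    return next((n for n in nodes if isinstance(n, kind) and n.name == name), None)


def uses_self(func: ast.FunctionDef) -> bool:
    # True if any Name node inside the method refers to `self`.
    return any(isinstance(n, ast.Name) and n.id == "self" for n in ast.walk(func))


def find_candidates(source: str, min_nodes: int = 100) -> List[Tuple[str, str, int]]:
    # Hypothetical scanner: return (class, method, size) for methods that are
    # large, at most half the size of their class, and never touch `self`.
    found = []
    for cls in ast.parse(source).body:  # top-level classes only (assumption)
        if not isinstance(cls, ast.ClassDef):
            continue
        class_size = count_nodes(cls)
        for item in cls.body:
            if not isinstance(item, ast.FunctionDef):
                continue
            params = item.args.args
            if not params or params[0].arg != "self":
                continue  # skip methods without a leading `self` (e.g. most staticmethods)
            size = count_nodes(item)
            if size > min_nodes and 2 * size <= class_size and not uses_self(item):
                found.append((cls.name, item.name, size))
    return found


def check_refactor(original_src: str, updated_src: str, class_name: str,
                   method_name: str, tolerance: float = 0.1) -> bool:
    # 1. The edited file must still parse (IndentationError subclasses SyntaxError).
    try:
        new_tree = ast.parse(updated_src)
    except SyntaxError:
        return False
    old_cls = _find(ast.parse(original_src).body, ast.ClassDef, class_name)
    old_method = _find(old_cls.body, ast.FunctionDef, method_name)
    method_size = count_nodes(old_method)

    # 2. The method must now be a top-level function, and the class must remain.
    new_func = _find(new_tree.body, ast.FunctionDef, method_name)
    new_cls = _find(new_tree.body, ast.ClassDef, class_name)
    if new_func is None or new_cls is None:
        return False

    def roughly_method_sized(n: int) -> bool:
        return abs(n - method_size) <= tolerance * method_size

    # 3. The new function keeps about the same amount of code (nothing elided),
    # 4. and the class shrank by about the size of the removed method.
    shrinkage = count_nodes(old_cls) - count_nodes(new_cls)
    return roughly_method_sized(count_nodes(new_func)) and roughly_method_sized(shrinkage)
```

On tiny toy inputs a looser `tolerance` (say 0.3) is needed, because rewriting each `self.method(...)` call site to `method(...)` also removes a few nodes from the class. On the 100+ node methods the benchmark targets, that noise is proportionally much smaller.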
@@ -391,11 +388,11 @@ Based on the refactor benchmark results,
 aider's new unified diff format seems very effective at stopping
 GPT-4 Turbo from being a lazy coder.
-Unified diffs were one of the very first edit formats I tried
+Unified diffs was one of the very first edit formats I tried
-when first building aider.
+when originally building aider.
 I think a lot of other AI coding assistant projects have also
 tried going down this path.
-It seems that any naive or direct use of structure diff formats
+It seems like any naive or direct use of structured diff formats
 is pretty much doomed to failure.
 But the techniques described here and
 incorporated into aider provide