Paul Gauthier 2023-12-17 17:17:34 -08:00
parent 33c337f407
commit 330b4140c7

@ -346,30 +346,27 @@ referencing old code like
 Based on this observation, I set out to build a benchmark based on refactoring
 a non-trivial amount of code found in fairly large source files.
 To do this, I used python's `ast` module to analyze the
-[Django repository](https://github.com/django/django).
-The goal was to search the Django repository to:
+[Django repository](https://github.com/django/django) to:
 - Find source files that contain class methods which are non-trivial, having more than 100 AST nodes in their implementation.
-- Focus on methods that are part of a larger class. We want to find methods which are less than half the code present in their containing class.
-- Find methods that do not make any use of their `self` parameter. This means they can be trivially refactored out of the class and turned into a stand-alone top-level function.
+- Focus on methods that are only part of a larger class, which has at least twice as much code as the method.
+- Find methods that don't use their `self` parameter, so they can be trivially refactored out of the class.
 We can then turn each of these source files into a task for the benchmark,
-using instructions like:
+where we ask GPT to:
 > Refactor the `_set_csrf_cookie` method in the `CsrfViewMiddleware` class to be a stand alone, top level function.
 > Name the new function `_set_csrf_cookie`, exactly the same name as the existing method.
 > Update any existing `self._set_csrf_cookie` calls to work with the new `_set_csrf_cookie` function.
-A [simple python AST scanning script]() found 39 of these source files in the Django repository
-and packaged them up as benchmark tasks using
-the same format as Exercism exercises.
-The tool also created a unit test for each task
-which again uses the `ast` module to check that the refactor
+A [simple python AST scanning script](https://github.com/paul-gauthier/aider/blob/main/benchmark/refactor_tools.py)
+found 39 of these source files
+and packaged them up as benchmark tasks.
+Each task has a test
+which uses the `ast` module to check that the refactor
 was performed roughly correctly:
-- The updated source file must parse as correct python, without `SyntaxError` or `IndentationError` exceptions. This is a powerful check that will surface any mechanical errors made when attempting to edit the source code.
+- The updated source file must parse as valid python, to surface misapplied edits which corrupt the file.
 - The target method must now exist as a top-level function in the file.
 - This new top-level function must contain approximately the same number of AST nodes as the original class method. This ensures that GPT didn't elide code and replace it with comments.
 - The original class must still be present in the file, and it must be smaller by about the number of AST nodes of the method which was removed. This helps confirm that the method was removed from the class, without other significant modifications.
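The three selection criteria in the first hunk can be expressed with Python's standard `ast` module. This is a minimal sketch and not the code in the linked `refactor_tools.py`. The helper names `count_nodes`, `uses_self` and `find_refactor_candidates` are hypothetical. The sketch also makes two assumptions. First, a method's "size" is taken to be the number of nodes yielded by `ast.walk`. Second, a method counts as not using `self` when its body never references a name matching its first parameter.

```python
import ast
from typing import Iterator, Tuple


def count_nodes(node: ast.AST) -> int:
    """Count every AST node in the subtree rooted at `node` (assumed size metric)."""
    return sum(1 for _ in ast.walk(node))


def uses_self(method: ast.AST) -> bool:
    """True if the method body references its first positional parameter."""
    params = method.args.posonlyargs + method.args.args
    if not params:
        return False
    self_name = params[0].arg
    return any(
        isinstance(node, ast.Name) and node.id == self_name
        for stmt in method.body
        for node in ast.walk(stmt)
    )


def find_refactor_candidates(source: str, min_nodes: int = 100) -> Iterator[Tuple[str, str]]:
    """Hypothetical scanner: yield (class_name, method_name) pairs meeting all three criteria."""
    tree = ast.parse(source)
    for cls in ast.walk(tree):
        if not isinstance(cls, ast.ClassDef):
            continue
        class_size = count_nodes(cls)
        for item in cls.body:
            if not isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            size = count_nodes(item)
            if size <= min_nodes:
                continue  # criterion 1: more than 100 AST nodes
            if class_size < 2 * size:
                continue  # criterion 2: class holds at least twice the method's code
            if uses_self(item):
                continue  # criterion 3: `self` is never referenced
            yield cls.name, item.name
```

Static and class methods are not special-cased. For a `@staticmethod`, the first real parameter would be mistaken for `self`, which is a limitation of this sketch.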
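The four checks that each task's test performs map naturally onto the `ast` module. This is a hedged illustration rather than the benchmark's actual test. `check_refactor` is a hypothetical name, and the class is assumed to be defined at module top level. The 10% `tolerance` is an assumed stand-in for "approximately" and "about", which the text does not quantify. `IndentationError` is a subclass of `SyntaxError`, so catching the latter covers both.

```python
import ast
from typing import Optional


def count_nodes(node: ast.AST) -> int:
    """Count every AST node in the subtree rooted at `node`."""
    return sum(1 for _ in ast.walk(node))


def find_def(body: list, name: str, kinds: tuple) -> Optional[ast.AST]:
    """Return the first statement in `body` that is one of `kinds` and named `name`."""
    for node in body:
        if isinstance(node, kinds) and node.name == name:
            return node
    return None


def roughly_equal(a: int, b: int, tolerance: float) -> bool:
    """True if a and b differ by at most `tolerance` times the larger value."""
    return abs(a - b) <= tolerance * max(a, b, 1)


def check_refactor(original_src: str, updated_src: str,
                   class_name: str, method_name: str,
                   tolerance: float = 0.1) -> bool:
    """Hypothetical checker; assumes the original file contains the class and method."""
    funcs = (ast.FunctionDef, ast.AsyncFunctionDef)
    original = ast.parse(original_src)
    old_class = find_def(original.body, class_name, (ast.ClassDef,))
    old_method = find_def(old_class.body, method_name, funcs)

    # 1. The edited file must still parse (catches IndentationError too).
    try:
        updated = ast.parse(updated_src)
    except SyntaxError:
        return False

    # 2. The method must now exist as a top-level function.
    new_func = find_def(updated.body, method_name, funcs)
    if new_func is None:
        return False

    # 3. About the same size as the original method, so no code was elided.
    #    The tolerance also absorbs the dropped `self` parameter node.
    if not roughly_equal(count_nodes(new_func), count_nodes(old_method), tolerance):
        return False

    # 4. The class must remain, smaller by roughly the method's node count.
    new_class = find_def(updated.body, class_name, (ast.ClassDef,))
    if new_class is None:
        return False
    expected = count_nodes(old_class) - count_nodes(old_method)
    return roughly_equal(count_nodes(new_class), expected, tolerance)
```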
@ -391,11 +388,11 @@ Based on the refactor benchmark results,
 aider's new unified diff format seems very effective at stopping
 GPT-4 Turbo from being a lazy coder.
-Unified diffs were one of the very first edit formats I tried
-when first building aider.
+Unified diffs was one of the very first edit formats I tried
+when originally building aider.
 I think a lot of other AI coding assistant projects have also
 tried going down this path.
-It seems that any naive or direct use of structure diff formats
+It seems like any naive or direct use of structured diff formats
 is pretty much doomed to failure.
 But the techniques described here and
 incorporated into aider provide