mirror of
https://github.com/Aider-AI/aider.git
synced 2025-05-29 16:54:59 +00:00
improved test for toplevel refactored func
This commit is contained in:
parent
7028a533f1
commit
76c1deae6a
2 changed files with 21 additions and 23 deletions
@@ -21,25 +21,23 @@ class ParentNodeTransformer(ast.NodeTransformer):
 
 
 def verify_full_func_at_top_level(tree, func, func_children):
-    func_node = next(
-        (
-            item
-            for item in ast.walk(tree)
-            if isinstance(item, ast.FunctionDef) and item.name == func
-        ),
-        None,
-    )
-    assert func_node is not None, f"Function {func} not found"
+    func_nodes = [
+        item for item in ast.walk(tree) if isinstance(item, ast.FunctionDef) and item.name == func
+    ]
+    assert func_nodes, f"Function {func} not found"
 
-    assert isinstance(
-        func_node.parent, ast.Module
-    ), f"{func} is not a top level function, it has parent {func_node.parent}"
+    for func_node in func_nodes:
+        if not isinstance(func_node.parent, ast.Module):
+            continue
 
-    num_children = sum(1 for _ in ast.walk(func_node))
-    pct_diff_children = abs(num_children - func_children) * 100 / func_children
-    assert (
-        pct_diff_children < 10
-    ), f"Old method had {func_children} children, new method has {num_children}"
+        num_children = sum(1 for _ in ast.walk(func_node))
+        pct_diff_children = abs(num_children - func_children) * 100 / func_children
+        assert (
+            pct_diff_children < 10
+        ), f"Old method had {func_children} children, new method has {num_children}"
+        return
+
+    assert False, f"{func} is not a top level function"
 
 
 def verify_old_class_children(tree, old_class, old_class_children):
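As a rough illustration of the revised check in this hunk, the sketch below gathers every function with the requested name and accepts the first one whose parent is the module. `annotate_parents` and `find_top_level_func` are hypothetical names, and the parent bookkeeping is an assumed stand-in for `ParentNodeTransformer`, whose body this diff does not show.

```python
import ast


def annotate_parents(tree):
    # Hypothetical helper: give every node a `.parent` attribute.
    # Assumed to be roughly what ParentNodeTransformer provides.
    tree.parent = None
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            child.parent = node
    return tree


def find_top_level_func(tree, func):
    # Collect all functions named `func`, then return the first one
    # that is defined directly at module level (or None).
    candidates = [
        node
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef) and node.name == func
    ]
    for node in candidates:
        if isinstance(node.parent, ast.Module):
            return node
    return None


SOURCE = '''
class Greeter:
    def greet(self):
        def helper(name):
            return "hi " + name
        return helper("x")


def helper(name):
    return "hello " + name
'''

tree = annotate_parents(ast.parse(SOURCE))
top = find_top_level_func(tree, "helper")
num_nodes = sum(1 for _ in ast.walk(top))
```

Here two functions are named `helper`, one nested and one at module level; only the module-level one is accepted, while `greet` has no top-level definition and yields `None`.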
@@ -14,19 +14,19 @@ where it writes
 code with comments
 like "...add logic here...".
 
-Aider also has a new "laziness" benchmark suite
+Aider's new "laziness" benchmark suite
 designed to both provoke and quantify lazy coding.
 It consists of
 89 python refactoring tasks
 which tend to make GPT-4 Turbo lazy
 and write comments like
-"...include the original method body...".
+"...include original method body...".
 
 This new laziness benchmark produced the following results with `gpt-4-1106-preview`:
 
 - **GPT-4 Turbo only scored 20% as a baseline** using aider's existing "SEARCH/REPLACE block" edit format. It outputs "lazy comments" on 12 of the tasks.
 - **Aider's new unified diff edit format raised the score to 61%**. Using this format reduced laziness by 3X, with GPT-4 Turbo only using lazy comments on 4 of the tasks.
-- **It's worse to add a prompt that the user is blind, has no hands, will tip $2000 and fears truncated code trauma.**
+- **It's worse to add a prompt that says the user is blind, has no hands, will tip $2000 and fears truncated code trauma.**
 
 These widely circulated "emotional appeal" folk remedies
 produced worse benchmark scores.
@@ -328,7 +328,7 @@ To do this, I used python's `ast` module to analyze
 to identify challenging refactoring tasks.
 The goal was to find:
 
-- Source files that contain class methods which are non-trivial, having 100-250+ AST nodes in their implementation.
+- Source files that contain classes with non-trivial methods, having 100-250+ AST nodes in their implementation.
 - Focus on methods that are part of a larger class, which has at least twice as much code as the method itself.
 - Select methods that don't use their `self` parameter, so they can be trivially refactored out of the class.
 
@@ -343,10 +343,10 @@ A [simple python AST scanning script](https://github.com/paul-gauthier/aider/blo
 found 89 suitable files
 and packaged them up as benchmark tasks.
 Each task has a test
-that checks if refactor
+that checks if the refactor
 was performed roughly correctly:
 
-- The updated source file must parse as valid python, to surface misapplied edits which corrupt the file.
+- The updated source file must parse as valid python, to detect misapplied edits which produce invalid code.
 - The target method must now exist as a top-level function in the file.
 - This new top-level function must contain approximately the same number of AST nodes as the original class method. This ensures that GPT didn't elide code and replace it with comments.
 - The original class must still be present in the file, and it must be smaller by about the number of AST nodes in the method which was removed. This helps confirm that the method was removed from the class, without other significant modifications.
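The selection criteria quoted in the hunks above (method size in AST nodes, an enclosing class at least twice as large, no use of `self`) could be approximated as follows. This is a sketch under assumptions, not the scanning script linked in the post; `find_candidate_methods` and its `min_nodes` parameter are hypothetical.

```python
import ast


def find_candidate_methods(source, min_nodes=100):
    # Yield (class_name, method_name, method_nodes, class_nodes) for methods
    # that never reference `self` in their body and that live in a class
    # with at least twice as many AST nodes as the method.
    tree = ast.parse(source)
    for cls in ast.walk(tree):
        if not isinstance(cls, ast.ClassDef):
            continue
        class_nodes = sum(1 for _ in ast.walk(cls))
        for item in cls.body:
            if not isinstance(item, ast.FunctionDef):
                continue
            method_nodes = sum(1 for _ in ast.walk(item))
            # The `self` parameter is an ast.arg; only uses show up as ast.Name.
            uses_self = any(
                isinstance(n, ast.Name) and n.id == "self" for n in ast.walk(item)
            )
            if uses_self or method_nodes < min_nodes:
                continue
            if class_nodes < 2 * method_nodes:
                continue
            yield cls.name, item.name, method_nodes, class_nodes


SAMPLE = '''
class Box:
    def __init__(self, w, h, d, label):
        self.w = w
        self.h = h
        self.d = d
        self.label = label

    def area(self):
        return self.w * self.h

    def perimeter(self):
        return 2 * (self.w + self.h)

    def clamp(self, value, low, high):
        if value < low:
            return low
        if value > high:
            return high
        return value
'''

# A small threshold is used so the toy class produces a result.
results = list(find_candidate_methods(SAMPLE, min_nodes=10))
```

In the sample, only `clamp` qualifies, since every other method reads attributes through `self`.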