mirror of https://github.com/Aider-AI/aider.git, synced 2025-06-01 18:25:00 +00:00
parent 18e3f55c4e
commit 5e13399f46
1 changed file with 38 additions and 36 deletions
@@ -39,7 +39,7 @@ This lets them quickly steer misunderstandings back on course and
 avoid wasted time, code reviews and token costs.
 
 
-## Methodology
+## Benchmark methodology
 
 For the benchmark,
 aider was launched in each problem's git repository
@@ -61,18 +61,18 @@ It's important to be clear that during benchmarking
 It could not see or run the held out "acceptance tests" that are used later to see if the
 SWE Bench problem was correctly resolved.
 
-The benchmarking process can be thought of as similar to a user:
+The benchmarking process was similar to a user employing aider like this:
 
 - Launching aider in their repo with the something like command below, which
 tells aider to say yes to every suggestion and use pytest to run tests.
 - `aider --yes --test-cmd pytest`
 - Pasting the text of a GitHub issue into the chat, or adding it via URL with a command in the chat like:
 - `/web https://github.com/django/django/issues/XXX`
-- If aider doesn't produce code that lints and tests clean, the user might decide to revert the changes and try again, maybe with a different LLM this time.
+- If aider doesn't produce code that lints and tests clean, the user might decide to revert the changes and try again, maybe using aider with a different LLM this time.
 [Aider is tightly integrated with git](https://aider.chat/docs/faq.html#how-does-aider-use-git),
 so it's always easy to revert AI changes that don't pan out.
 
-Of course, outside a benchmark setting, it's probably
+Outside a benchmark setting, it's probably
 unwise to let *any* AI agent run unsupervised on your code base.
 Aider is intended to be used as an interactive pair-programming chat,
 where the user participates to direct aider's work and approve suggestions.
@@ -82,7 +82,7 @@ or if the AI starts going down a wrong path.
 
 ## Aider with GPT-4o alone was SOTA
 
-Running the entire SWE Bench Lite benchmark using aider with just GPT-4o
+Running the SWE Bench Lite benchmark using aider with just GPT-4o
 achieved a score of 25%.
 This was itself a state-of-the-art result, before being surpassed by the main
 result being reported here
@@ -203,7 +203,7 @@ respected when new code is added.
 [Aider lints code](https://aider.chat/2024/05/22/linting.html)
 after every LLM edit and offers to automatically fix
 any linting errors.
-Aider includes basic linters built with tree-sitter that supports
+Aider includes basic linters built with tree-sitter to check
 [most popular programming languages](https://github.com/paul-gauthier/grep-ast/blob/main/grep_ast/parsers.py).
 These built in linters will detect syntax errors and other fatal problems with the code.
 
@@ -220,36 +220,38 @@ make the correct changes to resolve it.
 
 <div class="chat-transcript" markdown="1">
 
-> app.py:23:36: F821 undefined name 'num'
-> app.py:41:16: F541 f-string is missing placeholders
->
-> app.py:
-> ...⋮...
-> 6│class LongNum:
-> 7│ def __init__(self, num):
-> 8│ """
-> 9│ Initialize the number.
-> 10│ """
-> ...⋮...
-> 19│ def __str__(self):
-> 20│ """
-> 21│ Render the number as a string.
-> 22│ """
-> 23█ return str(num)
-> 24│
-> 25│
-> 26│@app.route('/subtract/<int:x>/<int:y>')
-> ...⋮...
-> 38│@app.route('/divide/<int:x>/<int:y>')
-> 39│def divide(x, y):
-> 40│ if y == 0:
-> 41█ return f"Error: Cannot divide by zero"
-> 42│ else:
-> 43│ result = x / y
-> 44│ return str(result)
-> 45│
-> ...⋮...
->
+```
+app.py:23:36: F821 undefined name 'num'
+app.py:41:16: F541 f-string is missing placeholders
+
+app.py:
+...⋮...
+6│class LongNum:
+7│ def __init__(self, num):
+8│ """
+9│ Initialize the number.
+10│ """
+...⋮...
+19│ def __str__(self):
+20│ """
+21│ Render the number as a string.
+22│ """
+23█ return str(num)
+24│
+25│
+26│@app.route('/subtract/<int:x>/<int:y>')
+...⋮...
+38│@app.route('/divide/<int:x>/<int:y>')
+39│def divide(x, y):
+40│ if y == 0:
+41█ return f"Error: Cannot divide by zero"
+42│ else:
+43│ result = x / y
+44│ return str(result)
+45│
+...⋮...
+```
+
 > Attempt to fix lint errors? yes
 
 </div>
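
The two summary lines at the top of the lint report in the transcript (`app.py:23:36: F821 ...`) appear to follow the `path:line:col: code message` layout that flake8 prints by default. The following is a minimal Python sketch of pulling those fields out of such a report, assuming that layout holds. The names `LintIssue`, `parse_lint_line` and `parse_lint_report` are hypothetical, chosen for illustration only; this is not aider's own code.

```python
import re
from typing import List, NamedTuple, Optional


class LintIssue(NamedTuple):
    """One lint finding (hypothetical container, not an aider type)."""
    path: str
    line: int
    col: int
    code: str
    message: str


# Assumed layout: path:line:col: CODE message
_ISSUE_RE = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<code>[A-Z]+\d+) (?P<message>.*)$"
)


def parse_lint_line(text: str) -> Optional[LintIssue]:
    """Return a LintIssue for a summary line, or None for any other line."""
    match = _ISSUE_RE.match(text.strip())
    if match is None:
        return None
    return LintIssue(
        path=match["path"],
        line=int(match["line"]),
        col=int(match["col"]),
        code=match["code"],
        message=match["message"],
    )


def parse_lint_report(report: str) -> List[LintIssue]:
    """Collect the summary lines from a report, skipping source context lines."""
    issues = []
    for raw in report.splitlines():
        issue = parse_lint_line(raw)
        if issue is not None:
            issues.append(issue)
    return issues


if __name__ == "__main__":
    sample = (
        "app.py:23:36: F821 undefined name 'num'\n"
        "app.py:41:16: F541 f-string is missing placeholders\n"
        "\n"
        "app.py:\n"
        "23█ return str(num)\n"
    )
    for found in parse_lint_report(sample):
        print(found.path, found.line, found.col, found.code, found.message)
```

Lines that only show source context, such as `app.py:` or the numbered code excerpts, do not match the pattern and are skipped, so only the two findings are returned.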