Mirror of https://github.com/Aider-AI/aider.git, synced 2025-05-24 22:34:59 +00:00
Commit message: copy

parent 3e639639d5
commit 837fd9e30b
1 changed file with 7 additions and 5 deletions

@@ -7,7 +7,9 @@
 Aider now asks GPT-4 Turbo to use
 [unified diffs](#choose-a-familiar-editing-format)
 to edit your code.
-This dramatically improves GPT-4 Turbo's performance on a complex benchmark
+This dramatically improves GPT-4 Turbo's performance on a
+challenging
+new benchmark
 and significantly reduces its bad habit of "lazy" coding,
 where it writes
 code with comments
@@ -17,15 +19,15 @@ Aider also has a new "laziness" benchmark suite
 designed to both provoke and quantify lazy coding.
 It consists of
 89 python refactoring tasks
-which tend to make GPT-4 Turbo very lazy.
-On these tasks it often produces comments like
+which tend to make GPT-4 Turbo lazy
+and write comments like
 "...include the original method body...".
 
 This new laziness benchmark produced the following results with `gpt-4-1106-preview`:
 
-- **GPT-4 Turbo only scored 20% as a baseline** using aider's existing "SEARCH/REPLACE block" edit format. It output "lazy comments" on 12 of the tasks.
+- **GPT-4 Turbo only scored 20% as a baseline** using aider's existing "SEARCH/REPLACE block" edit format. It outputs "lazy comments" on 12 of the tasks.
 - **Aider's new unified diff edit format raised the score to 61%**. Using this format reduced laziness by 3X, with GPT-4 Turbo only using lazy comments on 4 of the tasks.
-- **It's worse to prompt that the user is blind, without hands, will tip $2000 and fears truncated code trauma.** These widely circulated folk remedies performed worse on the benchmark when added to the system prompt for the baseline SEARCH/REPLACE and new unified diff editing formats. These prompts did *slightly* reduce the amount of laziness, but at a large cost to successful benchmark outcomes.
+- **It's worse to prompt that the user is blind, without hands, will tip $2000 and fears truncated code trauma.** These widely circulated folk remedies performed worse on the benchmark when added to the system prompt for the baseline SEARCH/REPLACE and new unified diff editing formats. These prompts did slightly reduce the amount of laziness against baseline (to 8 lazy tasks). It increased the lazy tasks to 5 when added to the unified diff prompt.
 
 The older `gpt-4-0613` also did better on the laziness benchmark using unified diffs:
 
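Both hunk headers in this commit use the standard unified diff form `@@ -old_start,old_count +new_start,new_count @@`, where each count includes the hunk's unchanged context lines. As an illustration only (this is not code from aider or from this commit), here is a minimal Python sketch that checks a hunk body against its header. `HUNK_RE` and `check_hunk` are hypothetical names; the sketch assumes every body line starts with a space, `-` or `+`, and that an omitted count means 1, as in the unified format.

```python
import re

# Unified diff hunk header: "@@ -old_start[,old_count] +new_start[,new_count] @@ [section]"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def check_hunk(lines):
    """Return (old_count, new_count, additions, deletions) for one hunk.

    Hypothetical helper: assumes lines[0] is the header and every body
    line starts with ' ' (context), '-' (removed) or '+' (added).
    Raises ValueError if the body does not match the header counts.
    """
    m = HUNK_RE.match(lines[0])
    if m is None:
        raise ValueError(f"not a hunk header: {lines[0]!r}")
    # An omitted ",count" means the hunk spans exactly one line.
    old_count = int(m.group(2) or 1)
    new_count = int(m.group(4) or 1)

    body = lines[1:]
    context = sum(1 for line in body if line.startswith(" "))
    deletions = sum(1 for line in body if line.startswith("-"))
    additions = sum(1 for line in body if line.startswith("+"))

    # Old side = context + removed lines; new side = context + added lines.
    if context + deletions != old_count or context + additions != new_count:
        raise ValueError("hunk body does not match its header counts")
    return old_count, new_count, additions, deletions


# Hunk 1 of this commit, copied from the diff above.
hunk1 = [
    "@@ -7,7 +7,9 @@",
    " Aider now asks GPT-4 Turbo to use",
    " [unified diffs](#choose-a-familiar-editing-format)",
    " to edit your code.",
    "-This dramatically improves GPT-4 Turbo's performance on a complex benchmark",
    "+This dramatically improves GPT-4 Turbo's performance on a",
    "+challenging",
    "+new benchmark",
    " and significantly reduces its bad habit of \"lazy\" coding,",
    " where it writes",
    " code with comments",
]

print(check_hunk(hunk1))  # prints (7, 9, 3, 1)
```

For hunk 1, the 6 context lines plus 1 removed line give the old count of 7, and the same 6 plus 3 added lines give the new count of 9. Hunk 2 changes 4 lines on each side, which brings the commit to the 7 additions and 5 deletions reported in its summary; its new start of 19 is the old start of 17 shifted by the 2 net lines that hunk 1 added.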