Paul Gauthier 2023-12-19 15:30:15 -08:00
parent d36c18f9dc
commit 7028a533f1

@@ -3,7 +3,6 @@
 ![robot flowchart](../assets/benchmarks-udiff.svg)
 Aider now asks GPT-4 Turbo to use
 [unified diffs](#choose-a-familiar-editing-format)
 to edit your code.
@@ -29,9 +28,10 @@ This new laziness benchmark produced the following results with `gpt-4-1106-prev
 - **Aider's new unified diff edit format raised the score to 61%**. Using this format reduced laziness by 3X, with GPT-4 Turbo only using lazy comments on 4 of the tasks.
 - **It's worse to add a prompt that the user is blind, has no hands, will tip $2000 and fears truncated code trauma.**
-The widely circulated "blind with no hands" type of folk remedies
-performed worse on the benchmark when added to the system prompt.
-The benchmark scores dropped
+These widely circulated "emotional appeal" folk remedies
+produced worse benchmark scores.
+Adding *all* of these claims to the system prompt
+resulted in worse benchmark scores
 for the baseline SEARCH/REPLACE and new unified diff editing formats.
 These prompts did somewhat reduce the amount of laziness when used
 with the SEARCH/REPLACE edit format,