Paul Gauthier 2023-12-20 13:13:44 -04:00
parent 8a3bde342f
commit 97fd65c6ce

@@ -28,9 +28,9 @@ This new laziness benchmark produced the following results with `gpt-4-1106-prev
- **Aider's new unified diff edit format raised the score to 61%**. Using this format reduced laziness by 3X, with GPT-4 Turbo only using lazy comments on 4 of the tasks.
- **It's worse to add a prompt that says the user is blind, has no hands, will tip $2000 and fears truncated code trauma.**
-These widely circulated "emotional appeal" folk remedies
+Widely circulated "emotional appeal" folk remedies
produced worse benchmark scores.
-Adding *all* of these claims to the system prompt
+Adding *all* of the various emotional statements to the system prompt
resulted in worse benchmark scores
for the baseline SEARCH/REPLACE and new unified diff editing formats.
These prompts did somewhat reduce the amount of laziness when used