From b3cda38a1a816912ae81c3accb7a88f2f3fbd5dd Mon Sep 17 00:00:00 2001
From: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Date: Sun, 2 Jul 2023 08:34:17 -0700
Subject: [PATCH] Update benchmarks.md

---
 docs/benchmarks.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/benchmarks.md b/docs/benchmarks.md
index d5121d8c7..f8e42c4f3 100644
--- a/docs/benchmarks.md
+++ b/docs/benchmarks.md
@@ -41,7 +41,7 @@ The results were interesting:
 - The performance of the new June (`0613`) versions of GPT-3.5 appears to be a bit worse than the February (`0301`) version. This is visible if you look at the "first attempt" markers on the first three solid blue bars and also by comparing the first three solid green `diff` bars.
 - As expected, the GPT-4 models outperformed the GPT-3.5 models in code editing.
 
-The quantitative benchmark results align with my intuitions
+The quantitative benchmark results agree with my intuitions
 about prompting GPT for complex tasks like coding. It's beneficial
 to minimize the "cognitive overhead" of formatting the response,
 allowing GPT to concentrate on the coding task at hand.