From 51bc71446eac8cde65511e06243f4e903ff86f9f Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Sat, 1 Jul 2023 15:21:21 -0700
Subject: [PATCH] copy

---
 docs/benchmarks.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/benchmarks.md b/docs/benchmarks.md
index 45f3a2df1..0f34d2971 100644
--- a/docs/benchmarks.md
+++ b/docs/benchmarks.md
@@ -43,7 +43,7 @@ The results were quite interesting:
 - Using the new function calling API performed worse than the above whole file method for all models. GPT-3.5 especially produced inferior code and frequently mangled this output format. This was surprising, as the functions API was introduced to enhance the reliability of structured outputs. The results from these `func` edit methods are shown as patterned bars in the graph (both green and blue).
 - As expected, the GPT-4 models outperformed the GPT-3.5 models in code editing.
 
-The quantitative benchmark results align with my developing intuition
+The quantitative benchmark results align with my intuitions
 about prompting GPT for complex tasks like coding.
 It's beneficial to minimize the "cognitive overhead" of formatting the response, allowing GPT to concentrate on the task at hand.
 As an analogy, asking a junior