Paul Gauthier 2023-06-30 14:33:45 -07:00
parent 6610a8310c
commit 8c73a7be35

@@ -35,8 +35,8 @@ I ran the benchmark
on almost all the ChatGPT models, using a variety of edit formats.
This produced some interesting observations:
-- Asking GPT to just return an updated copy of the whole file in a normal fenced code block is by far the most reliable way to have it edit code. This is true across all gpt-3.5 and gpt-4 models. Keeping the output format dead simple seems to leave GPT with more brain power to devote to the actual coding task. GPT is also less likely to mangle this simple output format.
-- Using the new function calling API is worse than returning whole files in markdown. GPT writes worse code and frequently mangles the output format, even though OpenAI introduced the function calling API to make structured output formatting more reliable. This was a big surprise.
+- Asking GPT to just return an updated copy of the whole file in a normal fenced code block is by far the most reliable edit format. This is true across all gpt-3.5 and gpt-4 models. Keeping the output format dead simple seems to leave GPT with more brain power to devote to the actual coding task. GPT is also less likely to mangle this simple output format.
+- Using the new function calling API is worse than the above whole file method, for all models. GPT writes worse code and frequently mangles this output format, even though OpenAI introduced the function calling API to make structured output formatting more reliable. This was a big surprise.
- The new June (`0613`) versions of `gpt-3.5-turbo` are worse at code editing than the older Feb (`0301`) version. This was unexpected.
- The gpt-4 models are much better at code editing than the gpt-3.5 models. This was expected.
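
To make the two edit formats in the list above concrete, here is a minimal sketch of how each request might be made. This is not aider's actual implementation: it assumes the 2023-era `openai` Python package (`openai.ChatCompletion.create`) and uses a hypothetical `write_file` function schema purely for illustration.

```python
# Sketch only: contrasts the "whole file in a fenced code block" edit format
# with the function calling API. Assumes OPENAI_API_KEY is set in the
# environment and the 2023-era openai 0.x package is installed.
import json
import re

import openai


def edit_whole_file(source_code, instruction, model="gpt-3.5-turbo-0301"):
    """Ask for the entire updated file inside a single fenced code block."""
    messages = [
        {"role": "system",
         "content": "Return the complete updated file in one fenced code block."},
        {"role": "user",
         "content": f"{instruction}\n\n```python\n{source_code}\n```"},
    ]
    response = openai.ChatCompletion.create(model=model, messages=messages)
    reply = response.choices[0].message["content"]
    # Pull the updated file back out of the fenced block.
    match = re.search(r"```(?:\w+)?\n(.*?)```", reply, re.DOTALL)
    return match.group(1) if match else reply


def edit_via_function_call(source_code, instruction, model="gpt-3.5-turbo-0613"):
    """Ask for the updated file through the function calling API instead."""
    functions = [{
        "name": "write_file",  # hypothetical schema, for illustration only
        "description": "Write the complete updated file.",
        "parameters": {
            "type": "object",
            "properties": {"content": {"type": "string"}},
            "required": ["content"],
        },
    }]
    messages = [{"role": "user",
                 "content": f"{instruction}\n\n```python\n{source_code}\n```"}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        functions=functions,
        function_call={"name": "write_file"},
    )
    # The arguments come back as a JSON string containing the file content.
    args = response.choices[0].message["function_call"]["arguments"]
    return json.loads(args)["content"]
```

The whole file variant only needs a regex to recover the result, while the function calling variant depends on GPT emitting well-formed JSON for the entire file, which is one place the format can get mangled.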