From 6610a8310cb5edcb50566a5d1b50bd5df00ed167 Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Fri, 30 Jun 2023 14:31:36 -0700
Subject: [PATCH] copy

---
 docs/benchmarks.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/benchmarks.md b/docs/benchmarks.md
index b31b57f1f..2253e149b 100644
--- a/docs/benchmarks.md
+++ b/docs/benchmarks.md
@@ -35,7 +35,7 @@ I ran the benchmark on almost all the ChatGPT models, using a variety of edit
 formats. This produced some interesting observations:
 
- - Asking GPT to just return an updated copy of the whole file as a fenced code block within it's normal markdown response is by far the most reliable way to have it edit code. This is true across all gpt-3.5 and gpt-4 models. Keeping the output format dead simple seems to leave GPT with more brain power to devote to the actual coding task. GPT is also less likely to mangle this simple output format.
+ - Asking GPT to just return an updated copy of the whole file in a normal fenced code block is by far the most reliable way to have it edit code. This is true across all gpt-3.5 and gpt-4 models. Keeping the output format dead simple seems to leave GPT with more brain power to devote to the actual coding task. GPT is also less likely to mangle this simple output format.
  - Using the new function calling API is worse than returning whole files in markdown. GPT writes worse code and frequently mangles the output format, even though OpenAI introduced the function calling API to make structured output formatting more reliable. This was a big surprise.
  - The new June (`0613`) versions of `gpt-3.5-turbo` are worse at code editing than the older Feb (`0301`) version. This was unexpected.
  - The gpt-4 models are much better at code editing than the gpt-3.5 models. This was expected.
 
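
For context on the "whole file" edit format this patch describes, here is a minimal sketch of what that round trip might look like. This is not aider's actual implementation: the `request_whole_file_edit` helper, the prompt wording, and the use of the `openai` v1 Python client are all illustrative assumptions.

```python
# Minimal sketch of the "whole file" edit format described above.
# NOT aider's actual implementation; the helper name, prompt wording,
# and model choice are illustrative assumptions.
import re

from openai import OpenAI  # assumes the official openai>=1.0 package

client = OpenAI()  # picks up OPENAI_API_KEY from the environment


def request_whole_file_edit(filename: str, source: str, instruction: str) -> str:
    """Ask the model for the complete updated file in one fenced code block."""
    prompt = (
        f"{instruction}\n\n"
        f"Here is the current content of {filename}:\n\n"
        f"```\n{source}\n```\n\n"
        "Reply with the complete updated file in a single fenced code block."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    reply = response.choices[0].message.content
    # Because the requested format is dead simple -- one fenced block in a
    # normal markdown reply -- a short regex is enough to recover the file.
    match = re.search(r"```[^\n]*\n(.*?)```", reply, re.DOTALL)
    if match is None:
        raise ValueError("model reply did not contain a fenced code block")
    return match.group(1)
```

The design point the patch's wording emphasizes is that recovery needs nothing more elaborate than the regex above, unlike structured formats such as function calling, which GPT mangles more often.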