Paul Gauthier 2023-06-30 13:41:32 -07:00
parent 9ea1c0dee1
commit af8a136ae3


@@ -14,11 +14,13 @@ and can range from simply "return the whole file with edits" to
[function calling API](https://openai.com/blog/function-calling-and-other-api-updates)
to specify a bunch of specific diffs".
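As a rough illustration of that structured approach, the function schema for such a diff-specifying tool might look like the sketch below; the `replace_lines` name and its fields are hypothetical, not the exact schema aider sends.

```python
# Hypothetical function schema for a "structured diffs" edit format.
# The name and fields are illustrative only, not aider's actual schema.
replace_lines_function = {
    "name": "replace_lines",
    "description": "Replace a range of lines in a source file with new text.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "File to edit"},
            "original_lines": {"type": "string", "description": "Exact lines to replace"},
            "updated_lines": {"type": "string", "description": "Replacement lines"},
        },
        "required": ["path", "original_lines", "updated_lines"],
    },
}

# Passed to the chat completions call roughly as:
#   openai.ChatCompletion.create(..., functions=[replace_lines_function])
```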
Having a reliable way for GPT to read/modify/write source files is critical to
efficiently coding with GPT within an existing codebase.
To measure the impact of changes to the edit format,
I created a code editing benchmark based on the
[Exercism python](https://github.com/exercism/python)
coding exercises.
The benchmark measures how well aider & GPT can turn
a human coding request into
actual runnable code saved into files that pass the unit tests.
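To make the shape of the benchmark concrete, here is a minimal sketch of such a harness, assuming a hypothetical `solve_with_gpt` callable and a simple one-directory-per-exercise layout; it is not aider's actual benchmark code.

```python
import subprocess
from pathlib import Path


def run_exercise(exercise_dir: Path, solve_with_gpt) -> bool:
    """Ask the model to solve one Exercism exercise, then run its unit tests.

    `solve_with_gpt` is a hypothetical callable that sends the exercise
    instructions to GPT (via aider) and writes the edited files to disk.
    """
    instructions = (exercise_dir / "README.md").read_text()
    solve_with_gpt(exercise_dir, instructions)

    # The exercise only counts as solved if all of its unit tests pass.
    result = subprocess.run(["pytest", str(exercise_dir)], capture_output=True)
    return result.returncode == 0


def run_benchmark(exercises_root: Path, solve_with_gpt) -> float:
    """Return the fraction of exercises whose tests pass after editing."""
    dirs = [d for d in sorted(exercises_root.iterdir()) if d.is_dir()]
    passed = sum(run_exercise(d, solve_with_gpt) for d in dirs)
    return passed / len(dirs)
```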
@@ -27,11 +29,9 @@ of not just how well GPT can write code, but also how well it
can package up and format these code changes
so that aider can save the edits to the
local source files.
I ran the benchmark
on almost all the ChatGPT models, using a variety of edit formats.
This produced some interesting observations:
- Asking GPT to just return an updated copy of the whole file as a fenced code block within its normal markdown response is by far the most reliable way to have it edit code. This is true across all gpt-3.5 and gpt-4 models. Keeping the output format dead simple seems to leave GPT with more brain power to devote to the actual coding task. GPT is also less likely to mangle this simple output format (see the parsing sketch below).
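For reference, recovering a "whole file" edit from that kind of reply can be as simple as the sketch below; the helper and its fence-splitting logic are illustrative assumptions, not aider's actual parser.

```python
from pathlib import Path


def apply_whole_file_edit(reply: str, target: Path) -> None:
    """Overwrite `target` with the first fenced code block in a GPT reply.

    A sketch of handling the "whole file" format, not aider's parser.
    """
    fence = "`" * 3  # a markdown code fence
    parts = reply.split(fence)
    if len(parts) < 3:
        raise ValueError("reply did not contain a fenced code block")
    # Drop the optional language tag on the opening fence line.
    _lang, _, body = parts[1].partition("\n")
    target.write_text(body)
```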