Mirror of https://github.com/Aider-AI/aider.git

parent 09a220f7fb
commit 8f73f8b651

1 changed file with 9 additions and 5 deletions
```diff
@@ -56,11 +56,13 @@ changes as `diff -c` formatted edits.
 Using more complex output formats seems to cause two problems:
 
 - It makes GPT write worse code. Keeping the output format simple seems to leave GPT with more attention to devote to the actual coding task.
-- It makes GPT less likely to adhere to the output format. This makes it harder for tooling to correctly identify and apply the edits it is trying to make.
+- It makes GPT less likely to adhere to the output format. This makes it harder for tooling like aider to correctly identify and apply the edits GPT is trying to make.
 
 I had hoped that the new function calling API would enable more reliable use of
-structured output formats, but it does not appear to be a panacea
-when working with source code.
+structured output formats, and expected to switch aider to using it
+for both GPT-3.5 and GPT-4.
+But given these benchmarking results, I won't be adopting the functions api
+at this time.
 
 More details on the benchmark, edit formats and results are discussed below.
 
```
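For readers unfamiliar with the function calling API mentioned in this hunk: the caller supplies a JSON Schema describing the arguments the model should return, and the reply comes back as a structured `function_call` rather than free text. Below is a minimal sketch of how an edit could be requested that way, using the pre-1.0 openai Python package that was current when function calling launched; the `replace_lines` function, its fields, and the prompt are hypothetical illustrations, not aider's actual edit schema.

```python
# Minimal sketch only: "replace_lines" and its argument schema are hypothetical,
# not the edit format aider actually benchmarks.
import json
import openai  # pre-1.0 openai package, which exposed ChatCompletion

functions = [
    {
        "name": "replace_lines",
        "description": "Replace a span of lines in a source file with new code.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "file to edit"},
                "original_lines": {"type": "string", "description": "exact lines to replace"},
                "updated_lines": {"type": "string", "description": "replacement lines"},
            },
            "required": ["path", "original_lines", "updated_lines"],
        },
    }
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "Fix the off-by-one bug in counter.py"}],
    functions=functions,
    function_call={"name": "replace_lines"},  # ask for a structured reply
)

# The model's edit arrives as a JSON string in the function_call arguments.
args = json.loads(response.choices[0].message["function_call"]["arguments"])
print(args["path"])
print(args["updated_lines"])
```

The later hunk about GPT-3.5 stuffing an entire python file "into that field" is describing one of these structured argument fields.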
```diff
@@ -116,8 +118,10 @@ Many of the exercises have multiple paragraphs of instructions,
 and most human coders would likely fail some tests on their
 first try.
 
-It's worth noting that GPT never gets to see the source code of the unit tests.
+It's worth noting that GPT never gets to see the source code of the unit tests
+during the benchmarking.
 Just the error output from failed tests.
+Of course, all of this code was probably part of its original training data!
 
 In summary, passing an exercise means GPT was able to:
 
```
```diff
@@ -261,7 +265,7 @@ Instead, GPT-3.5 frequently just stuffs an entire python
 file into that field.
 
 It feels like it might be getting confused by fine tuning that was done
-for the ChatGPT coder interpreter plugin?
+for the ChatGPT code interpreter plugin?
 
 ## Randomness
 
```
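The failure mode in this last hunk, a whole python file dumped into one structured field, is what makes such edits hard to apply mechanically. A rough sketch of the kind of sanity check that trips on it, reusing the hypothetical `original_lines` field from the earlier example (not aider's actual logic):

```python
def find_edit_anchor(file_text: str, original_lines: str) -> int:
    """Return the offset where a structured edit claims to apply, or -1.

    When the model returns just the few lines it wants to change, the lookup
    succeeds. When it dumps an entire (already edited) file into the field,
    that text does not occur verbatim in the original, so there is nothing
    to anchor the edit to.
    """
    return file_text.find(original_lines)


source = "def add(a, b):\n    return a - b\n"

# A minimal edit names only the lines it changes, so it can be located:
print(find_edit_anchor(source, "    return a - b\n"))                  # 15

# A whole-file dump of the edited file matches nothing in the original:
print(find_edit_anchor(source, "def add(a, b):\n    return a + b\n"))  # -1
```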