2025-05-27 15:55:00 +00:00 · 2023-11-07 05:24:07 -08:00 · 2023-11-07 05:24:07 -08:00 · 433dca5b80
commit 433dca5b80
parent beb026cc25
2 changed files with 83 additions and 77 deletions
--- a/docs/benchmarks-1106.md
+++ b/docs/benchmarks-1106.md
@@ -40,15 +40,15 @@ The benchmark gives aider two tries to complete the task:

 ### gpt-4-1106-preview

- The new `gpt-4-1106-preview` model seems **much faster** than the earlier GPT-4 models! I won't be able to properly quantify this until the rate limits loosen up. Currently I am seeing 10X faster responses.
- **It is much better at producing correct code on the first try**. It gets ~60% of the coding exercises correct, without needing to see errors from the test suite. Previous models only get 46-47% of the exercises correct on the first try.
- The new model seems to perform somewhat better (69%) than the old models (63-64%) after being given a second chance to correct bugs by reviewing test suite error output.
+- The new `gpt-4-1106-preview` model seems **much faster** than the earlier GPT-4 models. I won't be able to properly quantify this until the rate limits loosen up.
+- **It seems better at producing correct code on the first try**. It gets ~56% of the coding exercises correct, without needing to see errors from the test suite. Previous models only get 46-47% of the exercises correct on the first try.
+- The new model seems to perform similarly (66%) to the old models (63-64%) after being given a second chance to correct bugs by reviewing test suite error output.

 **These results are preliminary.**
 OpenAI is enforcing very low
 rate limits on the new GPT-4 model. The limits are so low, that
 I have only been able to attempt
-58
+79
 out of 133 exercism problems.
 The problems are selected in random order, so results should be *roughly*
 indicative of the full benchmark.