2025-05-31 01:35:00 +00:00 · 2023-11-06 19:15:01 -08:00 · 2023-11-06 19:15:01 -08:00 · a6027721c1
commit a6027721c1
parent 2675ed5e87
2 changed files with 57 additions and 74 deletions
--- a/docs/benchmarks-1106.md
+++ b/docs/benchmarks-1106.md
@@ -35,13 +35,13 @@ With that in mind, I've been benchmarking the new models.
 ## gpt-4-1106-preview

 - The new `gpt-4-1106-preview` model seems **much faster** than the earlier GPT-4 models! I won't be able to properly quantify this until the rate limits loosen up. Currently I am seeing 10X faster responses.
-- **It is better at producing correct code on the first try**. It gets ~57% of the coding exercises correct, without needing to see errors from the test suite. Previous models only get 46-47% of the exercises correct on the first try.
+- **It is better at producing correct code on the first try**. It gets ~59% of the coding exercises correct, without needing to see errors from the test suite. Previous models only get 46-47% of the exercises correct on the first try.
 - The new model seems to perform similarly to the old models after being given a chance to correct bugs by reviewing test suite error output.

 **These results are preliminiary.**
 OpenAI is enforcing very low
 rate limits on the new GPT-4 model. The limits are so low, that
-I have only been able to attempt 53 out of 133 exercism problems.
+I have only been able to attempt 56 out of 133 exercism problems.
 They are randomly chosen, so results should be *roughly*
 indicative of the full benchmark.