Paul Gauthier 2023-11-07 18:18:02 -08:00
parent c55aff87e6
commit c86a957cf5
2 changed files with 62 additions and 58 deletions

@@ -45,15 +45,19 @@ This is the edit format that aider uses by default with gpt-4.
 - The new `gpt-4-1106-preview` model seems **much faster** than the earlier GPT-4 models. I won't be able to properly quantify this until the rate limits loosen up.
 - **It seems better at producing correct code on the first try**. It gets
-~57% of the coding exercises correct, without needing to see errors from the test suite. Previous models only get 46-47% of the exercises correct on the first try.
+~54% of the coding exercises correct, without needing to see errors from the test suite. Previous models only get 46-47% of the exercises correct on the first try.
 - The new model seems to perform similar
-(~66%) to the old models (63-64%) after being given a second chance to correct bugs by reviewing test suite error output.
+(~63%) to the old models (63-64%) after their second chance to correct bugs by reviewing test suite error output.
 **These are preliminary results.**
 OpenAI is enforcing very low
-rate limits on the new GPT-4 model. The limits are so low, that
-I have only been able to attempt
-113
+rate limits on the new GPT-4 model.
+The rate limiting is disrupting the normal flow of the benchmarking process,
+which needs to be restarted after pauses.
+The benchmarking tool is capable of such restarts, but
+I will trust a "clean" run much better once the rate limits are relaxed.
+The results currently reflect
+130
 out of the 133 Exercism problems.
 The problems are selected in random order, so results should be *roughly*
 indicative of the full benchmark.