Update benchmarks.md

This commit is contained in:
paul-gauthier 2023-07-02 08:34:17 -07:00 committed by GitHub
parent 93e29eda94
commit b3cda38a1a
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23


@@ -41,7 +41,7 @@ The results were interesting:
 - The performance of the new June (`0613`) versions of GPT-3.5 appears to be a bit worse than the February (`0301`) version. This is visible if you look at the "first attempt" markers on the first three solid blue bars and also by comparing the first three solid green `diff` bars.
 - As expected, the GPT-4 models outperformed the GPT-3.5 models in code editing.
-The quantitative benchmark results align with my intuitions
+The quantitative benchmark results agree with my intuitions
 about prompting GPT for complex tasks like coding. It's beneficial to
 minimize the "cognitive overhead" of formatting the response, allowing
 GPT to concentrate on the coding task at hand.