Paul Gauthier 2023-11-08 11:34:34 -08:00
parent 200414cee0
commit 7c92370a3b


@@ -71,3 +71,15 @@ The comments below only focus on comparing the `whole` edit format results:
- The new `gpt-3.5-turbo-1106` model is completing the benchmark **3-4X faster** than the earlier GPT-3.5 models.
- The new model's 42% success rate on the first try is comparable to the previous June (0613) model. Both the new November and the previous June models are worse than the original March (0301) model's 50% first-try result.
- The new model's 56% success rate after the second try seems comparable to the original March model's result, and somewhat better than the June model's 50% score. The sketch below shows how these first- and second-try rates are tallied.
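
To make the first-try and second-try numbers concrete, here is a minimal sketch of how such rates can be tallied from per-exercise outcomes. The `results` structure, its field names, and the sample values are assumptions made purely for illustration; this is not the actual aider benchmark harness.

```python
# Hypothetical illustration: tallying first- and second-try pass rates
# from per-exercise benchmark outcomes. The data below is invented for
# demonstration and is not aider's real benchmark data.

# Each entry records whether an exercise's tests passed on the model's
# first attempt and whether they passed after its second attempt.
results = [
    {"passed_first_try": True,  "passed_second_try": True},
    {"passed_first_try": False, "passed_second_try": True},
    {"passed_first_try": False, "passed_second_try": False},
    # ... one entry per exercise in the benchmark suite
]

total = len(results)
first_try = sum(r["passed_first_try"] for r in results)
# An exercise counts toward the second-try rate if it passed on
# either the first or the second attempt.
second_try = sum(
    r["passed_first_try"] or r["passed_second_try"] for r in results
)

print(f"first try:  {first_try / total:.0%}")
print(f"second try: {second_try / total:.0%}")
```

The second-try rate can only match or exceed the first-try rate, since every exercise that passes on the first attempt also counts as passing after the second.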

## Related reports
This is one in a series of reports
that use the aider benchmarking suite to assess and compare the code
editing capabilities of OpenAI's GPT models.
You can review the other reports
for additional information:
- [GPT code editing benchmarks](https://aider.chat/docs/benchmarks.html) evaluates the March and June versions of GPT-3.5 and GPT-4.
- [Code editing speed benchmarks for OpenAI's "1106" models](https://aider.chat/docs/benchmarks-speed-1106.html) compares the performance of the new GPT models.