diff --git a/docs/benchmarks-1106.md b/docs/benchmarks-1106.md
index 78bed5760..a907aefeb 100644
--- a/docs/benchmarks-1106.md
+++ b/docs/benchmarks-1106.md
@@ -71,3 +71,15 @@ The comments below only focus on comparing the `whole` edit format results:
 - The new `gpt-3.5-turbo-1106` model is completing the benchmark **3-4X faster** than the earlier GPT-3.5 models.
 - The success rate after the first try of 42% is comparable to the previous June (0613) model. The new November and previous June models are both worse than the original March (0301) model's 50% result on the first try.
 - The new model's 56% success rate after the second try seems comparable to the original March model, and somewhat better than the June model's 50% score.
+
+
+## Related reports
+
+This is one in a series of reports
+that use the aider benchmarking suite to assess and compare the code
+editing capabilities of OpenAI's GPT models.
+You can review the other reports
+for additional information:
+
+- [GPT code editing benchmarks](https://aider.chat/docs/benchmarks.html) evaluates the March and June versions of GPT-3.5 and GPT-4.
+- [Code editing speed benchmarks for OpenAI's "1106" models](https://aider.chat/docs/benchmarks-speed-1106.html) compares the performance of the new GPT models.