This commit is contained in:
Paul Gauthier 2023-11-08 11:11:47 -08:00
parent cb63b61411
commit 6acc3689e5
2 changed files with 6 additions and 15 deletions


@@ -71,9 +71,3 @@ The comments below only focus on comparing the `whole` edit format results:
- The new `gpt-3.5-turbo-1106` model is completing the benchmark **3-4X faster** than the earlier GPT-3.5 models.
- The 42% success rate on the first try is comparable to the previous June (0613) model. The new November and previous June models are both worse than the original March (0301) model's 50% result on the first try.
- The new model's 56% success rate after the second try seems comparable to the original March model, and somewhat better than the June model's 50% score.
### Updates
I will update the results on this page as quickly as my rate limit allows.


@@ -15,8 +15,8 @@ Aider relies on a
to quantitatively evaluate
performance.
This is the latest in a series of benchmarking reports
about the code
This is the latest in a series of reports
that use the aider benchmarking suite to assess and compare the code
editing capabilities of OpenAI's GPT models. You can review previous
reports to get more background on aider's benchmark suite:
@@ -44,13 +44,10 @@ Some observations:
OpenAI is enforcing very low
rate limits on the new GPT-4 model.
The rate limiting disrupts the benchmarking process,
requiring it to be run single threaded, paused and restarted frequently.
requiring it to run single threaded, pause and restart frequently.
These anomalous conditions make it slow to
benchmark the new model, and make comparisons against
the older versions less reliable.
benchmark the new model, and make
it less reliable to compare the results with
benchmark runs against the older model versions.
Once the rate limits are relaxed I will do a clean
run of the entire benchmark suite.
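The pause-and-restart loop described above can be sketched as a retry-with-backoff wrapper around each API call. This is a minimal illustration, not aider's actual benchmark code; `call_with_backoff` and the use of `RuntimeError` as a stand-in for a rate-limit error are assumptions for the example:

```python
import random
import time

def call_with_backoff(request_fn, max_retries=6, base_delay=1.0):
    """Retry request_fn with exponential backoff plus jitter,
    as a single-threaded benchmark run might do when it hits
    strict API rate limits."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RuntimeError:  # stand-in for a rate-limit error
            # Wait 2^attempt * base_delay seconds, plus random jitter,
            # before retrying the request.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    raise RuntimeError("rate limit: retries exhausted")
```

Running the suite single threaded with a wrapper like this trades throughput for reliability, which is why a clean re-run is planned once the limits are relaxed.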
### Updates
I will update the results on this page as quickly as my rate limit allows.