Mirror of https://github.com/Aider-AI/aider.git (synced 2025-05-31 17:55:01 +00:00)
Commit 6acc3689e5 (parent cb63b61411)
2 changed files with 6 additions and 15 deletions
@@ -71,9 +71,3 @@ The comments below only focus on comparing the `whole` edit format results:
 - The new `gpt-3.5-turbo-1106` model is completing the benchmark **3-4X faster** than the earlier GPT-3.5 models.
 - The success rate after the first try of 42% is comparable to the previous June (0613) model. The new November and previous June models are both worse than the original March (0301) model's 50% result on the first try.
 - The new model's 56% success rate after the second try seems comparable to the original March model, and somewhat better than the June model's 50% score.
-
-
-
-### Updates
-
-I will update the results on this page as quickly as my rate limit allows.
@@ -15,8 +15,8 @@ Aider relies on a
 to quantitatively evaluate
 performance.
 
-This is the latest in a series of benchmarking reports
-about the code
+This is the latest in a series of reports
+that use the aider benchmarking suite to assess and compare the code
 editing capabilities of OpenAI's GPT models. You can review previous
 reports to get more background on aider's benchmark suite:
 
@@ -44,13 +44,10 @@ Some observations:
 OpenAI is enforcing very low
 rate limits on the new GPT-4 model.
 The rate limiting disrupts the the benchmarking process,
-requiring it to be run single threaded, paused and restarted frequently.
+requiring it to run single threaded, pause and restart frequently.
 These anomolous conditions make it slow to
-benchmark the new model, and make comparisons against
-the older versions less reliable.
+benchmark the new model, and make
+it less reliable to compare the results with
+benchmark runs against the older model versions.
 Once the rate limits are relaxed I will do a clean
 run of the entire benchmark suite.
-
-### Updates
-
-I will update the results on this page as quickly as my rate limit allows.