copy
parent cb63b61411
commit 6acc3689e5

2 changed files with 6 additions and 15 deletions
@@ -71,9 +71,3 @@ The comments below only focus on comparing the `whole` edit format results:
 - The new `gpt-3.5-turbo-1106` model is completing the benchmark **3-4X faster** than the earlier GPT-3.5 models.
 - The success rate after the first try of 42% is comparable to the previous June (0613) model. The new November and previous June models are both worse than the original March (0301) model's 50% result on the first try.
 - The new model's 56% success rate after the second try seems comparable to the original March model, and somewhat better than the June model's 50% score.
-
-
-### Updates
-
-I will update the results on this page as quickly as my rate limit allows.
-
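The bullets above report success rates after a first try and after a second try at each exercise. As a minimal sketch of how such a tally might work, assuming a hypothetical per-exercise record of which attempt passed (the record layout, the helper, and the example counts are illustrative, not aider's actual benchmark code):

```python
# Minimal sketch: tally "passed on first try" vs "passed within two tries".
# results[i] holds the 1-based attempt that passed, or None if neither did.
from typing import Optional

def pass_rates(results: list[Optional[int]]) -> tuple[float, float]:
    total = len(results)
    first_try = sum(1 for r in results if r == 1)
    within_two = sum(1 for r in results if r is not None and r <= 2)
    return first_try / total, within_two / total

# Hypothetical outcomes consistent with the 42% / 56% figures above,
# assuming a 133-exercise suite.
outcomes = [1] * 56 + [2] * 19 + [None] * 58
p1, p2 = pass_rates(outcomes)
print(f"first try: {p1:.0%}, within two tries: {p2:.0%}")
```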
@@ -15,8 +15,8 @@ Aider relies on a
 to quantitatively evaluate
 performance.
 
-This is the latest in a series of benchmarking reports
-about the code
+This is the latest in a series of reports
+that use the aider benchmarking suite to assess and compare the code
 editing capabilities of OpenAI's GPT models. You can review previous
 reports to get more background on aider's benchmark suite:
 
@@ -44,13 +44,10 @@ Some observations:
 OpenAI is enforcing very low
 rate limits on the new GPT-4 model.
 The rate limiting disrupts the the benchmarking process,
-requiring it to be run single threaded, paused and restarted frequently.
+requiring it to run single threaded, pause and restart frequently.
 These anomolous conditions make it slow to
-benchmark the new model, and make comparisons against
-the older versions less reliable.
+benchmark the new model, and make
+it less reliable to compare the results with
+benchmark runs against the older model versions.
 Once the rate limits are relaxed I will do a clean
 run of the entire benchmark suite.
-
-### Updates
-
-I will update the results on this page as quickly as my rate limit allows.
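The pause-and-restart workflow described in the hunk above can be pictured as a single-threaded loop that backs off whenever a request is rejected for rate-limit reasons. The sketch below is a rough illustration under assumed names: RateLimitError, run_one_exercise, and the backoff constants are placeholders, not aider's actual benchmark harness.

```python
# Rough sketch of a single threaded benchmark run that pauses and retries
# when rate limited. All names below are placeholders, not aider's code.
import time

class RateLimitError(Exception):
    """Stand-in for the provider's rate-limit error."""

def run_one_exercise(name: str) -> bool:
    # Placeholder: would send the exercise to the model and run its tests.
    return False

def run_benchmark(exercises: list[str]) -> dict[str, bool]:
    results: dict[str, bool] = {}
    for name in exercises:              # single threaded: one exercise at a time
        delay = 30.0
        while True:
            try:
                results[name] = run_one_exercise(name)
                break
            except RateLimitError:
                time.sleep(delay)       # pause, then restart this exercise
                delay = min(delay * 2, 600)
    return results
```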