mirror of https://github.com/Aider-AI/aider.git
synced 2025-05-30 17:24:59 +00:00

commit 08048b4972 (parent 9982a5b4e2)

layout

3 changed files with 56 additions and 57 deletions
_posts/2023-11-06-benchmarks-speed-1106.md  (old contents, 55 deletions; the regular file is replaced by a symbolic link below)

@@ -1,55 +0,0 @@
---
title: Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106
excerpt: This report provides a detailed comparison of the speed of GPT-4 Turbo and gpt-3.5-turbo-1106 models based on the aider benchmarking suite.
highlight_image: /assets/benchmarks-speed-1106.svg
redirect_from:
  - /docs/benchmarks-speed-1106.html
---
(The remaining deleted lines are identical to the body of docs/benchmarks-speed-1106.md, which is added in full below.)
_posts/2023-11-06-benchmarks-speed-1106.md  (symbolic link, 1 addition)

@@ -0,0 +1 @@
+../docs/benchmarks-speed-1106.md
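The post itself now lives under docs/, with the Jekyll _posts entry reduced to a pointer. As a rough sketch of the same move (an assumption about the workflow, not code from this commit), using Python's pathlib from the repo root:

```python
# Hypothetical sketch of this commit's file move: relocate the post
# into docs/ and leave a relative symlink behind in _posts/.
# Paths are taken from the diff above; run from the repository root.
from pathlib import Path

post = Path("_posts/2023-11-06-benchmarks-speed-1106.md")
target = Path("docs/benchmarks-speed-1106.md")

target.write_bytes(post.read_bytes())                 # copy the content to its new home
post.unlink()                                         # remove the original file
post.symlink_to("../docs/benchmarks-speed-1106.md")   # relative link, as in the diff
```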
@@ -17,8 +17,6 @@
   margin-bottom: 2em;
   padding: 1em;
   border-radius: 4px;
-  display: flex;
-  align-items: flex-start;
 }
 .post-date {
   color: #777;

@@ -33,6 +31,8 @@
 }

 .post-content {
+  display: flex;
+  align-items: flex-start;
   flex: 1;
 }
docs/benchmarks-speed-1106.md  (new file, 53 additions)
@@ -0,0 +1,53 @@
---
title: Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106
excerpt: This report provides a detailed comparison of the speed of GPT-4 Turbo and gpt-3.5-turbo-1106 models based on the aider benchmarking suite.
highlight_image: /assets/benchmarks-speed-1106.svg
---
# Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106

<p class="post-date">{{ page.date | date: "%b %-d, %Y" }}</p>

[![benchmark results](/assets/benchmarks-speed-1106.svg)](https://aider.chat/assets/benchmarks-speed-1106.svg)

[OpenAI just released new versions of GPT-3.5 and GPT-4](https://openai.com/blog/new-models-and-developer-products-announced-at-devday),
and there's a lot of interest in their capabilities and performance.
With that in mind, I've been benchmarking the new models.

[Aider](https://github.com/paul-gauthier/aider)
is an open source command line chat tool that lets you work with GPT to edit
code in your local git repo.
Aider relies on a
[code editing benchmark](https://aider.chat/docs/benchmarks.html)
to quantitatively evaluate performance.

This is the latest in a series of reports
that use the aider benchmarking suite to assess and compare the code
editing capabilities of OpenAI's GPT models. You can review previous
reports to get more background on aider's benchmark suite:

- [GPT code editing benchmarks](https://aider.chat/docs/benchmarks.html) evaluates the March and June versions of GPT-3.5 and GPT-4.
- [Code editing skill benchmarks for OpenAI's "1106" models](https://aider.chat/docs/benchmarks-1106.html) compares the older models to the November (1106) models.

## Speed

This report compares the **speed** of the various GPT models.
Aider's benchmark measures the response time of the OpenAI chat completion
endpoint each time it asks GPT to solve a programming exercise in the benchmark
suite. These results measure only the time spent waiting for OpenAI to
respond to the prompt, so they reflect how fast these models can
generate responses that consist primarily of source code.
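Concretely, the measurement amounts to timing each chat completion call. Here is a minimal sketch of that idea, assuming the `openai` Python package (an illustration only, not aider's actual benchmark harness):

```python
# Minimal sketch of timing one chat completion request.
# Assumes the openai Python package and OPENAI_API_KEY in the
# environment; this illustrates the idea, it is not aider's code.
import time

from openai import OpenAI

client = OpenAI()

start = time.monotonic()
response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    messages=[{"role": "user", "content": "Write a function that reverses a string."}],
)
elapsed = time.monotonic() - start

print(f"waited {elapsed:.2f}s for {response.usage.completion_tokens} completion tokens")
```

Aggregating timings like this across every exercise in the suite gives the per-model speed distributions charted above.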
Some observations:

- **GPT-3.5 got 6-11x faster.** The `gpt-3.5-turbo-1106` model is 6-11x faster than the June (0613) version, which has been the default `gpt-3.5-turbo` model.
- **GPT-4 Turbo is 2-2.5x faster.** The new `gpt-4-1106-preview` model is 2-2.5x faster than the June (0613) version, which has been the default `gpt-4` model.
- The old March (0301) version of GPT-3.5 is actually faster than the June (0613) version. This was a surprising discovery.
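The "Nx faster" figures above are simply ratios of measured response times. As a toy illustration, with made-up numbers rather than the benchmark's actual timing data:

```python
# Toy speedup calculation with hypothetical timings, in seconds.
# The real 6-11x figures come from the aider benchmark runs charted above.
from statistics import mean

gpt35_0613 = [24.0, 30.5, 28.2]   # made-up gpt-3.5-turbo-0613 response times
gpt35_1106 = [3.1, 2.8, 3.4]      # made-up gpt-3.5-turbo-1106 response times

speedup = mean(gpt35_0613) / mean(gpt35_1106)
print(f"gpt-3.5-turbo-1106 is {speedup:.1f}x faster")  # ~8.9x on these numbers
```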
## Updates

Last updated 11/14/23.
OpenAI has relaxed rate limits, so these results are no longer considered preliminary.