Mirror of https://github.com/Aider-AI/aider.git (synced 2025-06-01 02:05:00 +00:00)

Commit 08048b4972 ("layout"), parent 9982a5b4e2

3 changed files with 56 additions and 57 deletions
_posts/2023-11-06-benchmarks-speed-1106.md (previous contents; replaced by a symbolic link below)

@@ -1,55 +0,0 @@
---
title: Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106
excerpt: This report provides a detailed comparison of the speed of the GPT-4 Turbo and gpt-3.5-turbo-1106 models, based on the aider benchmarking suite.
highlight_image: /assets/benchmarks-speed-1106.svg
redirect_from:
  - /docs/benchmarks-speed-1106.html
---

# Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106

<p class="post-date">{{ page.date | date: "%b %-d, %Y" }}</p>

[![benchmark results](/assets/benchmarks-speed-1106.svg)](https://aider.chat/assets/benchmarks-speed-1106.svg)

[OpenAI just released new versions of GPT-3.5 and GPT-4](https://openai.com/blog/new-models-and-developer-products-announced-at-devday),
and there's a lot of interest in their capabilities and performance.
With that in mind, I've been benchmarking the new models.

[Aider](https://github.com/paul-gauthier/aider)
is an open source command line chat tool that lets you work with GPT to edit
code in your local git repo.
Aider relies on a
[code editing benchmark](https://aider.chat/docs/benchmarks.html)
to quantitatively evaluate performance.

This is the latest in a series of reports
that use the aider benchmarking suite to assess and compare the code
editing capabilities of OpenAI's GPT models. You can review the previous
reports for more background on aider's benchmark suite:

- [GPT code editing benchmarks](https://aider.chat/docs/benchmarks.html) evaluates the March and June versions of GPT-3.5 and GPT-4.
- [Code editing skill benchmarks for OpenAI's "1106" models](https://aider.chat/docs/benchmarks-1106.html) compares the older models to the November (1106) models.

## Speed

This report compares the **speed** of the various GPT models.
Aider's benchmark measures the response time of the OpenAI chat completion
endpoint each time it asks GPT to solve a programming exercise in the benchmark
suite. These results measure only the time spent waiting for OpenAI to
respond to the prompt, so they reflect how fast each model can generate
responses that consist primarily of source code.

Some observations:

- **GPT-3.5 got 6-11x faster.** The `gpt-3.5-turbo-1106` model is 6-11x faster than the June (0613) version, which has been the default `gpt-3.5-turbo` model.
- **GPT-4 Turbo is 2-2.5x faster.** The new `gpt-4-1106-preview` model is 2-2.5x faster than the June (0613) version, which has been the default `gpt-4` model.
- The old March (0301) version of GPT-3.5 is actually faster than the June (0613) version. This was a surprising discovery.

## Updates

Last updated 11/14/23.
OpenAI has relaxed rate limits, so these results are no longer considered preliminary.
_posts/2023-11-06-benchmarks-speed-1106.md (now a symbolic link; 1 addition)

@@ -0,0 +1 @@
../docs/benchmarks-speed-1106.md
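The symlink keeps the post publishing from `_posts/` while the canonical copy now lives under `docs/`. As a rough sketch (assuming a POSIX filesystem; paths as in this commit), a link like this could be recreated from the repo root with Python's standard library:

```python
import os

# Recreate the relative symlink added above (illustrative sketch).
# The target is resolved relative to the directory containing the link,
# so "../docs/..." points from _posts/ back into docs/.
os.symlink(
    "../docs/benchmarks-speed-1106.md",            # target, relative to _posts/
    "_posts/2023-11-06-benchmarks-speed-1106.md",  # the link itself
)
```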
[stylesheet file; name not shown in this view]

@@ -17,8 +17,6 @@
     margin-bottom: 2em;
     padding: 1em;
     border-radius: 4px;
-    display: flex;
-    align-items: flex-start;
 }
 .post-date {
     color: #777;

@@ -33,6 +31,8 @@
 }

 .post-content {
+    display: flex;
+    align-items: flex-start;
     flex: 1;
 }
docs/benchmarks-speed-1106.md (new file; 53 additions)

@@ -0,0 +1,53 @@
---
title: Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106
excerpt: This report provides a detailed comparison of the speed of the GPT-4 Turbo and gpt-3.5-turbo-1106 models, based on the aider benchmarking suite.
highlight_image: /assets/benchmarks-speed-1106.svg
---

# Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106

<p class="post-date">{{ page.date | date: "%b %-d, %Y" }}</p>

[![benchmark results](/assets/benchmarks-speed-1106.svg)](https://aider.chat/assets/benchmarks-speed-1106.svg)

[OpenAI just released new versions of GPT-3.5 and GPT-4](https://openai.com/blog/new-models-and-developer-products-announced-at-devday),
and there's a lot of interest in their capabilities and performance.
With that in mind, I've been benchmarking the new models.

[Aider](https://github.com/paul-gauthier/aider)
is an open source command line chat tool that lets you work with GPT to edit
code in your local git repo.
Aider relies on a
[code editing benchmark](https://aider.chat/docs/benchmarks.html)
to quantitatively evaluate performance.

This is the latest in a series of reports
that use the aider benchmarking suite to assess and compare the code
editing capabilities of OpenAI's GPT models. You can review the previous
reports for more background on aider's benchmark suite:

- [GPT code editing benchmarks](https://aider.chat/docs/benchmarks.html) evaluates the March and June versions of GPT-3.5 and GPT-4.
- [Code editing skill benchmarks for OpenAI's "1106" models](https://aider.chat/docs/benchmarks-1106.html) compares the older models to the November (1106) models.

## Speed

This report compares the **speed** of the various GPT models.
Aider's benchmark measures the response time of the OpenAI chat completion
endpoint each time it asks GPT to solve a programming exercise in the benchmark
suite. These results measure only the time spent waiting for OpenAI to
respond to the prompt, so they reflect how fast each model can generate
responses that consist primarily of source code.
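To make the measurement concrete, here is a minimal sketch of this kind of latency test, assuming the official `openai` Python package (v1 client); the prompt and the loop below are illustrative placeholders, not aider's actual benchmark harness:

```python
import time

from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY in the environment

client = OpenAI()


def time_completion(model: str, prompt: str) -> float:
    """Return the seconds spent waiting for one chat completion response."""
    start = time.monotonic()
    client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return time.monotonic() - start


# Placeholder exercise; aider's real benchmark uses Exercism practice problems.
exercise = "Write a Python function that checks whether a string is a palindrome."
for model in ("gpt-3.5-turbo-1106", "gpt-4-1106-preview"):
    print(f"{model}: {time_completion(model, exercise):.1f}s")
```

Averaging timings like these across the full exercise suite yields the per-model speeds charted above.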
Some observations:

- **GPT-3.5 got 6-11x faster.** The `gpt-3.5-turbo-1106` model is 6-11x faster than the June (0613) version, which has been the default `gpt-3.5-turbo` model.
- **GPT-4 Turbo is 2-2.5x faster.** The new `gpt-4-1106-preview` model is 2-2.5x faster than the June (0613) version, which has been the default `gpt-4` model.
- The old March (0301) version of GPT-3.5 is actually faster than the June (0613) version. This was a surprising discovery.

## Updates

Last updated 11/14/23.
OpenAI has relaxed rate limits, so these results are no longer considered preliminary.