mirror of https://github.com/Aider-AI/aider.git
synced 2025-05-30 17:24:59 +00:00

commit 08048b4972 (parent 9982a5b4e2)

layout

3 changed files with 56 additions and 57 deletions
_posts/2023-11-06-benchmarks-speed-1106.md  (old contents, 55 deletions; the regular file is replaced by a symbolic link below)

@@ -1,55 +0,0 @@
---
title: Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106
excerpt: This report provides a detailed comparison of the speed of GPT-4 Turbo and gpt-3.5-turbo-1106 models based on the aider benchmarking suite.
highlight_image: /assets/benchmarks-speed-1106.svg
redirect_from:
  - /docs/benchmarks-speed-1106.html
---
(The remaining deleted lines are identical to the body of docs/benchmarks-speed-1106.md, which is added in full below.)
_posts/2023-11-06-benchmarks-speed-1106.md  (symbolic link, 1 addition)

@@ -0,0 +1 @@
+../docs/benchmarks-speed-1106.md
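The post itself now lives under docs/, with the Jekyll _posts entry reduced to a pointer. As a rough sketch of the same move (an assumption about the workflow, not code from this commit), using Python's pathlib from the repo root:

```python
# Hypothetical sketch of this commit's file move: relocate the post
# into docs/ and leave a relative symlink behind in _posts/.
# Paths are taken from the diff above; run from the repository root.
from pathlib import Path

post = Path("_posts/2023-11-06-benchmarks-speed-1106.md")
target = Path("docs/benchmarks-speed-1106.md")

target.write_bytes(post.read_bytes())                 # copy the content to its new home
post.unlink()                                         # remove the original file
post.symlink_to("../docs/benchmarks-speed-1106.md")   # relative link, as in the diff
```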
@@ -17,8 +17,6 @@
   margin-bottom: 2em;
   padding: 1em;
   border-radius: 4px;
-  display: flex;
-  align-items: flex-start;
 }
 .post-date {
   color: #777;

@@ -33,6 +31,8 @@
 }

 .post-content {
+  display: flex;
+  align-items: flex-start;
   flex: 1;
 }
docs/benchmarks-speed-1106.md  (new file, 53 additions)
@@ -0,0 +1,53 @@
---
title: Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106
excerpt: This report provides a detailed comparison of the speed of GPT-4 Turbo and gpt-3.5-turbo-1106 models based on the aider benchmarking suite.
highlight_image: /assets/benchmarks-speed-1106.svg
---
# Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106

<p class="post-date">{{ page.date | date: "%b %-d, %Y" }}</p>

[![benchmark results](/assets/benchmarks-speed-1106.svg)](https://aider.chat/assets/benchmarks-speed-1106.svg)

[OpenAI just released new versions of GPT-3.5 and GPT-4](https://openai.com/blog/new-models-and-developer-products-announced-at-devday),
and there's a lot of interest in their capabilities and performance.
With that in mind, I've been benchmarking the new models.

[Aider](https://github.com/paul-gauthier/aider)
is an open source command line chat tool that lets you work with GPT to edit
code in your local git repo.
Aider relies on a
[code editing benchmark](https://aider.chat/docs/benchmarks.html)
to quantitatively evaluate performance.

This is the latest in a series of reports
that use the aider benchmarking suite to assess and compare the code
editing capabilities of OpenAI's GPT models. You can review previous
reports to get more background on aider's benchmark suite:

- [GPT code editing benchmarks](https://aider.chat/docs/benchmarks.html) evaluates the March and June versions of GPT-3.5 and GPT-4.
- [Code editing skill benchmarks for OpenAI's "1106" models](https://aider.chat/docs/benchmarks-1106.html) compares the older models to the November (1106) models.

## Speed

This report compares the **speed** of the various GPT models.
Aider's benchmark measures the response time of the OpenAI chat completion
endpoint each time it asks GPT to solve a programming exercise in the benchmark
suite. These results measure only the time spent waiting for OpenAI to
respond to the prompt, so they reflect how fast these models can
generate responses that consist primarily of source code.
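Concretely, the measurement amounts to timing each chat completion call. Here is a minimal sketch of that idea, assuming the `openai` Python package (an illustration only, not aider's actual benchmark harness):

```python
# Minimal sketch of timing one chat completion request.
# Assumes the openai Python package and OPENAI_API_KEY in the
# environment; this illustrates the idea, it is not aider's code.
import time

from openai import OpenAI

client = OpenAI()

start = time.monotonic()
response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    messages=[{"role": "user", "content": "Write a function that reverses a string."}],
)
elapsed = time.monotonic() - start

print(f"waited {elapsed:.2f}s for {response.usage.completion_tokens} completion tokens")
```

Aggregating timings like this across every exercise in the suite gives the per-model speed distributions charted above.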
Some observations:

- **GPT-3.5 got 6-11x faster.** The `gpt-3.5-turbo-1106` model is 6-11x faster than the June (0613) version, which has been the default `gpt-3.5-turbo` model.
- **GPT-4 Turbo is 2-2.5x faster.** The new `gpt-4-1106-preview` model is 2-2.5x faster than the June (0613) version, which has been the default `gpt-4` model.
- The old March (0301) version of GPT-3.5 is actually faster than the June (0613) version. This was a surprising discovery.
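The "Nx faster" figures above are simply ratios of measured response times. As a toy illustration, with made-up numbers rather than the benchmark's actual timing data:

```python
# Toy speedup calculation with hypothetical timings, in seconds.
# The real 6-11x figures come from the aider benchmark runs charted above.
from statistics import mean

gpt35_0613 = [24.0, 30.5, 28.2]   # made-up gpt-3.5-turbo-0613 response times
gpt35_1106 = [3.1, 2.8, 3.4]      # made-up gpt-3.5-turbo-1106 response times

speedup = mean(gpt35_0613) / mean(gpt35_1106)
print(f"gpt-3.5-turbo-1106 is {speedup:.1f}x faster")  # ~8.9x on these numbers
```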
## Updates

Last updated 11/14/23.
OpenAI has relaxed rate limits, so these results are no longer considered preliminary.