2025-05-29 08:44:59 +00:00 · 2024-04-09 16:57:58 -07:00 · 2024-04-09 16:57:58 -07:00 · eea35a3414
commit eea35a3414
parent 00f1cdb561
1 changed file with 6 additions and 6 deletions
--- a/_posts/2024-04-09-gpt-4-turbo.md
+++ b/_posts/2024-04-09-gpt-4-turbo.md
@@ -1,7 +1,7 @@
 ---
 title: GPT-4 Turbo with Vision is a step backwards for coding
-excerpt: OpenAI's new `gpt-4-turbo-2024-04-09` model scores worse on aider's code editing benchmarks than all the previous GPT-4 models.
+excerpt: OpenAI's new gpt-4-turbo-2024-04-09 model scores worse on aider's code editing benchmarks than all the previous GPT-4 models.
-highlight_image: /assets/2024-03-07-claude-3.svg
+highlight_image: /assets/2024-04-09-gpt-4-turbo-laziness.svg
 ---
 # GPT-4 Turbo with Vision is a step backwards for coding
@@ -26,9 +26,9 @@ For each exercise, the LLM gets two tries to solve each problem:
 1. On the first try, it gets initial stub code and the English description of the coding task. If the tests all pass, we are done.
 2. If any tests failed, aider sends the LLM the failing test output and gives it a second try to complete the task.
-GPT-4 Turbo with Vision
+**GPT-4 Turbo with Vision
 scores only 62% on this benchmark,
-the lowest score of any of the existing GPT-4 models.
+the lowest score of any of the existing GPT-4 models.**
 The other models scored 63-66%, so this represents only a small
 regression, and is likely statistically insignificant when compared
 against `gpt-4-0613`.
@@ -53,9 +53,9 @@ It consists of
 89 python refactoring tasks
 which tend to make GPT-4 Turbo code in that lazy manner.
-The new GPT-4 Turbo with Vision model scores only 33% on aider's
+**The new GPT-4 Turbo with Vision model scores only 33% on aider's
 refactoring benchmark, making it the laziest coder of all the GPT-4 Turbo models
-by a significant margin.
+by a significant margin.**
 # Conclusions