From eea35a3414e68d9d54b443b6141facd13c9d314b Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Tue, 9 Apr 2024 16:57:58 -0700
Subject: [PATCH] copy

---
 _posts/2024-04-09-gpt-4-turbo.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/_posts/2024-04-09-gpt-4-turbo.md b/_posts/2024-04-09-gpt-4-turbo.md
index 64c3ad8b8..08951672f 100644
--- a/_posts/2024-04-09-gpt-4-turbo.md
+++ b/_posts/2024-04-09-gpt-4-turbo.md
@@ -1,7 +1,7 @@
 ---
 title: GPT-4 Turbo with Vision is a step backwards for coding
-excerpt: OpenAI's new `gpt-4-turbo-2024-04-09` model scores worse on aider's code editing benchmarks than all the previous GPT-4 models.
-highlight_image: /assets/2024-03-07-claude-3.svg
+excerpt: OpenAI's new gpt-4-turbo-2024-04-09 model scores worse on aider's code editing benchmarks than all the previous GPT-4 models.
+highlight_image: /assets/2024-04-09-gpt-4-turbo-laziness.svg
 ---
 # GPT-4 Turbo with Vision is a step backwards for coding
@@ -26,9 +26,9 @@ For each exercise, the LLM gets two tries to solve each problem:
 1. On the first try, it gets initial stub code and the English description of the coding task. If the tests all pass, we are done.
 2. If any tests failed, aider sends the LLM the failing test output and gives it a second try to complete the task.
-GPT-4 Turbo with Vision
+**GPT-4 Turbo with Vision
 scores only 62% on this benchmark,
-the lowest score of any of the existing GPT-4 models.
+the lowest score of any of the existing GPT-4 models.**
 The other models scored 63-66%, so this represents only a small regression, and is likely statistically insignificant when compared against `gpt-4-0613`.
@@ -53,9 +53,9 @@ It consists of 89 python refactoring tasks which tend to make GPT-4 Turbo code in that lazy manner.
-The new GPT-4 Turbo with Vision model scores only 33% on aider's
+**The new GPT-4 Turbo with Vision model scores only 33% on aider's
 refactoring benchmark, making it the laziest coder of all the GPT-4 Turbo models
-by a significant margin.
+by a significant margin.**
 # Conclusions