mirror of https://github.com/Aider-AI/aider.git, synced 2025-05-28 08:14:59 +00:00
parent 00f1cdb561
commit eea35a3414
1 changed files with 6 additions and 6 deletions
@@ -1,7 +1,7 @@
 ---
 title: GPT-4 Turbo with Vision is a step backwards for coding
-excerpt: OpenAI's new `gpt-4-turbo-2024-04-09` model scores worse on aider's code editing benchmarks than all the previous GPT-4 models.
-highlight_image: /assets/2024-03-07-claude-3.svg
+excerpt: OpenAI's new gpt-4-turbo-2024-04-09 model scores worse on aider's code editing benchmarks than all the previous GPT-4 models.
+highlight_image: /assets/2024-04-09-gpt-4-turbo-laziness.svg
 ---
 # GPT-4 Turbo with Vision is a step backwards for coding
 
@@ -26,9 +26,9 @@ For each exercise, the LLM gets two tries to solve each problem:
 1. On the first try, it gets initial stub code and the English description of the coding task. If the tests all pass, we are done.
 2. If any tests failed, aider sends the LLM the failing test output and gives it a second try to complete the task.
 
-GPT-4 Turbo with Vision
+**GPT-4 Turbo with Vision
 scores only 62% on this benchmark,
-the lowest score of any of the existing GPT-4 models.
+the lowest score of any of the existing GPT-4 models.**
 The other models scored 63-66%, so this represents only a small
 regression, and is likely statistically insignificant when compared
 against `gpt-4-0613`.
@@ -53,9 +53,9 @@ It consists of
 89 python refactoring tasks
 which tend to make GPT-4 Turbo code in that lazy manner.
 
-The new GPT-4 Turbo with Vision model scores only 33% on aider's
+**The new GPT-4 Turbo with Vision model scores only 33% on aider's
 refactoring benchmark, making it the laziest coder of all the GPT-4 Turbo models
-by a significant margin.
+by a significant margin.**
 
 # Conclusions
 