mirror of
https://github.com/Aider-AI/aider.git
synced 2025-05-29 08:44:59 +00:00
parent 00f1cdb561
commit eea35a3414
1 changed file with 6 additions and 6 deletions
@@ -1,7 +1,7 @@
 ---
 title: GPT-4 Turbo with Vision is a step backwards for coding
-excerpt: OpenAI's new `gpt-4-turbo-2024-04-09` model scores worse on aider's code editing benchmarks than all the previous GPT-4 models.
+excerpt: OpenAI's new gpt-4-turbo-2024-04-09 model scores worse on aider's code editing benchmarks than all the previous GPT-4 models.
-highlight_image: /assets/2024-03-07-claude-3.svg
+highlight_image: /assets/2024-04-09-gpt-4-turbo-laziness.svg
 ---
 # GPT-4 Turbo with Vision is a step backwards for coding
 
@@ -26,9 +26,9 @@ For each exercise, the LLM gets two tries to solve each problem:
 1. On the first try, it gets initial stub code and the English description of the coding task. If the tests all pass, we are done.
 2. If any tests failed, aider sends the LLM the failing test output and gives it a second try to complete the task.
 
-GPT-4 Turbo with Vision
+**GPT-4 Turbo with Vision
 scores only 62% on this benchmark,
-the lowest score of any of the existing GPT-4 models.
+the lowest score of any of the existing GPT-4 models.**
 The other models scored 63-66%, so this represents only a small
 regression, and is likely statistically insignificant when compared
 against `gpt-4-0613`.
@@ -53,9 +53,9 @@ It consists of
 89 python refactoring tasks
 which tend to make GPT-4 Turbo code in that lazy manner.
 
-The new GPT-4 Turbo with Vision model scores only 33% on aider's
+**The new GPT-4 Turbo with Vision model scores only 33% on aider's
 refactoring benchmark, making it the laziest coder of all the GPT-4 Turbo models
-by a significant margin.
+by a significant margin.**
 
 # Conclusions
 