make all highlight_images jpg
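
This commit repoints every post's `highlight_image` front-matter entry from an SVG to a JPG and checks in the rendered JPG assets. A bulk front-matter edit like this can be scripted; below is a minimal sketch, assuming the standard Jekyll layout with posts under `_posts/` (the actual edit may equally have been made by hand):

```python
# Minimal sketch: swap the .svg extension for .jpg on each post's
# highlight_image front-matter line. Assumes posts live under _posts/,
# as in a standard Jekyll layout; the actual edit may have been manual.
import pathlib
import re

for post in pathlib.Path("_posts").glob("*.md"):
    text = post.read_text()
    # Only touch the front-matter line, e.g.
    #   highlight_image: /assets/benchmarks.svg -> /assets/benchmarks.jpg
    updated = re.sub(
        r"^(highlight_image:\s*\S+)\.svg\s*$",
        r"\1.jpg",
        text,
        flags=re.MULTILINE,
    )
    if updated != text:
        post.write_text(updated)
        print(f"updated {post}")
```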
@@ -1,7 +1,7 @@
 ---
 title: Claude 3 beats GPT-4 on Aider's code editing benchmark
 excerpt: Claude 3 Opus outperforms all of OpenAI's models on Aider's code editing benchmark, making it the best available model for pair programming with AI.
-highlight_image: /assets/2024-03-07-claude-3.svg
+highlight_image: /assets/2024-03-07-claude-3.jpg
 ---
 
 # Claude 3 beats GPT-4 on Aider's code editing benchmark
@@ -1,7 +1,7 @@
 ---
 title: GPT-4 Turbo with Vision is a step backwards for coding
 excerpt: OpenAI's GPT-4 Turbo with Vision model scores worse on aider's code editing benchmarks than all the previous GPT-4 models. In particular, it seems much more prone to "lazy coding" than the existing GPT-4 Turbo "preview" models.
-highlight_image: /assets/2024-04-09-gpt-4-turbo-laziness.svg
+highlight_image: /assets/2024-04-09-gpt-4-turbo-laziness.jpg
 ---
 
 # GPT-4 Turbo with Vision is a step backwards for coding
BIN  assets/2024-03-07-claude-3.jpg              (new file, 34 KiB)
BIN  assets/2024-04-09-gpt-4-turbo-laziness.jpg  (new file, 19 KiB)
BIN  assets/2024-04-09-gpt-4-turbo.jpg           (new file, 25 KiB)
BIN  assets/benchmarks-0125.jpg                  (new file, 22 KiB)
BIN  assets/benchmarks-1106.jpg                  (new file, 31 KiB)
BIN  assets/benchmarks-speed-1106.jpg            (new file, 25 KiB)
BIN  assets/benchmarks.jpg                       (new file, 33 KiB)
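
The new binary files above are JPG renders of the SVG charts the posts previously referenced. The commit does not record which tool produced them; one plausible sketch of the conversion, assuming `cairosvg` and Pillow are installed, is:

```python
# Hypothetical conversion step: rasterize each SVG chart in assets/ to a
# JPG alongside it. Assumes cairosvg and Pillow (pip install cairosvg pillow);
# the commit does not record which tool actually produced the JPGs.
import io
import pathlib

import cairosvg
from PIL import Image

for svg in pathlib.Path("assets").glob("*.svg"):
    # Rasterize to PNG bytes first, then flatten to RGB,
    # since JPEG has no alpha channel.
    png_bytes = cairosvg.svg2png(url=str(svg))
    Image.open(io.BytesIO(png_bytes)).convert("RGB").save(
        svg.with_suffix(".jpg"), quality=90
    )
```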
@@ -1,7 +1,7 @@
 ---
 title: The January GPT-4 Turbo is lazier than the last version
 excerpt: The new `gpt-4-0125-preview` model is quantitatively lazier at coding than previous GPT-4 versions, according to a new "laziness" benchmark.
-highlight_image: /assets/benchmarks-0125.svg
+highlight_image: /assets/benchmarks-0125.jpg
 ---
 
 # The January GPT-4 Turbo is lazier than the last version
@@ -1,7 +1,7 @@
 ---
 title: Code editing benchmarks for OpenAI's "1106" models
 excerpt: A quantitative comparison of the code editing capabilities of the new GPT-3.5 and GPT-4 versions that were released in Nov 2023.
-highlight_image: /assets/benchmarks-1106.svg
+highlight_image: /assets/benchmarks-1106.jpg
 ---
 
 # Code editing benchmarks for OpenAI's "1106" models
@@ -2,7 +2,7 @@
 title: Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106
 excerpt: This report provides a detailed comparison of the speed of GPT-4 Turbo and gpt-3.5-turbo-1106 models based on the aider benchmarking suite.
 canonical_url: https://aider.chat/2023/11/06/benchmarks-speed-1106.html
-highlight_image: /assets/benchmarks-speed-1106.svg
+highlight_image: /assets/benchmarks-speed-1106.jpg
 ---
 
 # Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106
@@ -1,7 +1,7 @@
 ---
 title: GPT code editing benchmarks
 excerpt: Benchmarking GPT-3.5 and GPT-4 code editing skill using a new code editing benchmark suite based on the Exercism python exercises.
-highlight_image: /assets/benchmarks.svg
+highlight_image: /assets/benchmarks.jpg
 ---
 
 # GPT code editing benchmarks