Merge branch 'main' into read-write-changes

Paul Gauthier 2024-05-09 13:43:53 -07:00
commit d8e1628c18
16 changed files with 9 additions and 8 deletions

@@ -6,10 +6,11 @@
 {% seo %}
 {% if page.highlight_image %}
 <meta property="og:image" content="{{ site.url }}{{ page.highlight_image }}">
+<meta property="twitter:image" content="{{ site.url }}{{ page.highlight_image }}">
 {% else %}
 <meta property="og:image" content="{{ site.url }}/assets/aider.jpg">
+<meta property="twitter:image" content="{{ site.url }}/assets/aider-square.jpg">
 {% endif %}
-<meta property="twitter:image" content="{{ site.url }}/assets/aider-square.jpg">
 <link rel="preconnect" href="https://fonts.gstatic.com">
 <link rel="preload" href="https://fonts.googleapis.com/css?family=Open+Sans:400,700&display=swap" as="style" type="text/css" crossorigin>
 <meta name="viewport" content="width=device-width, initial-scale=1">

@@ -1,7 +1,7 @@
 ---
 title: Claude 3 beats GPT-4 on Aider's code editing benchmark
 excerpt: Claude 3 Opus outperforms all of OpenAI's models on Aider's code editing benchmark, making it the best available model for pair programming with AI.
-highlight_image: /assets/2024-03-07-claude-3.svg
+highlight_image: /assets/2024-03-07-claude-3.jpg
 ---
 # Claude 3 beats GPT-4 on Aider's code editing benchmark

@@ -1,7 +1,7 @@
 ---
 title: GPT-4 Turbo with Vision is a step backwards for coding
 excerpt: OpenAI's GPT-4 Turbo with Vision model scores worse on aider's code editing benchmarks than all the previous GPT-4 models. In particular, it seems much more prone to "lazy coding" than the existing GPT-4 Turbo "preview" models.
-highlight_image: /assets/2024-04-09-gpt-4-turbo-laziness.svg
+highlight_image: /assets/2024-04-09-gpt-4-turbo-laziness.jpg
 ---
 # GPT-4 Turbo with Vision is a step backwards for coding

Binary file not shown. Size: 34 KiB

Binary file not shown. Size: 19 KiB

Binary file not shown. Size: 25 KiB

BIN
assets/benchmarks-0125.jpg Normal file

Binary file not shown. Size: 22 KiB

BIN
assets/benchmarks-1106.jpg Normal file

Binary file not shown. Size: 31 KiB

Binary file not shown. Size: 25 KiB

BIN
assets/benchmarks-udiff.jpg Normal file

Binary file not shown. Size: 144 KiB

BIN
assets/benchmarks.jpg Normal file

Binary file not shown. Size: 33 KiB

@@ -1,7 +1,7 @@
 ---
 title: The January GPT-4 Turbo is lazier than the last version
 excerpt: The new `gpt-4-0125-preview` model is quantiatively lazier at coding than previous GPT-4 versions, according to a new "laziness" benchmark.
-highlight_image: /assets/benchmarks-0125.svg
+highlight_image: /assets/benchmarks-0125.jpg
 ---
 # The January GPT-4 Turbo is lazier than the last version

@@ -1,7 +1,7 @@
 ---
 title: Code editing benchmarks for OpenAI's "1106" models
 excerpt: A quantitative comparison of the code editing capabilities of the new GPT-3.5 and GPT-4 versions that were released in Nov 2023.
-highlight_image: /assets/benchmarks-1106.svg
+highlight_image: /assets/benchmarks-1106.jpg
 ---
 # Code editing benchmarks for OpenAI's "1106" models

@@ -2,7 +2,7 @@
 title: Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106
 excerpt: This report provides a detailed comparison of the speed of GPT-4 Turbo and gpt-3.5-turbo-1106 models based on the aider benchmarking suite.
 canonical_url: https://aider.chat/2023/11/06/benchmarks-speed-1106.html
-highlight_image: /assets/benchmarks-speed-1106.svg
+highlight_image: /assets/benchmarks-speed-1106.jpg
 ---
 # Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106

@@ -1,7 +1,7 @@
 ---
 title: GPT code editing benchmarks
 excerpt: Benchmarking GPT-3.5 and GPT-4 code editing skill using a new code editing benchmark suite based on the Exercism python exercises.
-highlight_image: /assets/benchmarks.svg
+highlight_image: /assets/benchmarks.jpg
 ---
 # GPT code editing benchmarks

@@ -1,7 +1,7 @@
 ---
 title: Unified diffs make GPT-4 Turbo 3X less lazy
 excerpt: GPT-4 Turbo has a problem with lazy coding, which can be signiciantly improved by asking for code changes formatted as unified diffs.
-highlight_image: /assets/benchmarks-udiff.svg
+highlight_image: /assets/benchmarks-udiff.jpg
 ---
 # Unified diffs make GPT-4 Turbo 3X less lazy