Paul Gauthier 2024-06-05 16:25:11 -07:00
parent 9de32c88c3
commit c0700161bc
18 changed files with 68 additions and 4 deletions

@@ -70,10 +70,13 @@ So you can bounce back and forth between aider and your editor, to collaborative
## State of the art
Aider has the highest score on the challenging SWE Bench benchmark.
Aider has the highest score on the challenging
[SWE Bench benchmark](https://aider.chat/2024/06/02/main-swe-bench.html).
<p align="center">
<a href="https://aider.chat/2024/06/02/main-swe-bench.html">
<img src="https://aider.chat/assets/swe_bench.svg" alt="aider swe bench">
</a>
</p>

@@ -4,6 +4,10 @@ excerpt: Claude 3 Opus outperforms all of OpenAI's models on Aider's code editin
highlight_image: /assets/2024-03-07-claude-3.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Claude 3 beats GPT-4 on Aider's code editing benchmark
[![benchmark results](/assets/2024-03-07-claude-3.svg)](https://aider.chat/assets/2024-03-07-claude-3.svg)

@@ -4,6 +4,10 @@ excerpt: OpenAI's GPT-4 Turbo with Vision model scores worse on aider's code edi
highlight_image: /assets/2024-04-09-gpt-4-turbo-laziness.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# GPT-4 Turbo with Vision is a step backwards for coding
[OpenAI just released GPT-4 Turbo with Vision](https://twitter.com/OpenAIDevs/status/1777769463258988634)

@@ -4,6 +4,10 @@ excerpt: Aider has an experimental browser UI, allowing you to collaborate with
highlight_image: /assets/browser.jpg
nav_order: 800
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Aider in your browser
<div class="video-container">

@@ -4,6 +4,9 @@ excerpt: Use GPT-4o to draw graphs with matplotlib, including adjusting styles a
highlight_image: /assets/models-over-time.png
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
![LLM coding skill over time](/assets/models-over-time.svg)

@@ -5,6 +5,9 @@ highlight_image: /assets/linting.jpg
draft: true
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# A draft post

@@ -4,6 +4,9 @@ excerpt: Aider now lints code after every LLM edit and automatically fixes error
highlight_image: /assets/linting.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
[![Linting code](/assets/linting.jpg)](https://aider.chat/assets/linting.jpg)

@@ -4,6 +4,9 @@ excerpt: Aider achieved this result mainly through its existing features that fo
highlight_image: /assets/swe_bench_lite.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# How aider scored SOTA 26.3% on SWE Bench Lite

@@ -4,6 +4,9 @@ excerpt: Aider has written 7% of its own code, via 600+ commits that inserted 4.
highlight_image: /assets/self-assembly.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Aider has written 7% of its own code

@@ -4,6 +4,9 @@ excerpt: Aider sets SOTA for the main SWE Bench, after recently setting SOTA for
highlight_image: /assets/swe_bench.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Aider is SOTA for both SWE Bench and SWE Bench Lite

@@ -4,6 +4,10 @@ excerpt: The new `gpt-4-0125-preview` model is quantiatively lazier at coding th
highlight_image: /assets/benchmarks-0125.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# The January GPT-4 Turbo is lazier than the last version
[![benchmark results](/assets/benchmarks-0125.svg)](https://aider.chat/assets/benchmarks-0125.svg)

@@ -4,6 +4,10 @@ excerpt: A quantitative comparison of the code editing capabilities of the new G
highlight_image: /assets/benchmarks-1106.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Code editing benchmarks for OpenAI's "1106" models
[![benchmark results](/assets/benchmarks-1106.svg)](https://aider.chat/assets/benchmarks-1106.svg)

@@ -5,6 +5,10 @@ canonical_url: https://aider.chat/2023/11/06/benchmarks-speed-1106.html
highlight_image: /assets/benchmarks-speed-1106.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106
<p class="post-date">{{ page.date | date: "%b %-d, %Y" }}</p>

@@ -4,6 +4,10 @@ excerpt: Benchmarking GPT-3.5 and GPT-4 code editing skill using a new code edit
highlight_image: /assets/benchmarks.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# GPT code editing benchmarks
[![benchmark results](/assets/benchmarks.svg)](https://aider.chat/assets/benchmarks.svg)

@@ -4,6 +4,10 @@ excerpt: Using ctags to build a "repository map" to increase GPT-4's ability to
highlight_image: /assets/robot-flowchart.png
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Improving GPT-4's codebase understanding with ctags
![robot flowchat](/assets/robot-flowchart.png)

@@ -4,6 +4,10 @@ excerpt: Tree-sitter allows aider to build a repo map that better summarizes lar
highlight_image: /assets/robot-ast.png
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Building a better repository map with tree sitter
![robot flowchat](/assets/robot-ast.png)

@@ -4,6 +4,10 @@ excerpt: GPT-4 Turbo has a problem with lazy coding, which can be signiciantly i
highlight_image: /assets/benchmarks-udiff.jpg
nav_exclude: true
---
{% if page.date %}
<p class="post-date">{{ page.date | date: "%B %d, %Y" }}</p>
{% endif %}
# Unified diffs make GPT-4 Turbo 3X less lazy
![robot flowchart](/assets/benchmarks-udiff.svg)

@@ -74,10 +74,13 @@ So you can bounce back and forth between aider and your editor, to collaborative
## State of the art
Aider has the highest score on the challenging SWE Bench benchmark.
Aider has the highest score on the challenging
[SWE Bench benchmark](https://aider.chat/2024/06/02/main-swe-bench.html).
<p align="center">
<a href="https://aider.chat/2024/06/02/main-swe-bench.html">
<img src="https://aider.chat/assets/swe_bench.svg" alt="aider swe bench">
</a>
</p>