fix: Update post on code in JSON

## Abstract

The newest LLMs have support for returning properly formatted json responses,
making it easy for client applications to parse complex responses.
This makes it tempting for AI coding applications to
use tool function calls or other structured reply formats to
receive code from LLMs.
Unfortunately,
LLMs write worse code when asked to wrap it in json, harming their ability
to correctly solve coding tasks.
Returning code as plain (markdown) text results in an average of 6% higher scores
on a variant of the aider code editing benchmark.
This holds true across many top coding LLMs,
and even OpenAI's newest gpt-4o-2024-08-06 with "strict" json support
suffers from this code-in-json handicap.

## Introduction

A lot of people wonder why aider doesn't tell LLMs to
use tools or function calls to
specify code edits.
Instead, aider asks for code edits in plain text, like this:

````
greeting.py
```
<<<<<<< SEARCH
def greeting():
    print("Hello")
=======
def greeting():
    print("Goodbye")
>>>>>>> REPLACE
```
````
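
A client can recover the edit from this reply with ordinary text processing. Here is a minimal sketch of one way to do that (an illustration, not aider's actual parsing code):

```
import re

def parse_edit_block(reply: str) -> tuple[str, str]:
    """Pull the (search, replace) texts out of a SEARCH/REPLACE block.

    Deliberately simplified; a production parser would handle multiple
    blocks, filenames, and malformed markers.
    """
    match = re.search(
        r"<<<<<<< SEARCH\n(.*?)=======\n(.*?)>>>>>>> REPLACE",
        reply,
        re.DOTALL,
    )
    if match is None:
        raise ValueError("no SEARCH/REPLACE block found")
    return match.group(1), match.group(2)
```

Note that the code appears verbatim between the markers, with no escaping required.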

People expect that it would be easier and more reliable to use tool calls,
and parse a nicely formatted json
response:

```
{
    "filename": "greeting.py",
    "search": "def greeting():\n    print(\"Hello\")\n",
    "replace": "def greeting():\n    print(\"Goodbye\")\n"
}
```

This has become even more tempting as LLM providers
continue to improve their tooling for reliably generating
valid json.
For example, OpenAI recently announced the ability to
[strictly enforce that json responses will be syntactically correct
and conform to a specified schema](https://openai.com/index/introducing-structured-outputs-in-the-api/).
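
As a rough sketch of what that looks like in practice, a client might request schema-constrained output along these lines, based on OpenAI's published Structured Outputs interface; the `code_edit` schema is a hypothetical example mirroring the json edit format above:

```
from openai import OpenAI

client = OpenAI()

# Ask for a reply that must conform to a json schema ("strict" mode).
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Change greeting.py to print Goodbye"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "code_edit",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "filename": {"type": "string"},
                    "search": {"type": "string"},
                    "replace": {"type": "string"},
                },
                "required": ["filename", "search", "replace"],
                "additionalProperties": False,
            },
        },
    },
)

# The reply is guaranteed to parse as json and to match the schema.
print(response.choices[0].message.content)
```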

But producing valid (schema compliant) json is not sufficient for this use case.
The json also has to contain valid, high quality code.
And unfortunately,
LLMs write worse code when they're asked to
wrap it in json.

In some sense this shouldn't be surprising.
Just look at the very simple
json example above, with the escaped
quotes `\"` and
newlines `\n`
mixed into the code.
Imagine if the code itself contained json or other quoted strings,
with their
own escape sequences.

If you tried to write a program,
would you do a better job
typing it out normally
or as a properly escaped
json string?
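
The difference is easy to demonstrate with Python's standard json module:

```
import json

code = '''def greeting():
    print("Hello")
'''

# Typed normally, the code reads exactly as it would in a source file.
print(code)

# Wrapped in json, every quote and newline becomes an escape sequence
# that the model must emit (and the client must decode) correctly.
print(json.dumps(code))
# "def greeting():\n    print(\"Hello\")\n"
```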

## Quantifying the benefits of plain text

Previous [aider benchmark results](/2023/07/02/benchmarks.html)
showed
the superiority of returning code
as plain text compared to json-wrapped function calls.
Those results were obtained
over a year ago, against far less
capable models.
OpenAI's newly announced support for "strict" json seemed like a good reason to
investigate whether the newest models are still handicapped by json-wrapping code.

Each model was given one try to solve
[133 practice exercises from the Exercism python repository](/2023/07/02/benchmarks.html#the-benchmark).
This is the standard aider "code editing" benchmark, but restricted to a single attempt
without a second try to "fix" any errors.

Each model was assessed by the benchmark using two
different strategies for returning code:

- **Markdown** -- the model returned the whole source code file in standard markdown triple-backtick fences.
- **Tool call** -- the model used a tool function call to return the whole source code file. This requires the LLM to wrap the code in json.

The markdown strategy is the same as
aider's "whole" edit format, where the
LLM would return a source file like this:

````
Here is the program you asked for which prints "Hello":

greeting.py
```
def greeting():
    print("Hello")
```
````
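
Recovering the file from such a reply is simple text extraction. Here is a minimal sketch (an illustration, not aider's actual implementation) that grabs the last fenced block:

````
import re

def extract_fenced_code(reply: str) -> str:
    """Return the contents of the last triple-backtick block in a reply.

    Deliberately simple; a robust parser would also track fence
    nesting, language tags, and replies containing multiple files.
    """
    blocks = re.findall(r"```[^\n]*\n(.*?)```", reply, re.DOTALL)
    if not blocks:
        raise ValueError("no fenced code block found")
    return blocks[-1]
````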

The tool strategy requires the LLM to call the `write_file` function with
two parameters, as shown below.
For maximum simplicity, the LLM didn't even have to specify the filename,
since the benchmark operates only on a single source file.

```
{
    "explanation": "Here is the program you asked for which prints \"Hello\"",
    "content": "def greeting():\n    print(\"Hello\")\n"
}
```
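
For context, the tool definition an application would register for this strategy could look roughly like the following; the field descriptions are illustrative guesses, not the benchmark's exact configuration:

```
# A plausible `write_file` tool definition, in the common
# json-schema style used by tool/function calling APIs.
write_file_tool = {
    "type": "function",
    "function": {
        "name": "write_file",
        "description": "Write out the complete source file.",
        "parameters": {
            "type": "object",
            "properties": {
                "explanation": {
                    "type": "string",
                    "description": "Commentary on the solution.",
                },
                "content": {
                    "type": "string",
                    "description": "The full text of the source file.",
                },
            },
            "required": ["explanation", "content"],
        },
    },
}
```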

Both of these formats avoid actually *editing* source files, to keep
the task as
simple as possible.
The LLM is able to emit the whole source file intact,
which is much easier
than correctly formulating
instructions to edit
portions of a file.

This experimental setup is designed to highlight
the effects of json-wrapping on the LLM's ability to write code to solve a task.

## Results