fix: Update post on code in JSON

## Abstract

The newest LLMs have support for returning properly formatted json responses,
making it easy for client applications to parse complex responses.
This makes it tempting for AI coding applications to
use tool function calls or other structured reply formats to
receive code from LLMs.
Unfortunately,
LLMs write worse code when asked to wrap it in json, harming their ability
to correctly solve coding tasks.
Returning code as plain (markdown) text results in an average of 6% higher scores
on a variant of the aider code editing benchmark.
This holds true across many top coding LLMs,
and even OpenAI's newest gpt-4o-2024-08-06 with "strict" json support
suffers from this code-in-json handicap.

## Introduction

A lot of people wonder why aider doesn't tell LLMs to
use tools or function calls to
specify code edits.
Instead, aider asks for code edits in plain text, like this:

````
greeting.py
```
<<<<<<< SEARCH
def greeting():
    print("Hello")
=======
def greeting():
    print("Goodbye")
>>>>>>> REPLACE
```
````
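
A client can recover the edit from this reply with ordinary text processing. Here is a minimal sketch of one way to do that (an illustration, not aider's actual parsing code):

```
import re

def parse_edit_block(reply: str) -> tuple[str, str]:
    """Pull the (search, replace) texts out of a SEARCH/REPLACE block.

    Deliberately simplified; a production parser would handle multiple
    blocks, filenames, and malformed markers.
    """
    match = re.search(
        r"<<<<<<< SEARCH\n(.*?)=======\n(.*?)>>>>>>> REPLACE",
        reply,
        re.DOTALL,
    )
    if match is None:
        raise ValueError("no SEARCH/REPLACE block found")
    return match.group(1), match.group(2)
```

Note that the code appears verbatim between the markers, with no escaping required.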

People expect that it would be easier and more reliable to use tool calls,
and parse a nicely formatted json
response:

```
{
    "filename": "greeting.py",
    "search": "def greeting():\n    print(\"Hello\")\n",
    "replace": "def greeting():\n    print(\"Goodbye\")\n"
}
```

This has become even more tempting as LLM providers
continue to improve their tooling for reliably generating
valid json.
For example, OpenAI recently announced the ability to
[strictly enforce that json responses will be syntactically correct
and conform to a specified schema](https://openai.com/index/introducing-structured-outputs-in-the-api/).
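
As a rough sketch of what that looks like in practice, a client might request schema-constrained output along these lines, based on OpenAI's published Structured Outputs interface; the `code_edit` schema is a hypothetical example mirroring the json edit format above:

```
from openai import OpenAI

client = OpenAI()

# Ask for a reply that must conform to a json schema ("strict" mode).
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Change greeting.py to print Goodbye"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "code_edit",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "filename": {"type": "string"},
                    "search": {"type": "string"},
                    "replace": {"type": "string"},
                },
                "required": ["filename", "search", "replace"],
                "additionalProperties": False,
            },
        },
    },
)

# The reply is guaranteed to parse as json and to match the schema.
print(response.choices[0].message.content)
```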

But producing valid (schema compliant) json is not sufficient for this use case.
The json also has to contain valid, high quality code.
And unfortunately,
LLMs write worse code when they're asked to
wrap it in json.

In some sense this shouldn't be surprising.
Just look at the very simple
json example above, with the escaped
quotes `\"` and
newlines `\n`
mixed into the code.
Imagine if the code itself contained json or other quoted strings,
with their
own escape sequences.

If you tried to write a program,
would you do a better job
typing it out normally
or as a properly escaped
json string?
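
The difference is easy to demonstrate with Python's standard json module:

```
import json

code = '''def greeting():
    print("Hello")
'''

# Typed normally, the code reads exactly as it would in a source file.
print(code)

# Wrapped in json, every quote and newline becomes an escape sequence
# that the model must emit (and the client must decode) correctly.
print(json.dumps(code))
# "def greeting():\n    print(\"Hello\")\n"
```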

## Quantifying the benefits of plain text

Previous [aider benchmark results](/2023/07/02/benchmarks.html)
showed
the superiority of returning code
as plain text compared to json-wrapped function calls.
Those results were obtained
over a year ago, against far less
capable models.
OpenAI's newly announced support for "strict" json seemed like a good reason to
investigate whether the newest models are still handicapped by json-wrapping code.

Each model was given one try to solve
[133 practice exercises from the Exercism python repository](/2023/07/02/benchmarks.html#the-benchmark).
This is the standard aider "code editing" benchmark, but restricted to a single attempt
without a second try to "fix" any errors.

Each model was assessed by the benchmark using two
different strategies for returning code:

- **Markdown** -- the model returned the whole source code file in standard markdown triple-backtick fences.
- **Tool call** -- the model used a tool function call to return the whole source code file. This requires the LLM to wrap the code in json.

The markdown strategy is the same as
aider's "whole" edit format, where the
LLM would return a source file like this:

````
Here is the program you asked for which prints "Hello":

greeting.py
```
def greeting():
    print("Hello")
```
````
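
Recovering the file from such a reply is simple text extraction. Here is a minimal sketch (an illustration, not aider's actual implementation) that grabs the last fenced block:

````
import re

def extract_fenced_code(reply: str) -> str:
    """Return the contents of the last triple-backtick block in a reply.

    Deliberately simple; a robust parser would also track fence
    nesting, language tags, and replies containing multiple files.
    """
    blocks = re.findall(r"```[^\n]*\n(.*?)```", reply, re.DOTALL)
    if not blocks:
        raise ValueError("no fenced code block found")
    return blocks[-1]
````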

The tool strategy requires the LLM to call the `write_file` function with
two parameters, as shown below.
For maximum simplicity, the LLM didn't even have to specify the filename,
since the benchmark operates only on a single source file.

```
{
    "explanation": "Here is the program you asked for which prints \"Hello\"",
    "content": "def greeting():\n    print(\"Hello\")\n"
}
```
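
For context, the tool definition an application would register for this strategy could look roughly like the following; the field descriptions are illustrative guesses, not the benchmark's exact configuration:

```
# A plausible `write_file` tool definition, in the common
# json-schema style used by tool/function calling APIs.
write_file_tool = {
    "type": "function",
    "function": {
        "name": "write_file",
        "description": "Write out the complete source file.",
        "parameters": {
            "type": "object",
            "properties": {
                "explanation": {
                    "type": "string",
                    "description": "Commentary on the solution.",
                },
                "content": {
                    "type": "string",
                    "description": "The full text of the source file.",
                },
            },
            "required": ["explanation", "content"],
        },
    },
}
```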

Both of these formats avoid actually *editing* source files, to keep
the task as
simple as possible.
The LLM is able to emit the whole source file intact,
which is much easier
than correctly formulating
instructions to edit
portions of a file.

This experimental setup is designed to highlight
the effects of json-wrapping on the LLM's ability to write code to solve a task.

## Results