Update 2024-08-14-code-in-json.md

paul-gauthier 2024-08-14 18:56:01 -07:00 committed by GitHub
parent b3ed2c8a48
commit a951a2afc9


@@ -116,14 +116,13 @@ valid json, or even enforcing that it meets a specific schema.
For example, OpenAI recently announced
[strict enforcement of json responses]().
The problem is that LLMs are bad at writing code when you ask them to wrap it
into a json container.
The json tooling around the LLM helps make sure it's valid json,
which does solve an important problem.
LLMs used to frequently produce invalid json, so that's a big step forward.
But it's not sufficient to just produce
valid json; it also
has to contain quality code.
Unfortunately,
LLMs write worse code when they're asked to
emit it wrapped in json.
In some sense this shouldn't be surprising.
Just look at the very simple
json example above, with the escaped
@@ -140,12 +139,17 @@ typing it into a text file or hand typing it as a properly escaped json string?
Previous [benchmark results](/2023/07/02/benchmarks.html)
showed
the superiority of returning code
as plain text compared to json-wrapped function calls.
But those results were obtained
over a year ago, against far less
capable models.
OpenAI's newly announced support for "strict" json seemed like a good reason to
investigate whether the newest models are still handicapped by json-wrapping code.
The graph above shows benchmark
results from
3 of the strongest code editing models:
- gpt-4o-2024-08-06
- claude-3-5-sonnet-20240620
@@ -155,15 +159,18 @@ Each model was given one try to solve
[133 practice exercises from the Exercism python repository](/2023/07/02/benchmarks.html#the-benchmark).
This is the standard aider "code editing" benchmark, except restricted to a single attempt.
Each model was assessed by the benchmark with two
different strategies for returning code:
- **Markdown** -- where the model simply returned the whole source code file in standard markdown triple-backtick fences.
- **Tool call** -- where the model was told to use a function to return the whole source code file. This requires the LLM to wrap the code in json.
The markdown strategy is the same as
aider's "whole" edit format.
It asks the LLM to return a program like this:
````
Here is the program you asked for which prints "Hello, world!":
Here is the program you asked for which prints "Hello":
greeting.py
```
@@ -177,18 +184,21 @@ two parameters, like this:
```
{
"explanation": "Here is the program you asked for which prints \"Hello, world!\"",
"explanation": "Here is the program you asked for which prints \"Hello\"",
"content": "def greeting():\n print(\"Hello\")\n"
}
```
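
For concreteness, the function definition behind a tool call like this might look roughly like the sketch below. This is an assumption for illustration, written against OpenAI's tools format: the function name `write_file` and the descriptions are made up, and this is not necessarily the exact schema aider uses; only the `explanation` and `content` parameters come from the example above.

```
# A minimal sketch (not aider's actual schema) of an OpenAI-style tool
# definition for the "tool call" strategy. The function name `write_file`
# and the descriptions are assumptions for illustration.
tool = {
    "type": "function",
    "function": {
        "name": "write_file",
        "description": "Return the complete, updated source file.",
        # "strict": True enables OpenAI's newly announced strict schema
        # enforcement; it also requires additionalProperties to be false.
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "explanation": {
                    "type": "string",
                    "description": "Short explanation of the solution.",
                },
                "content": {
                    "type": "string",
                    "description": "The entire source file as one string.",
                },
            },
            "required": ["explanation", "content"],
            "additionalProperties": False,
        },
    },
}

# Passed to the API roughly as:
#   client.chat.completions.create(model=..., messages=..., tools=[tool])
```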
Both of these formats avoid actually *editing* source files, to keep
the task as
simple as possible.
The LLM can emit the whole source file intact,
which is much easier
than correctly formulating
instructions to edit
portions of a file.
We are simply testing the effects of json-wrapping on the LLM's ability to write code to solve a task.
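
As a concrete illustration of the extra escaping the json-wrapped format demands, a few lines of Python (not part of the benchmark, just a sketch) reproduce the kind of escaped string shown in the tool call example above:

```
import json

# The same tiny program from the examples above, as plain source text.
code = 'def greeting():\n    print("Hello")\n'

# Wrapping it for a tool call forces every newline and quote to be escaped.
print(json.dumps(
    {
        "explanation": 'Here is the program you asked for which prints "Hello"',
        "content": code,
    },
    indent=2,
))
# Prints something like:
# {
#   "explanation": "Here is the program you asked for which prints \"Hello\"",
#   "content": "def greeting():\n    print(\"Hello\")\n"
# }
```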
## Results