2025-05-31 17:55:01 +00:00 · 2023-12-17 18:38:52 -08:00 · 2023-12-17 18:38:52 -08:00 · 5c5025e6cf
commit 5c5025e6cf
parent ed6d30c849
1 changed files with 21 additions and 22 deletions
--- a/docs/unified-diffs.md
+++ b/docs/unified-diffs.md
@@ -6,8 +6,8 @@

 Aider now asks GPT-4 Turbo to use
 [unified diffs](https://www.gnu.org/software/diffutils/manual/html_node/Example-Unified.html)
-to edit your code when you request new features, improvements, bug fixes, test cases, etc.
-Using unified diffs massively reduces GPT-4 Turbo's bad habit of "lazy" coding,
+to edit your code.
+This massively reduces GPT-4 Turbo's bad habit of "lazy" coding,
 where it writes half completed code filled with comments
 like "...add logic here...".

@@ -25,29 +25,31 @@ This new laziness benchmark produced the following results with `gpt-4-1106-prev

 - **GPT-4 Turbo only scored 15% as a baseline** using aider's existing "SEARCH/REPLACE block" edit format.
 - **Aider's new unified diff edit format raised the score to 65%**.
-- **No benefit from the user being blind, without hands, tipping $2000 or fearing truncated code trauma.** These widely circulated folk remedies performed no better than baseline when added to the system prompt with aider's SEARCH/REPLACE edit format. Including *all* of them only scored at 15%
+- **No benefit from the user being blind, without hands, tipping $2000 or fearing truncated code trauma.** These widely circulated folk remedies performed no better than baseline when added to the system prompt with aider's SEARCH/REPLACE edit format. Including *all* of them still only scored at 15%

-The older `gpt-4-0613` also did better on the laziness benchmark using unified diffs.
-The benchmark was designed to work with large source code files, and
-28% of them are too large to fit in June GPT-4's 8k context window.
-This significantly harmed the benchmark results.
+The older `gpt-4-0613` also did better on the laziness benchmark using unified diffs:

 - **The June GPT-4's baseline was 26%** using aider's existing "SEARCH/REPLACE block" edit format.
 - **Aider's new unified diff edit format raised June GPT-4's score to 59%**. 
+- The benchmark was designed to use large files, and
+28% of them are too large to fit in June GPT-4's 8k context window.
+This significantly harmed the benchmark results.

 Before settling on unified diffs,
-I explored many other approaches.
-These efforts included prompts about being tireless and diligent,
-use of OpenAI's function/tool calling capabilities and numerous variations on
-aider's existing editing formats, line number formats and other diff-like formats.
+I explored many other approaches including:
+prompts about being tireless and diligent,
+OpenAI's function/tool calling capabilities,
+numerous variations on aider's existing editing formats,
+line number based formats
+and other diff-like formats.
 The results shared here reflect
-an extensive investigation and a large number of benchmark evaluations of many approaches.
+an extensive investigation and benchmark evaluations of many approaches.

-The result is aider's new support for a unified diff editing format,
-which outperforms other solutions by a wide margin.
+Aider's new unified diff editing format
+outperforms other solutions by a wide margin.
 The rest of this article will describe
 aider's new editing format and refactoring benchmark.
-We will discuss some key design decisions,
+It will highlight some key design decisions,
 and evaluate their significance using ablation experiments.


@@ -148,7 +150,7 @@ numbers in editing formats,
 backed up by many quantitative benchmark experiments.

 You've probably ignored the line numbers in every diff you've seen?
-So aider tells GPT not to include them,
+So aider tells GPT not to even include them,
 and just interprets each hunk from the unified diffs
 as a search and replace operation:
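
The search-and-replace reading of a hunk described in this passage can be sketched in a few lines of Python. This is only an illustrative sketch, not aider's actual implementation: the function names `hunk_to_search_replace` and `apply_hunk` are hypothetical, and it assumes exact text matching, a source file that ends with a newline, and a hunk whose search text occurs exactly once.

```python
def hunk_to_search_replace(hunk_lines):
    """Split one unified-diff hunk into (search_text, replace_text).

    Hypothetical helper: context (space) lines go to both sides,
    minus lines only to the search side, plus lines only to the
    replace side. Any "@@ ... @@" header is skipped, so line numbers
    are never consulted.
    """
    search, replace = [], []
    for line in hunk_lines:
        if line.startswith("@@"):
            continue  # hunk header: ignored entirely
        marker, text = line[:1], line[1:]
        if marker == "-":
            search.append(text)
        elif marker == "+":
            replace.append(text)
        else:
            # A leading space (or a completely empty line) is context.
            search.append(text)
            replace.append(text)
    # Terminate every line with "\n" so deleted lines vanish cleanly.
    return (
        "".join(t + "\n" for t in search),
        "".join(t + "\n" for t in replace),
    )


def apply_hunk(source, hunk_lines):
    """Apply a hunk to `source` as a plain search and replace."""
    search, replace = hunk_to_search_replace(hunk_lines)
    if not search or source.count(search) != 1:
        # Assumption of this sketch: refuse missing or ambiguous matches.
        raise ValueError("hunk does not match the source exactly once")
    return source.replace(search, replace, 1)


# Made-up example input, not taken from the article.
source = "def greet():\n    print('hi')\n    return\n"
hunk = [
    "@@ ... @@",
    " def greet():",
    "-    print('hi')",
    "+    print('hello')",
    "     return",
]
updated = apply_hunk(source, hunk)
```

Because the header is skipped, the hunk applies wherever its context and removed lines appear in the file. Rejecting a hunk whose search text is absent or repeated is a design choice of this sketch, not something the article specifies.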

@@ -163,8 +165,8 @@ This diff:
     return
 ```

-Means we want to search the file for all the
-*space* ` ` and *minus* `-` lines from the hunk:
+Means we need to search the file for the
+*space* ` ` and *minus* `-` lines:

 ```python
 def main(args):
@@ -173,7 +175,7 @@ def main(args):
    return
 ```

-And then replace them with all the *space* ` ` and *plus* `+` lines:
+And replace them with the *space* ` ` and *plus* `+` lines:

 ```python
 def main(args):
@@ -195,7 +197,6 @@ Consider this slightly more complex change, which renames the variable `n` to
@@ ... @@
 -def factorial(n):
 +def factorial(number):
-     "compute factorial"
 -    if n == 0:
 +    if number == 0:
         return 1
@@ -212,13 +213,11 @@ but it is much easier to see two different coherent versions of the
 ```diff
@@ ... @@
 -def factorial(n):
--    "compute factorial"
 -    if n == 0:
 -        return 1
 -    else:
 -        return n * factorial(n-1)
 +def factorial(number):
-+    "compute factorial"
 +    if number == 0:
 +        return 1
 +    else: