commit 5c5025e6cf
parent ed6d30c849
Author: Paul Gauthier
Date:   2023-12-17 18:38:52 -08:00

@@ -6,8 +6,8 @@
 Aider now asks GPT-4 Turbo to use
 [unified diffs](https://www.gnu.org/software/diffutils/manual/html_node/Example-Unified.html)
-to edit your code when you request new features, improvements, bug fixes, test cases, etc.
-Using unified diffs massively reduces GPT-4 Turbo's bad habit of "lazy" coding,
+to edit your code.
+This massively reduces GPT-4 Turbo's bad habit of "lazy" coding,
 where it writes half completed code filled with comments
 like "...add logic here...".
@@ -25,29 +25,31 @@ This new laziness benchmark produced the following results with `gpt-4-1106-preview`:
 - **GPT-4 Turbo only scored 15% as a baseline** using aider's existing "SEARCH/REPLACE block" edit format.
 - **Aider's new unified diff edit format raised the score to 65%**.
-- **No benefit from the user being blind, without hands, tipping $2000 or fearing truncated code trauma.** These widely circulated folk remedies performed no better than baseline when added to the system prompt with aider's SEARCH/REPLACE edit format. Including *all* of them only scored at 15%
+- **No benefit from the user being blind, without hands, tipping $2000 or fearing truncated code trauma.** These widely circulated folk remedies performed no better than baseline when added to the system prompt with aider's SEARCH/REPLACE edit format. Including *all* of them still only scored at 15%
-The older `gpt-4-0613` also did better on the laziness benchmark using unified diffs.
+The older `gpt-4-0613` also did better on the laziness benchmark using unified diffs:
+The benchmark was designed to work with large source code files, and
+28% of them are too large to fit in June GPT-4's 8k context window.
+This significantly harmed the benchmark results.
 - **The June GPT-4's baseline was 26%** using aider's existing "SEARCH/REPLACE block" edit format.
 - **Aider's new unified diff edit format raised June GPT-4's score to 59%**.
-- The benchmark was designed to use large files, and
-28% of them are too large to fit in June GPT-4's 8k context window.
-This significantly harmed the benchmark results.
 Before settling on unified diffs,
-I explored many other approaches.
-These efforts included prompts about being tireless and diligent,
-use of OpenAI's function/tool calling capabilities and numerous variations on
-aider's existing editing formats, line number formats and other diff-like formats.
+I explored many other approaches including:
+prompts about being tireless and diligent,
+OpenAI's function/tool calling capabilities,
+numerous variations on aider's existing editing formats,
+line number based formats
+and other diff-like formats.
 The results shared here reflect
-an extensive investigation and a large number of benchmark evaluations of many approaches.
-The result is aider's new support for a unified diff editing format,
-which outperforms other solutions by a wide margin.
+an extensive investigation and benchmark evaluations of many approaches.
+Aider's new unified diff editing format
+outperforms other solutions by a wide margin.
 The rest of this article will describe
 aider's new editing format and refactoring benchmark.
-We will discuss some key design decisions,
+It will highlight some key design decisions,
 and evaluate their significance using ablation experiments.
@@ -148,7 +150,7 @@ numbers in editing formats,
 backed up by many quantitative benchmark experiments.
 You've probably ignored the line numbers in every diff you've seen?
-So aider tells GPT not to include them,
+So aider tells GPT not to even include them,
 and just interprets each hunk from the unified diffs
 as a search and replace operation:
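As a rough sketch of that idea (a hypothetical helper under assumed inputs, not aider's actual implementation): the context and minus lines of a hunk form the text to search for, and the context and plus lines form the replacement.

```python
def hunk_to_search_replace(hunk_lines):
    """Turn one @@ ... @@ hunk into (search_text, replace_text).

    Hypothetical sketch: line numbers in the hunk header are ignored
    entirely; only the leading ' ', '-' and '+' markers matter.
    """
    search, replace = [], []
    for line in hunk_lines:
        marker, content = line[:1], line[1:]
        if marker in (" ", "-"):   # context + removed lines are what we search for
            search.append(content)
        if marker in (" ", "+"):   # context + added lines are what we replace with
            replace.append(content)
    return "\n".join(search) + "\n", "\n".join(replace) + "\n"


def apply_hunk(file_text, hunk_lines):
    """Apply a hunk as a plain search-and-replace; return None if no match."""
    search, replace = hunk_to_search_replace(hunk_lines)
    if search not in file_text:
        return None
    return file_text.replace(search, replace, 1)
```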
@@ -163,8 +165,8 @@ This diff:
     return
 ```
-Means we want to search the file for all the
-*space* ` ` and *minus* `-` lines from the hunk:
+Means we need to search the file for the
+*space* ` ` and *minus* `-` lines:
 ```python
 def main(args):
@@ -173,7 +175,7 @@ def main(args):
     return
 ```
-And then replace them with all the *space* ` ` and *plus* `+` lines:
+And replace them with the *space* ` ` and *plus* `+` lines:
 ```python
 def main(args):
@@ -195,7 +197,6 @@ Consider this slightly more complex change, which renames the variable `n` to `number`:
 @@ ... @@
 -def factorial(n):
 +def factorial(number):
-     "compute factorial"
 -    if n == 0:
 +    if number == 0:
          return 1
@@ -212,13 +213,11 @@ but it is much easier to see two different coherent versions of the
 ```diff
 @@ ... @@
 -def factorial(n):
--    "compute factorial"
 -    if n == 0:
 -        return 1
 -    else:
 -        return n * factorial(n-1)
 +def factorial(number):
-+    "compute factorial"
 +    if number == 0:
 +        return 1
 +    else: