diff --git a/docs/benchmarks.md b/docs/benchmarks.md
index 0f34d2971..1f80bf7d9 100644
--- a/docs/benchmarks.md
+++ b/docs/benchmarks.md
@@ -46,9 +46,9 @@ The results were quite interesting:
 The quantitative benchmark results align with my intuitions
 about prompting GPT for complex tasks like coding. It's beneficial to
 minimize the "cognitive overhead" of formatting the response, allowing
-GPT to concentrate on the task at hand. As an analogy, asking a junior
+GPT to concentrate on the task at hand. As an analogy, imagine asking a junior
 developer to implement a new feature by manually typing the required
-code changes as `diff -c` formatted edits wouldn't generate a good result.
+code changes as `diff -c` formatted edits. You wouldn't expect a good result.
 
 Using more complex output formats seems to introduce two issues: