diff --git a/docs/benchmarks.md b/docs/benchmarks.md
index 329147769..d217b90c4 100644
--- a/docs/benchmarks.md
+++ b/docs/benchmarks.md
@@ -3,16 +3,14 @@
 
 ![benchmark results](../assets/benchmarks.svg)
 
-Aider is an open source command line chat tool that lets you ask GPT to edit
-code in your local git repos.
-You can use aider to ask GPT to add features, write tests or make other changes and
-improvements to your code.
+Aider is an open source command line chat tool that lets you work with GPT to edit
+code in your local git repo.
+You can use aider to have GPT add features, write tests or make other changes to your code.
 
 The ability for GPT to reliably edit local source files is
-crucial for this functionality.
-Much of this depends on the "edit format", which is an important component of the
-system prompt.
-The edit format specifies how GPT should structure code edits in its
+crucial for this functionality, and depends mainly on the "edit format".
+The edit format is an important component of the system prompt,
+which specifies how GPT should structure code edits in its
 responses.
 
 Aider currently uses simple text based editing formats, but
@@ -242,12 +240,17 @@
 The benchmark results have me fairly convinced that the new
 `gpt-3.5-turbo-0613` and `gpt-3.5-16k-0613` models
 are a bit worse at code editing than the older `gpt-3.5-turbo-0301` model.
-This is especially visible in the "first coding attempt"
+
+This is visible in the "first coding attempt"
 portion of each result, before GPT gets a second chance to edit the code.
 Look at the horizontal white line in the middle
 of the first three blue bars.
-
 Performance with the `whole` edit format was 46% for the February model and
 only 39% for the June models.
+
+But also note how much the solid green `diff` bars
+degrade between the February and June GPT-3.5 models.
+They drop from 30% down to about 19%.
+
 I saw other signs of this degraded performance
 in earlier versions of the benchmark as well.