diff --git a/docs/benchmarks.md b/docs/benchmarks.md
index 4612bfda5..cda85cd86 100644
--- a/docs/benchmarks.md
+++ b/docs/benchmarks.md
@@ -225,7 +225,7 @@
 Benchmarking against 133 exercises provides some robustness all by itself, since
 we are measuring the performance across many exercises.
 But to get a sense of how much the API variance impacts the benchmark outcomes,
-I ran the all 133 exercises 10 times each
+I ran all 133 exercises 10 times each
 against `gpt-3.5-turbo-0613` with the `whole` edit format.
 You'll see one set of error bars in the graph,
 which demark the range of results across those 10 runs.