From c2b1bc7e07475d597d21ffe3bc247a5ad8b9f9f8 Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Fri, 30 Jun 2023 14:36:35 -0700
Subject: [PATCH] copy

---
 docs/benchmarks.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/benchmarks.md b/docs/benchmarks.md
index 4612bfda5..cda85cd86 100644
--- a/docs/benchmarks.md
+++ b/docs/benchmarks.md
@@ -225,7 +225,7 @@
 Benchmarking against 133 exercises provides some robustness all by itself,
 since we are measuring the performance across many exercises.
 But to get a sense of how much the API variance impacts the benchmark outcomes,
-I ran the all 133 exercises 10 times each
+I ran all 133 exercises 10 times each
 against `gpt-3.5-turbo-0613` with the `whole` edit format.
 You'll see one set of error bars in the graph, which demark
 the range of results across those 10 runs.