From c8b95b486f861bc9a4a1457783ccc67f67a24fff Mon Sep 17 00:00:00 2001
From: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Date: Mon, 6 Nov 2023 19:50:23 -0800
Subject: [PATCH] Update benchmarks-1106.md

---
 docs/benchmarks-1106.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/benchmarks-1106.md b/docs/benchmarks-1106.md
index 4ee5e8922..ecf00ee22 100644
--- a/docs/benchmarks-1106.md
+++ b/docs/benchmarks-1106.md
@@ -16,9 +16,9 @@ For example,
 whenever I change aider's prompting or the backend which drives LLM conversations,
 I run the benchmark to make sure these changes produce improvements (not regressions).
 
-The benchmark asks GPT to complete the
-[Exercism Python coding exercises](https://github.com/exercism/python).
-Exercism provides a starting python file with stubs for the needed functions,
+The benchmark asks GPT to complete
+[133 Exercism Python coding exercises](https://github.com/exercism/python).
+For each exercise, Exercism provides a starting python file with stubs for the needed functions,
 a natural language description of the problem to solve
 and a test suite to evaluate whether the coder has correctly solved the problem.
 