From e6123624d87cd436108ef6442deceb5a8c6e4c59 Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Fri, 30 Jun 2023 13:44:58 -0700
Subject: [PATCH] copy

---
 docs/benchmarks.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/benchmarks.md b/docs/benchmarks.md
index f76c7dd3c..2d4fd235f 100644
--- a/docs/benchmarks.md
+++ b/docs/benchmarks.md
@@ -70,7 +70,7 @@ The goal is to read the instructions, implement the provided functions/class ske
 and pass all the unit tests. The benchmark measures what percentage of
 the 133 exercises are completed successfully, with all the associated unit tests passing.
 
-To run the test, aider sends GPT the Exercism instructions followed by:
+To complete an exercise, aider sends GPT the Exercism instructions followed by:
 
 ```
 Use the above instructions to modify the supplied files: {file_list}
@@ -90,7 +90,7 @@ Fix the code in {file_list} to resolve the errors.
 
 GPT gets this second chance to fix the implementation because
 many of the unit tests check for specifics that are not
-clearly called out in the instructions.
+called out in the instructions.
 For example, many tests want to see
 [specific phrases in ValueErrors](https://github.com/exercism/python/blob/f6caa44faa8fb7d0de9a54ddb5c6183e027429c6/exercises/practice/queen-attack/queen_attack_test.py#L31)
 raised by
@@ -190,7 +190,7 @@ format requests original/updated edits to be returned using the function call AP
 }
 ```
 
-## ChatGPT function calls
+## GPT-3.5 hallucinates function calls?
 
 GPT-3.5 was very prone to ignoring the JSON Schema that specified valid functions,
 and would often return a completely invalid `function_call` fragment with `name="python"`.