Commit graph

2042 commits

Author SHA1 Message Date
Paul Gauthier
271807123a Handle missing response text 2023-06-29 15:49:41 -07:00
Paul Gauthier
3bbfebc390 fixed console.print 2023-06-29 15:34:10 -07:00
Paul Gauthier
b11a6fa7ee track num_exhausted_context_windows 2023-06-29 15:32:45 -07:00
Paul Gauthier
995fdc3415 accept lists even when we asked for strings 2023-06-29 15:17:52 -07:00
Paul Gauthier
1571f34fa3 added "unique lines" to the prompt 2023-06-29 15:16:33 -07:00
Paul Gauthier
87dc262214 roughed in diff-func-string 2023-06-29 15:11:59 -07:00
Paul Gauthier
58e763cee7 roughed in diff-func-string 2023-06-29 15:10:33 -07:00
Paul Gauthier
92c2d47e20 added docker warning 2023-06-29 14:59:38 -07:00
Paul Gauthier
5fddb9c90a removed remaining 2023-06-29 14:32:00 -07:00
Paul Gauthier
1e40dca4f1 added --cont 2023-06-29 14:22:24 -07:00
Paul Gauthier
5ee455a05b docker environment for the whole benchmark 2023-06-29 14:19:00 -07:00
Paul Gauthier
0ab19aa97c aider: Add script to run a Docker container with Python 3.8 and launch bash.
# Context:
USER: run a docker with python3.8
ASSISTANT: To run a Docker container with Python 3.8, you
^C KeyboardInterrupt
USER: run a docker with python3.8 and launch bash
ASSISTANT: To run a Docker container with Python 3.8 and launch bash, you need to modify the `docker.sh` script in the `benchmark` directory.

Here is the change you need to make:

```bash
benchmark/docker.sh
<<<<<<< ORIGINAL

=======
#!/bin/bash
docker run -it --rm python:3.8 bash
>>>>>>> UPDATED
```

This script will start a Docker container using the Python 3.8 image and launch bash. You can run this script from the terminal with the command `./benchmark/docker.sh`. Make sure the script has execute permissions. You can add them with the command `chmod +x benchmark/docker.sh` if needed.
2023-06-29 11:57:36 -07:00
Paul Gauthier
c0a625d934 Added new files to the git repo: benchmark/docker.sh 2023-06-29 11:57:02 -07:00
Paul Gauthier
5eb1b2bdec added timeout reporting 2023-06-29 11:55:28 -07:00
Paul Gauthier
b852c35828 added back timeout 2023-06-29 11:15:51 -07:00
Paul Gauthier
ec4dc74de4 subprocess to unittest 2023-06-29 11:01:57 -07:00
Paul Gauthier
1f4e08d20c removed timeouts 2023-06-29 10:42:09 -07:00
Paul Gauthier
13a87c7700 show test output 2023-06-29 10:33:53 -07:00
Paul Gauthier
24589b543e drive unittest directly 2023-06-29 10:26:33 -07:00
Paul Gauthier
81ec0e1608 refactor timeouts 2023-06-29 09:22:11 -07:00
Paul Gauthier
4f81f88c91 Added --no-aider, better error if docker is not available 2023-06-29 07:22:53 -07:00
Paul Gauthier
3c63552628 Revert "aider: Refactor run_unit_tests to kill the subprocess when it times out."
This reverts commit 6632367978.
2023-06-29 07:08:53 -07:00
Paul Gauthier
426baa85dc initial 2023-06-28 18:32:09 -07:00
Paul Gauthier
6632367978 aider: Refactor run_unit_tests to kill the subprocess when it times out.
# Context:
USER: when `subprocess` `timeout` fires, do we need to kill the subprocess? just in `run_unit_tests`
ASSISTANT: When a subprocess times out, it is not automatically terminated. So, yes, if you want to ensure that the subprocess is terminated when it times out, you should explicitly kill it.

Here's how you can modify the `run_unit_tests` function to kill the subprocess when it times out:

```python
benchmark/benchmark.py
<<<<<<< ORIGINAL
        try:
            result = subprocess.run(
                command,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                timeout=timeout,
            )
            if result.returncode != 0:
                all_tests_passed = False
                print(f"Test {test_file} failed")

            res = cleanup_test_output(result.stdout)

        except subprocess.TimeoutExpired:
            all_tests_passed = False
            res = f"Test {test_file} timed out after {timeout} seconds."
=======
        with subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True) as proc:
            try:
                stdout, _ = proc.communicate(timeout=timeout)
                if proc.returncode != 0:
                    all_tests_passed = False
                    print(f"Test {test_file} failed")
                res = cleanup_test_output(stdout)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
                all_tests_passed = False
                res = f"Test {test_file} timed out after {timeout} seconds."
>>>>>>> UPDATED
```

This change uses `subprocess.Popen` to create the subprocess and `communicate` to interact with it. If `communicate` times out, it raises `subprocess.TimeoutExpired`, and then we kill the subprocess with `proc.kill()`.
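
For reference, Python's documentation states that `subprocess.run` kills and waits for the child itself when `timeout` expires, whereas `Popen.communicate` leaves the child running. The explicit `kill()` is therefore only needed in the `Popen` form. A minimal sketch of the two forms, assuming a hypothetical sleeping child (`SLOW_CMD`) in place of a real test command; the function names are illustrative and not taken from `benchmark.py`:

```python
import subprocess
import sys

# Hypothetical stand-in for a slow test command: a child that sleeps for 30 seconds.
SLOW_CMD = [sys.executable, "-c", "import time; time.sleep(30)"]


def run_with_timeout(command, timeout):
    """Return (timed_out, output). subprocess.run kills the child on timeout."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout,
        )
        return False, result.stdout
    except subprocess.TimeoutExpired:
        # run() has already killed and reaped the child before re-raising.
        return True, ""


def popen_with_timeout(command, timeout):
    """Same contract, but with Popen the caller must kill the child explicitly."""
    with subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        try:
            stdout, _ = proc.communicate(timeout=timeout)
            return False, stdout
        except subprocess.TimeoutExpired:
            proc.kill()         # communicate() does not kill the child on timeout
            proc.communicate()  # reap the child and drain its pipes
            return True, ""
```

Both functions return promptly for `SLOW_CMD` with a short timeout, without waiting for the full 30-second sleep.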
2023-06-28 18:29:58 -07:00
Paul Gauthier
d1f0590911 move benchmark prompts into their own file 2023-06-28 18:25:39 -07:00
Paul Gauthier
0845de7a7d ssh 2023-06-28 16:01:52 -07:00
Paul Gauthier
03a5598c60 Add simple sanity check for docker 2023-06-28 15:55:42 -07:00
Paul Gauthier
06656998e1 run tests with unittest, shrink output 2023-06-28 15:52:24 -07:00
Paul Gauthier
ea83df1ea3 remove long ==--- lines to save tokens 2023-06-28 15:44:23 -07:00
Paul Gauthier
51638d0989 aider: Added test for cleanup_test_output function in test_benchmark.py.
# Context:
USER: make a test for `cleanup_test_output`
ASSISTANT: To create a test for the `cleanup_test_output` function, we need to create a new test file in the `benchmark/tests` directory. Let's call it `test_benchmark_utils.py`.

In this new file, we will import the `cleanup_test_output` function from `benchmark.py` and write a test function using the `unittest` framework.

Here is the content for the new test file:

```python
benchmark/tests/test_benchmark_utils.py
<<<<<<< ORIGINAL

=======
import unittest
from benchmark.benchmark import cleanup_test_output

class TestCleanupTestOutput(unittest.TestCase):
    def test_cleanup_test_output(self):
        # Test case with timing info
        output = "Ran 5 tests in 0.003s\nOK"
        expected = "\nOK"
        self.assertEqual(cleanup_test_output(output), expected)

        # Test case without timing info
        output = "OK"
        expected = "OK"
        self.assertEqual
```
^C KeyboardInterrupt
USER: no in test_benchmark.py
ASSISTANT: To add a test for the `cleanup_test_output` function in the `test_benchmark.py` file, we need to import the `cleanup_test_output` function from `benchmark.py` and write a test function using the `unittest` framework.

Here is the content for the updated test file:

```python
benchmark/tests/test_benchmark.py
<<<<<<< ORIGINAL

=======
import unittest
from benchmark.benchmark import cleanup_test_output

class TestCleanupTestOutput(unittest.TestCase):
    def test_cleanup_test_output(self):
        # Test case with timing info
        output = "Ran 5 tests in 0.003s\nOK"
        expected = "\nOK"
        self.assertEqual(cleanup_test_output(output), expected)

        # Test case without timing info
        output = "OK"
        expected = "OK"
        self.assertEqual(cleanup_test_output(output), expected)
>>>>>>> UPDATED
```

Please note that you need to add the above code to the existing `test_benchmark.py` file. The `ORIGINAL` section is empty because it's a new addition to the file.
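
The test above only pins down the expected behavior. One way a function could satisfy it is sketched here, assuming that only unittest's `Ran N tests in X.XXXs` line needs removing. The name `strip_timing_info` is hypothetical, and the actual `cleanup_test_output` in `benchmark.py` may differ:

```python
import re

# Matches unittest's summary line, e.g. "Ran 5 tests in 0.003s".
TIMING_LINE = re.compile(r"^Ran \d+ tests? in \d+\.\d+s$", flags=re.MULTILINE)


def strip_timing_info(output):
    """Blank out the timing line so repeated runs produce identical text."""
    return TIMING_LINE.sub("", output)
```

The line is replaced with an empty string rather than deleted along with its newline, which is why the expected value in the test starts with `\n`.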
2023-06-28 15:37:41 -07:00
Paul Gauthier
ac72e4c426 Added new files to the git repo: benchmark/tests/test_benchmark.py 2023-06-28 15:36:47 -07:00
Paul Gauthier
a0844839ca wip 2023-06-28 15:36:34 -07:00
Paul Gauthier
dbdccc42a2 prompt 2023-06-28 15:29:59 -07:00
Paul Gauthier
6c87d5be1c better 2023-06-28 13:52:43 -07:00
Paul Gauthier
2691c80ac2 aider: Change original_lines and updated_lines properties in EditBlockFunctionCoder class to be an array of strings instead of a single string.
# Context:
USER: change the `functions` jsonschema so that `original_lines` is a list of strings and so is `updated_lines`
ASSISTANT: To make the requested changes, we need to modify the `functions` JSON schema in the `EditBlockFunctionCoder` class in the `aider/coders/editblock_func_coder.py` file.

Here is the change:

```python
aider/coders/editblock_func_coder.py
<<<<<<< ORIGINAL
                            properties=dict(
                                path=dict(
                                    type="string",
                                    description="Path of file to edit",
                                ),
                                original_lines=dict(
                                    type="string",
                                    description=(
                                        "Lines from the original file, including all"
                                        " whitespace, newlines, without skipping any lines"
                                    ),
                                ),
                                updated_lines=dict(
                                    type="string",
                                    description="New content to replace the `original_lines` with",
                                ),
                            ),
=======
                            properties=dict(
                                path=dict(
                                    type="string",
                                    description="Path of file to edit",
                                ),
                                original_lines=dict(
                                    type="array",
                                    items=dict(
                                        type="string",
                                    ),
                                    description=(
                                        "Lines from the original file, including all"
                                        " whitespace, newlines, without skipping any lines"
                                    ),
                                ),
                                updated_lines=dict(
                                    type="array",
                                    items=dict(
                                        type="string",
                                    ),
                                    description="New content to replace the `original_lines` with",
                                ),
                            ),
>>>>>>> UPDATED
```

This change modifies the `original_lines` and `updated_lines` properties to be arrays of strings instead of single strings, so each line can be passed as a separate array item.
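
On the consuming side, an array-valued argument has to be turned back into text before it can be compared with file content. A minimal sketch, assuming a hypothetical helper `lines_to_text` (not part of aider) and assuming each array item holds one line. It also accepts a plain string, since the model may not always follow the schema:

```python
def lines_to_text(value):
    """Return content as one string, whether `value` is a str or a list of str."""
    if isinstance(value, str):
        # Already a single string: pass it through unchanged.
        return value
    # Join the items, adding a newline to any line that lacks one.
    return "".join(line if line.endswith("\n") else line + "\n" for line in value)
```

For example, `lines_to_text(["a", "b\n"])` and `lines_to_text("a\nb\n")` both yield the same text.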
2023-06-28 11:03:59 -07:00
Paul Gauthier
67114d1a80 Fail if docker build fails 2023-06-28 11:02:47 -07:00
Paul Gauthier
67c27fd9ef Changed --stats-only -> --stats 2023-06-28 10:56:37 -07:00
Paul Gauthier
777f179c6d better 2023-06-28 10:54:58 -07:00
Paul Gauthier
1ac366aa62 works 2023-06-27 19:38:14 -07:00
Paul Gauthier
f22ccf6195 fixed json schema 2023-06-27 19:07:45 -07:00
Paul Gauthier
2358b638b8 wip 2023-06-27 18:59:15 -07:00
Paul Gauthier
582143d563 wip 2023-06-27 18:34:52 -07:00
Paul Gauthier
e24d03060d rename --retries to --tries 2023-06-27 17:45:57 -07:00
Paul Gauthier
11dddad4c1 zip the hashes 2023-06-27 17:42:52 -07:00
Paul Gauthier
10049d8695 remove timing info, to avoid randomizing the response to GPT 2023-06-27 17:15:05 -07:00
Paul Gauthier
9cfebc0b60 record response hashes too 2023-06-27 16:11:05 -07:00
Paul Gauthier
974d96973c move the empty list into init 2023-06-27 15:00:49 -07:00
Paul Gauthier
64e16c8ad1 aider: Added SHA1 hash of kwargs to chat_completion_call_hashes in send_with_retries method.
# Context:
USER: when calling `ChatCompletion`, take a sha1 hash of dict with all the args and append it to a list self.chat_completion_call_hashes
ASSISTANT: To implement this change, we need to modify the `send_with_retries` method in the `Coder` class in the `aider/coders/base_coder.py` file. We will import the `hashlib` module to generate the SHA1 hash. We will also add a new attribute `chat_completion_call_hashes` to the `Coder` class to store the hashes.

Here are the changes:

```python
aider/coders/base_coder.py
<<<<<<< ORIGINAL
import json
import os
import sys
import traceback
from json.decoder import JSONDecodeError
from pathlib import Path
=======
import hashlib
import json
import os
import sys
import traceback
from json.decoder import JSONDecodeError
from pathlib import Path
>>>>>>> UPDATED
```

```python
aider/coders/base_coder.py
<<<<<<< ORIGINAL
class Coder:
    abs_fnames = None
    repo = None
    last_aider_commit_hash = None
    last_asked_for_commit_time = 0
    repo_map = None
    functions = None
    total_cost = 0.0
=======
class Coder:
    abs_fnames = None
    repo = None
    last_aider_commit_hash = None
    last_asked_for_commit_time = 0
    repo_map = None
    functions = None
    total_cost = 0.0
    chat_completion_call_hashes = []
>>>>>>> UPDATED
```

```python
aider/coders/base_coder.py
<<<<<<< ORIGINAL
    def send_with_retries(self, model, messages, functions):
        kwargs = dict(
            model=model,
            messages=messages,
            temperature=0,
            stream=self.stream,
        )
        if functions is not None:
            kwargs["functions"] = self.functions

        res = openai.ChatCompletion.create(**kwargs)
        return res
=======
    def send_with_retries(self, model, messages, functions):
        kwargs = dict(
            model=model,
            messages=messages,
            temperature=0,
            stream=self.stream,
        )
        if functions is not None:
            kwargs["functions"] = self.functions

        # Generate SHA1 hash of kwargs and append it to chat_completion_call_hashes
        hash_object = hashlib.sha1(json.dumps(kwargs, sort_keys=True).encode())
        self.chat_completion_call_hashes.append(hash_object.hexdigest())

        res = openai.ChatCompletion.create(**kwargs)
        return res
>>>>>>> UPDATED
```
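
The hashing step can be tried in isolation. The sketch below uses hypothetical names (`hash_call_args`, `CallRecorder`) rather than aider's own classes. It shows that `sort_keys=True` makes the digest independent of key order, and it creates the list in `__init__` so that instances do not share one class-level list:

```python
import hashlib
import json


def hash_call_args(kwargs):
    """Return a SHA1 hex digest of `kwargs` that is stable across key order."""
    payload = json.dumps(kwargs, sort_keys=True).encode()
    return hashlib.sha1(payload).hexdigest()


class CallRecorder:
    """Hypothetical stand-in for the coder; keeps one hash list per instance."""

    def __init__(self):
        # Created here so instances do not share a single class-level list.
        self.chat_completion_call_hashes = []

    def record(self, **kwargs):
        """Append the hash of one call's arguments."""
        self.chat_completion_call_hashes.append(hash_call_args(kwargs))
```

Note that `json.dumps` raises `TypeError` for values that are not JSON-serializable, so this approach assumes the call arguments are plain dicts, lists, strings, and numbers.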
2023-06-27 14:59:58 -07:00
Paul Gauthier
8999c3fd97 move the progress bar to the bottom 2023-06-27 14:58:15 -07:00
Paul Gauthier
9556313819 Added functionality to use pre-existing benchmark results if they exist. 2023-06-27 14:27:07 -07:00