
224 commits

Author SHA1 Message Date
Paul Gauthier
4f81f88c91 Added --no-aider, better error if docker is not available 2023-06-29 07:22:53 -07:00
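The entry above mentions a clearer error when Docker is unavailable. The sketch below shows one way to produce such an error. `check_docker` is a hypothetical helper, and using `docker version` as the probe is an assumption: that command typically exits non-zero when the CLI is installed but the daemon is unreachable.

```python
import shutil
import subprocess


def check_docker(executable: str = "docker") -> tuple[bool, str]:
    """Hypothetical helper: return (ok, message) describing Docker availability."""
    # Fail fast with a readable message if the CLI is not on PATH at all.
    path = shutil.which(executable)
    if path is None:
        return False, f"'{executable}' was not found on PATH; is Docker installed?"

    # `docker version` also talks to the daemon, so it fails when the CLI
    # exists but the daemon is not running (assumption noted in the lead-in).
    try:
        result = subprocess.run(
            [path, "version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as err:
        return False, f"Could not run '{executable} version': {err}"

    if result.returncode != 0:
        return False, f"Docker is installed but not usable:\n{result.stdout}"
    return True, "Docker is available."
```

A caller could print the message and exit when `ok` is false, before doing any benchmark work.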
Paul Gauthier
3c63552628 Revert "aider: Refactor run_unit_tests to kill the subprocess when it times out."
This reverts commit 6632367978.
2023-06-29 07:08:53 -07:00
Paul Gauthier
6632367978 aider: Refactor run_unit_tests to kill the subprocess when it times out.
# Context:
USER: when `subprocess` `timeout` fires, do we need to kill the subprocess? just in `run_unit_tests`
ASSISTANT: When a subprocess times out, it is not automatically terminated. So, yes, if you want to ensure that the subprocess is terminated when it times out, you should explicitly kill it.

Here's how you can modify the `run_unit_tests` function to kill the subprocess when it times out:

```python
benchmark/benchmark.py
<<<<<<< ORIGINAL
        try:
            result = subprocess.run(
                command,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                timeout=timeout,
            )
            if result.returncode != 0:
                all_tests_passed = False
                print(f"Test {test_file} failed")

            res = cleanup_test_output(result.stdout)

        except subprocess.TimeoutExpired:
            all_tests_passed = False
            res = f"Test {test_file} timed out after {timeout} seconds."
=======
        with subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True) as proc:
            try:
                stdout, _ = proc.communicate(timeout=timeout)
                if proc.returncode != 0:
                    all_tests_passed = False
                    print(f"Test {test_file} failed")
                res = cleanup_test_output(stdout)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
                all_tests_passed = False
                res = f"Test {test_file} timed out after {timeout} seconds."
>>>>>>> UPDATED
```

This change uses `subprocess.Popen` to create the subprocess and `communicate` to interact with it. If `communicate` times out, it raises `subprocess.TimeoutExpired`, and then we kill the subprocess with `proc.kill()`.
2023-06-28 18:29:58 -07:00
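This commit was reverted shortly afterwards (3c63552628 above). According to the Python documentation, `subprocess.run()` passes `timeout` to `Popen.communicate()`. If the timeout expires, it kills the child, waits for it, and then re-raises `TimeoutExpired`. The original `subprocess.run` version therefore already cleans up the direct child. The minimal sketch below illustrates that pattern; `run_with_timeout` is a hypothetical helper name.

```python
import subprocess
import sys


def run_with_timeout(command: list[str], timeout: float) -> tuple[bool, str]:
    """Hypothetical helper: run `command` and return (passed, output)."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here.
        return False, f"timed out after {timeout} seconds"
    return result.returncode == 0, result.stdout


# Example: a child that sleeps longer than the timeout allows.
SLOW = [sys.executable, "-c", "import time; time.sleep(5)"]
```

Calling `run_with_timeout(SLOW, timeout=0.5)` returns after about half a second with `passed` set to false. Only the direct child is killed. A process it started, such as a container launched by `docker run`, may keep running and would need separate cleanup.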
Paul Gauthier
d1f0590911 move benchmark prompts into their own file 2023-06-28 18:25:39 -07:00
Paul Gauthier
0845de7a7d ssh 2023-06-28 16:01:52 -07:00
Paul Gauthier
03a5598c60 Add simple sanity check for docker 2023-06-28 15:55:42 -07:00
Paul Gauthier
06656998e1 run tests with unittest, shrink output 2023-06-28 15:52:24 -07:00
Paul Gauthier
ea83df1ea3 remove long ==--- lines to save tokens 2023-06-28 15:44:23 -07:00
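The two entries above shrink `unittest` output so it costs fewer tokens when sent back to GPT. The body of the repository's `cleanup_test_output` is not shown in this log. The hypothetical `strip_separator_lines` below only illustrates one way to drop the `=====` and `-----` separator lines that `unittest` prints. It assumes that a line made entirely of four or more `=` or `-` characters is a separator.

```python
import re

# A line made entirely of '=' or '-' characters, at least four long.
SEPARATOR_RE = re.compile(r"^[=-]{4,}$")


def strip_separator_lines(output: str) -> str:
    """Hypothetical sketch: remove unittest separator lines, keep everything else."""
    kept = [line for line in output.splitlines() if not SEPARATOR_RE.match(line.strip())]
    return "\n".join(kept)
```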
Paul Gauthier
a0844839ca wip 2023-06-28 15:36:34 -07:00
Paul Gauthier
67114d1a80 Fail if docker build fails 2023-06-28 11:02:47 -07:00
Paul Gauthier
67c27fd9ef Changed --stats-only -> --stats 2023-06-28 10:56:37 -07:00
Paul Gauthier
e24d03060d rename --retries to --tries 2023-06-27 17:45:57 -07:00
Paul Gauthier
11dddad4c1 zip the hashes 2023-06-27 17:42:52 -07:00
Paul Gauthier
10049d8695 remove timing info, to avoid randomizing the response to GPT 2023-06-27 17:15:05 -07:00
Paul Gauthier
9cfebc0b60 record response hashes too 2023-06-27 16:11:05 -07:00
Paul Gauthier
9556313819 Added functionality to use pre-existing benchmark results if they exist. 2023-06-27 14:27:07 -07:00
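The sketch below shows one way to reuse earlier results. It assumes each result is stored as a JSON file and that the benchmark is re-run only when that file is missing or unreadable. `load_or_run`, the JSON format, and the file layout are all assumptions, not the benchmark's actual storage.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_run(results_file: Path, run_benchmark: Callable[[], dict]) -> dict:
    """Hypothetical: return cached results if present, else run and cache them."""
    if results_file.exists():
        try:
            return json.loads(results_file.read_text())
        except json.JSONDecodeError:
            pass  # corrupt or partial file: fall through and re-run
    results = run_benchmark()
    results_file.write_text(json.dumps(results, indent=4))
    return results
```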
Paul Gauthier
70050ca004 improved logic for creating benchmark dirs 2023-06-27 08:02:37 -07:00
Paul Gauthier
29d60b9ddb check for dirname on --clean 2023-06-27 07:52:48 -07:00
Paul Gauthier
7736289309 test error prompt 2023-06-27 07:49:47 -07:00
Paul Gauthier
39434289f5 updated docker image name 2023-06-26 20:52:02 -07:00
Paul Gauthier
a109ab1890 aider: Added check for existing Docker image before building.
# Context:
USER: in `build_docker` check if the image is already built and return
ASSISTANT: To check if the Docker image is already built and return if it exists, we can use the `docker images` command and parse its output. Here's how to modify the `build_docker` function:

```python
benchmark/benchmark.py
<<<<<<< ORIGINAL
def build_docker():
    command = [
        "docker",
        "build",
        "--quiet",
        "-t",
        "benchmark",
        "-f",
        "benchmark/Dockerfile",
        "/dev/null",
    ]
    print(" ".join(command))

    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        res = result.stdout
        print(res)

    except subprocess.CalledProcessError as e:
        res = f"Failed to build Docker image: {e.output}"

    return res
=======
def build_docker():
    image_name = "benchmark"
    check_command = ["docker", "images", "-q", image_name]
    check_result = subprocess.run(check_command, stdout=subprocess.PIPE, text=True)
    if check_result.stdout.strip():
        print(f"Docker image '{image_name}' already exists, skipping build.")
        return

    command = [
        "docker",
        "build",
        "--quiet",
        "-t",
        image_name,
        "-f",
        "benchmark/Dockerfile",
        "/dev/null",
    ]
    print(" ".join(command))

    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        res = result.stdout
        print(res)

    except subprocess.CalledProcessError as e:
        res = f"Failed to build Docker image: {e.output}"

    return res
>>>>>>> UPDATED
```

This modification adds a check for the existence of the Docker image before attempting to build it. If the image already exists, the function prints a message and returns early.
2023-06-26 20:48:55 -07:00
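In both versions of the diff above, `subprocess.run` is called without `check=True`. As a result, it never raises `CalledProcessError`, and the `except` branch cannot run. The early return also yields `None`, while the build path returns a string. The hypothetical `build_benchmark_image` below is a sketch that passes `check=True` and always returns a status string. The `run` parameter exists only so the logic can be exercised without Docker. The image name, Dockerfile path, and `/dev/null` context are copied from the diff.

```python
import subprocess


def build_benchmark_image(image_name="benchmark", dockerfile="benchmark/Dockerfile",
                          run=subprocess.run):
    """Hypothetical sketch: build `image_name` unless it exists; return a status string."""
    # `docker images -q NAME` prints matching image IDs, or nothing if absent.
    check = run(["docker", "images", "-q", image_name], stdout=subprocess.PIPE, text=True)
    if check.stdout.strip():
        return f"Docker image '{image_name}' already exists, skipping build."

    command = ["docker", "build", "--quiet", "-t", image_name, "-f", dockerfile, "/dev/null"]
    try:
        # check=True turns a non-zero exit status into CalledProcessError.
        result = run(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                     text=True, check=True)
    except subprocess.CalledProcessError as e:
        return f"Failed to build Docker image: {e.output}"
    return result.stdout
```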
Paul Gauthier
5e2681ec9d call build_docker 2023-06-26 20:47:49 -07:00
Paul Gauthier
898d98fdb3 aider: Added a build_docker() function that builds a docker image called benchmark from the benchmark/Dockerfile with no other fs context.
# Context:
USER: add a build_docker() func that builds a docker image called `benchmark` from the benchmark/Dockerfile with no other fs context
2023-06-26 20:41:38 -07:00
Paul Gauthier
095f47593d run unit tests inside docker for safety 2023-06-26 20:37:55 -07:00
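The sketch below shows one way to build a command that runs the tests in a throwaway container. `docker_test_command`, the `/app` mount point, the `--network none` hardening, and the assumption that the image provides `python` are all illustrative choices, not taken from the commit.

```python
import os


def docker_test_command(test_dir: str, image: str = "benchmark") -> list[str]:
    """Hypothetical: a `docker run` command that runs unittest inside `test_dir`."""
    test_dir = os.path.abspath(test_dir)
    return [
        "docker", "run",
        "--rm",                     # delete the container when it exits
        "--network", "none",        # illustrative hardening: no network access
        "-v", f"{test_dir}:/app",   # mount only the exercise directory
        "-w", "/app",               # run from the mounted directory
        image,
        "python", "-m", "unittest", "discover",
    ]
```

The returned list can be passed to `subprocess.run` with a `timeout`, in the same way as `command` in `run_unit_tests`.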
Renamed from scripts/benchmark.py