Commit graph

3185 commits

Author SHA1 Message Date
Paul Gauthier
ec95cd2d76 allow longer conversation histories for models with larger context windows 2023-11-27 10:55:36 -08:00
Paul Gauthier
4192e78bcd Merge remote-tracking branch 'origin/main' 2023-11-27 10:47:15 -08:00
paul-gauthier
390c2dcea0
Merge pull request #371 from zestysoft/fix-benchmark-docs 2023-11-25 13:19:30 -08:00
Ian Brown
049eb9966d
Use expected exercism-python destination
Signed-off-by: Ian Brown <ian@zestysoft.com>
2023-11-25 12:11:12 -08:00
Paul Gauthier
e771adbff5 Refactor model_ids to be sorted in check_model_availability function. 2023-11-21 14:08:17 -08:00
Paul Gauthier
3d65217bed set version to 0.18.1-dev 2023-11-17 15:51:31 -08:00
Paul Gauthier
93f45a6d6b version bump to 0.18.0 2023-11-17 15:51:07 -08:00
Paul Gauthier
ffaa1a03d6 aider: Add logic to compare commit dates of main and origin/main and print a message explaining which one is behind.
# Aider chat conversation:

USER: if main != origin/main, explain which one is behind

ASSISTANT: To implement this feature, we need to modify `scripts/versionbump.py`. We will add additional logic to the `check_main_branch_up_to_date` function to compare the commit dates of `main` and `origin/main` and print a message explaining which one is behind.

Here is the *SEARCH/REPLACE* block for the changes:

scripts/versionbump.py
```python
<<<<<<< SEARCH
        if local_main != origin_main:
            print("Error: The main branch is not up to date with origin/main. Please pull the latest changes.")
            sys.exit(1)
=======
        if local_main != origin_main:
            local_date = subprocess.run(["git", "show", "-s", "--format=%ci", "main"], capture_output=True, text=True).stdout.strip()
            origin_date = subprocess.run(["git", "show", "-s", "--format=%ci", "origin/main"], capture_output=True, text=True).stdout.strip()
            local_date = datetime.datetime.strptime(local_date, "%Y-%m-%d %H:%M:%S %z")
            origin_date = datetime.datetime.strptime(origin_date, "%Y-%m-%d %H:%M:%S %z")
            if local_date < origin_date:
                print("Error: The local main branch is behind origin/main. Please pull the latest changes.")
            elif local_date > origin_date:
                print("Error: The origin/main branch is behind the local main branch. Please push your changes.")
            else:
                print("Error: The main branch and origin/main have diverged.")
            sys.exit(1)
>>>>>>> REPLACE
```

Please note that we need to import the `datetime` module at the beginning of the file to use `datetime.datetime.strptime`. Here is the *SEARCH/REPLACE* block for adding the import:

scripts/versionbump.py
```python
<<<<<<< SEARCH
import argparse
import re
import subprocess
import sys
=======
import argparse
import datetime
import re
import subprocess
import sys
>>>>>>> REPLACE
```

These changes will compare the commit dates and print a message explaining which branch is behind if they are not in sync.
2023-11-17 15:50:37 -08:00
Paul Gauthier
21b80d0c06 aider: Added logging to print out the commit hashes for local_main and origin_main before the comparison to diagnose the issue with the script reporting that the main branch is not up to date with origin/main even though it is.
# Aider chat conversation:

USER: it's printing Error: The main branch is not up to date with origin/main. even though main is up to date with origin/main

ASSISTANT: To address the issue you're experiencing with the script reporting that the main branch is not up to date with `origin/main` even though it is, we should add additional logging to the script to help diagnose the problem. We can print out the commit hashes for `local_main` and `origin_main` before the comparison to see what the script is actually getting from the `git` commands.

Let's modify the `scripts/versionbump.py` file to add this logging.

Here is the *SEARCH/REPLACE* block:

scripts/versionbump.py
```python
<<<<<<< SEARCH
        local_main = subprocess.run(["git", "rev-parse", "main"], capture_output=True, text=True).stdout.strip()
        origin_main = subprocess.run(["git", "rev-parse", "origin/main"], capture_output=True, text=True).stdout.strip()
        if local_main != origin_main:
            print("Error: The main branch is not up to date with origin/main.")
=======
        local_main = subprocess.run(["git", "rev-parse", "main"], capture_output=True, text=True).stdout.strip()
        print(f"Local main commit hash: {local_main}")
        origin_main = subprocess.run(["git", "rev-parse", "origin/main"], capture_output=True, text=True).stdout.strip()
        print(f"Origin main commit hash: {origin_main}")
        if local_main != origin_main:
            print("Error: The main branch is not up to date with origin/main. Please pull the latest changes.")
>>>>>>> REPLACE
```

This change will print out the commit hashes for both `local_main` and `origin_main` before the script checks if they are the same. This should help you diagnose why the script thinks the branches are not in sync. If the hashes are indeed the same, but the script still reports an error, it could be an issue with the script's logic or the environment in which it's being run.
2023-11-17 15:48:59 -08:00
Paul Gauthier
3bd716e46d Updated HISTORY 2023-11-17 15:47:42 -08:00
Paul Gauthier
52fcdc4b09 aider: Add Git checks before version bumping.
# Aider chat conversation:

USER: before changing the repo, check that we are on the main branch, check that the working dir is not dirty, fetch the latest changes from origin and confirm that the main branch is up to date (don't merge it though). if any of these fail, exit with an error.

ASSISTANT: To implement these checks in the `scripts/versionbump.py` file, we will need to add the necessary Git operations before proceeding with the version bumping logic. Here are the steps we will follow:

1. Import the `sys` module to exit the script with an error if needed.
2. Add a function to check the current Git branch and ensure it is 'main'.
3. Add a function to check if the working directory is clean.
4. Add a function to fetch the latest changes from the origin and check if the 'main' branch is up to date.
5. Call these functions at the beginning of the `main()` function.

Here are the *SEARCH/REPLACE* blocks for the changes:

scripts/versionbump.py
```python
<<<<<<< SEARCH
import argparse
import re
import subprocess
=======
import argparse
import re
import subprocess
import sys
>>>>>>> REPLACE
```

scripts/versionbump.py
```python
<<<<<<< SEARCH
    args = parser.parse_args()
    dry_run = args.dry_run
=======
    # Function to check if we are on the main branch
    def check_branch():
        branch = subprocess.run(["git", "rev-parse", "--abbrev-ref", "HEAD"], capture_output=True, text=True).stdout.strip()
        if branch != "main":
            print("Error: Not on the main branch.")
            sys.exit(1)

    # Function to check if the working directory is clean
    def check_working_directory_clean():
        status = subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True).stdout
        if status:
            print("Error: Working directory is not clean.")
            sys.exit(1)

    # Function to fetch the latest changes and check if the main branch is up to date
    def check_main_branch_up_to_date():
        subprocess.run(["git", "fetch", "origin"], check=True)
        local_main = subprocess.run(["git", "rev-parse", "main"], capture_output=True, text=True).stdout.strip()
        origin_main = subprocess.run(["git", "rev-parse", "origin/main"], capture_output=True, text=True).stdout.strip()
        if local_main != origin_main:
            print("Error: The main branch is not up to date with origin/main.")
            sys.exit(1)

    args = parser.parse_args()
    dry_run = args.dry_run

    # Perform checks before proceeding
    check_branch()
    check_working_directory_clean()
    check_main_branch_up_to_date()
>>>>>>> REPLACE
```

These changes will ensure that the script checks for the current branch, the cleanliness of the working directory, and whether the 'main' branch is up to date with the origin before proceeding with the version bumping process. If any of these checks fail, the script will exit with an error message.
2023-11-17 15:41:43 -08:00
paul-gauthier
92f03b5f7d
Update README.md 2023-11-17 15:01:27 -08:00
Paul Gauthier
bf6cd8294e handle submodules (or other dirs) which are git tracked #336 2023-11-15 13:34:29 -08:00
Paul Gauthier
5d0e92296a number -> numbered 2023-11-15 13:06:58 -08:00
Paul Gauthier
d54f883698 do not include added files in the repo map #315 2023-11-15 13:06:22 -08:00
Paul Gauthier
95fe1be4c5 /add and /drop should work on untracked files in a repo (without committing them) 2023-11-15 13:06:22 -08:00
Paul Gauthier
f6aa09ca85 copy 2023-11-14 16:05:43 -08:00
Paul Gauthier
1d0bc3dcb6 Updated benchmark reports 2023-11-14 16:03:50 -08:00
Paul Gauthier
a15ac7ebb6 copy 2023-11-14 13:22:42 -08:00
Paul Gauthier
5dd7287ee3 fixed test for windows 2023-11-14 13:19:57 -08:00
Paul Gauthier
0f9053008a Improved prompting for both GPT-4 and GPT-4 Turbo:
- Uses 2-shot examples, which highlight the need to:
  - Use multiple SEARCH/REPLACE blocks to make changes spread across multiple locations in a file (rather one "big" S/R with ... elides ... of unchanging chunks of code).
  - Include comments, docstrings, etc in the S/R blocks.
- Updated list of explicit "rules" for constructing S/R blocks.

For `gpt-4-1106-preview` the new prompts reduced `num_errors` from 25-30 -> 12-14, indicating improved ability to generate proper SEARCH/REPLACE blocks. Benchmark results of 54%/65% are stable or perhaps slightly better than the previous prompts.

For `gpt-4-0613` the new prompts produce a major improvement in benchmark results. Performance leaps from 47%/64% up to 51%/71%. This now makes `aider + gpt-4-0613` better at coding than `aider + gpt-4-1106-preview` after the 2nd try. And competitive after the 1st try.

benchmarks/2023-11-14-19-34-51--preview-prompting-positivity:

test-cases: 133
model: gpt-4-1106-preview
edit_format: diff
commit_hash: 9cbe114
num_error_outputs: 14
num_user_asks: 0
num_exhausted_context_windows 0
test_timeouts: 2

53.4% correct after try 0
64.7% correct after try 1

duration: 23.6 sec/test-case
costs: $0.0495/test-case, $6.58 total, $6.58 projected

benchmarks/2023-11-14-19-55-31--preview-prompt-against-0613

test-cases: 133
model: gpt-4-0613
edit_format: diff
commit_hash: 9cbe114
num_error_outputs: 10
num_user_asks: 0
num_exhausted_context_windows 0
test_timeouts: 1

51.1% correct after try 0
70.7% correct after try 1

duration: 52.5 sec/test-case
costs: $0.1229/test-case, $16.34 total, $16.34 projected
2023-11-14 13:11:31 -08:00
Paul Gauthier
4aba307ebd defend against no tokenizer 2023-11-14 13:09:50 -08:00
Paul Gauthier
6c77c32d06 aider: Refactored test_check_gitignore to use a temporary git config file and set the GIT_CONFIG_GLOBAL environment variable during the test execution.
# Aider chat conversation:

USER: in `test_check_gitignore` can we have gitpython not use the global gitignore?

ASSISTANT: To ensure that GitPython does not use the global gitignore, we can set the `GIT_CONFIG_GLOBAL` environment variable to point to a non-existent file or an empty temporary file during the test execution. This will effectively ignore the global gitignore settings.

We will modify the `test_check_gitignore` method in the `tests/test_main.py` file to set the `GIT_CONFIG_GLOBAL` environment variable before the test runs and reset it after the test is completed.

Here are the *SEARCH/REPLACE* blocks:

tests/test_main.py
```python
<<<<<<< SEARCH
    def test_check_gitignore(self):
        make_repo()
        io = InputOutput(pretty=False, yes=True)
        cwd = Path.cwd()
        gitignore = cwd / ".gitignore"

        self.assertFalse(gitignore.exists())
        check_gitignore(cwd, io)
        self.assertTrue(gitignore.exists())

        self.assertEqual(".aider*", gitignore.read_text().splitlines()[0])

        gitignore.write_text("one\ntwo\n")
        check_gitignore(cwd, io)
        self.assertEqual("one\ntwo\n.aider*\n", gitignore.read_text())
=======
    def test_check_gitignore(self):
        with tempfile.NamedTemporaryFile() as temp_gitconfig:
            os.environ['GIT_CONFIG_GLOBAL'] = temp_gitconfig.name
            make_repo()
            io = InputOutput(pretty=False, yes=True)
            cwd = Path.cwd()
            gitignore = cwd / ".gitignore"

            self.assertFalse(gitignore.exists())
            check_gitignore(cwd, io)
            self.assertTrue(gitignore.exists())

            self.assertEqual(".aider*", gitignore.read_text().splitlines()[0])

            gitignore.write_text("one\ntwo\n")
            check_gitignore(cwd, io)
            self.assertEqual("one\ntwo\n.aider*\n", gitignore.read_text())
            del os.environ['GIT_CONFIG_GLOBAL']
>>>>>>> REPLACE
```

This change will create a temporary file to act as a dummy global git configuration file and set the `GIT_CONFIG_GLOBAL` environment variable to its path. After the test, it will delete the environment variable to avoid affecting other tests or operations.
2023-11-14 13:05:47 -08:00
Paul Gauthier
d2f924d6d1 fixed /tokens 2023-11-14 13:03:25 -08:00
Paul Gauthier
9cbe114a74 positivity 2023-11-14 11:34:36 -08:00
Paul Gauthier
253354da98 stronger elide prompt 2023-11-14 11:16:39 -08:00
Paul Gauthier
60ca8f8c0d properly render the main sys prompt 2023-11-14 11:07:35 -08:00
Paul Gauthier
bbc174a8fb just use the Rules as the reminder 2023-11-14 11:01:29 -08:00
Paul Gauthier
e0fea58276 Put back 2nd reminder 2023-11-14 10:46:16 -08:00
Paul Gauthier
26a340e21f move the single format reminder into the top sys prompt 2023-11-14 10:38:39 -08:00
Paul Gauthier
df236e4dec stronger prompt to include comments, docstrings 2023-11-14 10:30:02 -08:00
Paul Gauthier
3986da7210 just use a single S/R format system prompt 2023-11-14 10:25:08 -08:00
Paul Gauthier
c24a4a4392 refactor into Model.token_count() 2023-11-14 10:11:13 -08:00
Paul Gauthier
90f57664a6 aider: Refactor cmd_tokens to use main_model.token_count() instead of calling tokenizer directly.
# Aider chat conversation:

USER: in `cmd_tokens` make use of main_model.token_count() rather than calling tokenizer directly

ASSISTANT: To make this change, we need to modify `aider/commands.py`. Specifically, we will update the `cmd_tokens` method to use `self.coder.main_model.token_count()` instead of calling `self.tokenizer.encode()` directly.

Here is the *SEARCH/REPLACE* block for the change:

aider/commands.py
<source>python
<<<<<<< SEARCH
        tokens = len(self.tokenizer.encode(json.dumps(msgs)))
=======
        tokens = self.coder.main_model.token_count(json.dumps(msgs))
>>>>>>> REPLACE
</source>

aider/commands.py
<source>python
<<<<<<< SEARCH
            tokens = len(self.tokenizer.encode(msgs))
=======
            tokens = self.coder.main_model.token_count(msgs)
>>>>>>> REPLACE
</source>

aider/commands.py
<source>python
<<<<<<< SEARCH
                tokens = len(self.tokenizer.encode(repo_content))
=======
                tokens = self.coder.main_model.token_count(repo_content)
>>>>>>> REPLACE
</source>

aider/commands.py
<source>python
<<<<<<< SEARCH
            tokens = len(self.tokenizer.encode(content))
=======
            tokens = self.coder.main_model.token_count(content)
>>>>>>> REPLACE
</source>

These changes will ensure that the `cmd_tokens` method uses the `token_count` method from the `main_model` to calculate the number of tokens, which is a more abstracted and potentially model-specific way of counting tokens.
2023-11-14 10:04:00 -08:00
Paul Gauthier
c63885fe8c More rules, instructions 2023-11-13 15:28:01 -08:00
Paul Gauthier
8a3d4bf14c added second example showing new file 2023-11-13 15:07:15 -08:00
Paul Gauthier
50f457d753 prompt with clear example using multiple S/R blocks on the same file 2023-11-13 14:59:19 -08:00
Paul Gauthier
f00c7922d5 Merge remote-tracking branch 'origin/main' 2023-11-09 08:13:36 -08:00
Paul Gauthier
a9219cb9de copy 2023-11-09 08:11:06 -08:00
Paul Gauthier
8f5669b9ca copy 2023-11-09 07:43:19 -08:00
Paul Gauthier
8f8a32f325 aider: Fix NameError in show_stats function by passing graphs as a parameter.
# Aider chat conversation:

USER: I ran this command:

 ./benchmark/benchmark.py tmp.benchmarks/2023-11-07-02-37-08--gpt-3.5-turbo-1106 --stats

And got this output:

─────────────────────────────── tmp.benchmarks/2023-11-07-02-37-08--gpt-3.5-turbo-1106 ───────────────────────────────
test-cases: 133
model: gpt-3.5-turbo-1106
edit_format: whole
commit_hash: f658a65-dirty
num_error_outputs: 2
num_user_asks: 2
num_exhausted_context_windows 0
test_timeouts: 2

41.4% correct after try 0
55.6% correct after try 1

duration: 3.1 sec/test-case
costs: $0.0037/test-case, $0.49 total, $0.49 projected
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Traceback (most recent call last):
  File "/Users/gauthier/Projects/aider/./benchmark/benchmark.py", line 772, in <module>
    app()
  File "/Users/gauthier/Projects/aider/.venv/lib/python3.11/site-packages/typer/main.py", line 328, in __call__
    raise e
  File "/Users/gauthier/Projects/aider/.venv/lib/python3.11/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gauthier/Projects/aider/.venv/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gauthier/Projects/aider/.venv/lib/python3.11/site-packages/typer/core.py", line 716, in main
    return _main(
           ^^^^^^
  File "/Users/gauthier/Projects/aider/.venv/lib/python3.11/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/gauthier/Projects/aider/.venv/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gauthier/Projects/aider/.venv/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gauthier/Projects/aider/.venv/lib/python3.11/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gauthier/Projects/aider/./benchmark/benchmark.py", line 353, in main
    return show_stats(updated_dirnames)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gauthier/Projects/aider/./benchmark/benchmark.py", line 124, in show_stats
    if graphs:
       ^^^^^^
NameError: name 'graphs' is not defined

ASSISTANT: It appears that the `graphs` variable is being used in the `show_stats` function without being defined or passed as an argument to the function. To fix this issue, we need to modify the `show_stats` function to accept `graphs` as a parameter and then pass the `graphs` variable when the function is called from the `main` function.

Here are the steps to fix the issue:

1. Modify the `show_stats` function definition to accept a new parameter `graphs`.
2. Update the call to `show_stats` in the `main` function to pass the `graphs` variable.

Let's make these changes in the `benchmark/benchmark.py` file.

<source>python
benchmark/benchmark.py
<<<<<<< SEARCH
def show_stats(dirnames):
=======
def show_stats(dirnames, graphs):
>>>>>>> REPLACE
</source>

<source>python
benchmark/benchmark.py
<<<<<<< SEARCH
return show_stats(updated_dirnames)
=======
return show_stats(updated_dirnames, graphs)
>>>>>>> REPLACE
</source>

Please apply these changes to the `benchmark/benchmark.py` file to resolve the `NameError`.
2023-11-08 14:15:08 -08:00
Paul Gauthier
9351f438ac aider: Add a cmd line switch --graphs and only draw them if it's provided.
# Aider chat conversation:

USER: add a cmd line switch --graphs and only draw them if its provided

ASSISTANT: To add a command-line switch `--graphs` and conditionally generate graphs based on its presence, we need to modify the `benchmark.py` file. Specifically, we will add a new option to the `main` function to accept the `--graphs` switch and then use this option to control the calls to `plot_timing` and `plot_outcomes` functions.

Here are the steps we will follow:

1. Add the `--graphs` option to the `typer` command-line interface in the `main` function.
2. Use the value of this option to conditionally call the `plot_timing` and `plot_outcomes` functions.

Let's make these changes:

<source>python
benchmark/benchmark.py
<<<<<<< SEARCH
@app.command()
def main(
    dirnames: List[str] = typer.Argument(..., help="Directory names"),
=======
@app.command()
def main(
    dirnames: List[str] = typer.Argument(..., help="Directory names"),
    graphs: bool = typer.Option(False, "--graphs", help="Generate graphs"),
>>>>>>> REPLACE

...

<<<<<<< SEARCH
    plot_timing(df)
    plot_outcomes(df, repeats, repeat_hi, repeat_lo, repeat_avg)
=======
    if graphs:
        plot_timing(df)
        plot_outcomes(df, repeats, repeat_hi, repeat_lo, repeat_avg)
>>>>>>> REPLACE
</source>

With these changes, the `--graphs` switch will be available when running the `benchmark.py` script, and the graph generation will only occur if this switch is provided.
2023-11-08 14:13:59 -08:00
paul-gauthier
a4e224e190
Update benchmarks-speed-1106.md 2023-11-08 12:30:56 -08:00
paul-gauthier
d455ab7b21
Update benchmarks-speed-1106.md 2023-11-08 12:26:12 -08:00
Paul Gauthier
8f06658436 copy 2023-11-08 12:18:39 -08:00
Paul Gauthier
e394ac7c71 copy 2023-11-08 11:51:07 -08:00
Paul Gauthier
c65c3722d3 copy 2023-11-08 11:44:13 -08:00
Paul Gauthier
7c92370a3b copy 2023-11-08 11:34:34 -08:00
Paul Gauthier
200414cee0 copy 2023-11-08 11:19:36 -08:00
Paul Gauthier
ed0dc95a10 copy 2023-11-08 11:16:56 -08:00