Commit graph

224 commits

Author SHA1 Message Date
Paul Gauthier (aider)
351b8e50f0 feat: add --num-ctx flag to override model context window size 2024-11-25 19:20:43 -08:00
fry69
667a58052e feat: change edit format from "senior" to "architect" 2024-09-27 09:03:42 +02:00
fry69
e3e0d57512 chore: update parameter names in args and benchmark 2024-09-27 08:57:22 +02:00
Paul Gauthier
eb21cf2830 architect/editor 2024-09-26 16:10:19 -07:00
Paul Gauthier (aider)
5a78e7d1b8 chore: Run the linter 2024-09-26 11:35:13 -07:00
Paul Gauthier (aider)
1c05192b69 fix: Only record junior_model and junior_edit_format in the results array if edit_format is "senior" 2024-09-26 11:35:09 -07:00
Paul Gauthier
e682eb8669 fix: Add junior model and junior edit format to benchmark results 2024-09-25 16:31:40 -07:00
Paul Gauthier (aider)
ed7503dbbe feat: optimize find_latest_benchmark_dir to check only .md files and limit to one file per subtree 2024-09-25 12:20:45 -07:00
Paul Gauthier (aider)
e21cdafb15 style: run linter and fix code formatting issues 2024-09-25 12:18:43 -07:00
Paul Gauthier (aider)
8d90df1ebc feat: implement automatic selection of the most recently updated benchmark directory when using --stats without dirnames 2024-09-25 12:18:39 -07:00
Paul Gauthier (aider)
24c959af2d feat: Add --junior-model and --junior-edit-format flags to the benchmark 2024-09-25 11:44:34 -07:00
Paul Gauthier
15cc709322 feat: Improve senior coder's edit format handling 2024-09-25 11:42:09 -07:00
Paul Gauthier
65e57df7ea feat: Implement changes to handle files content in Coder and prompts 2024-09-25 09:54:16 -07:00
Paul Gauthier
075bc828f6 stand alone junior message 2024-09-25 08:41:49 -07:00
Paul Gauthier
c912982747 senior-junior 2024-09-25 08:25:11 -07:00
Paul Gauthier
a9e9f9cdbe Merge branch 'main' into ask-plan-simple 2024-09-25 07:46:15 -07:00
Paul Gauthier
412b8e7c3c copy 2024-09-21 10:09:26 -07:00
Paul Gauthier
2753ac6b62 feat: Add new benchmark test case for qwen-2.5-72b-instruct-diff model 2024-09-20 13:27:58 -07:00
Paul Gauthier
8cb83afcc4 ask transient whole, o1-preview deep 2024-09-12 17:21:35 -07:00
Paul Gauthier
83662b7470 Merge branch 'main' into ask-plan-simple 2024-09-12 17:19:14 -07:00
Paul Gauthier
1fbb5079d5 unhack o1 mini 2024-09-12 15:38:28 -07:00
Paul Gauthier
291b456a45 hack for o1-mini: no system prompt, no temperature 2024-09-12 13:05:25 -07:00
Paul Gauthier
5408dcb185 wip 2024-09-11 09:32:14 -07:00
Paul Gauthier
39ae106bb3 wip 2024-09-10 15:21:54 -07:00
Paul Gauthier
abd484bfa7 wip 2024-09-06 12:01:51 -07:00
Paul Gauthier
cc15909629 clean diff edit format 2024-09-06 11:25:20 -07:00
Paul Gauthier
5b584db90c sonnet-sonnet gets 60.2/84.2 2024-09-06 09:49:01 -07:00
Paul Gauthier
1c73e7d43a turn off suggest shell commands during benchmarks 2024-09-05 14:35:34 -07:00
Paul Gauthier
05dcbeecac noop 2024-09-05 14:25:09 -07:00
Paul Gauthier
ff3a75413b sonnet+deep got 60.9/82.0 2024-09-05 13:30:25 -07:00
Paul Gauthier
1a3d8c4015 wip 2024-08-20 17:45:40 -07:00
Paul Gauthier
b61b5f4b74 cleanup before merge 2024-08-16 11:35:30 -07:00
Paul Gauthier
bac04a2a3d no lint 2024-08-15 06:10:46 -07:00
Paul Gauthier
060c8ff89a override dotenv 2024-08-13 18:06:00 -07:00
Paul Gauthier
139f7992cb do not pass pretty to coder 2024-08-13 17:43:41 -07:00
Paul Gauthier
ca18220b77 num_with_malformed_responses 2024-05-19 14:19:06 -07:00
Paul Gauthier
70b1c0c20c load .env in benchmark.py 2024-05-07 13:32:19 -07:00
Paul Gauthier
ecca737803 added deepseek-chat v2 2024-05-07 06:26:39 -07:00
Paul Gauthier
b1cae73a85 cleaned up csv output 2024-05-07 05:59:31 -07:00
Paul Gauthier
a7b08c7354 format output as yaml 2024-05-06 11:15:19 -07:00
Paul Gauthier
3162d42262 cleanup 2024-05-06 10:46:09 -07:00
Paul Gauthier
5fb7a323ec refactored plots 2024-05-06 10:44:34 -07:00
Paul Gauthier
3bb237bdc1 handle tasks with exceptions in the stats output 2024-05-05 08:24:45 -07:00
Paul Gauthier
9cdd9e12c3 catch all exceptions in the benchmark 2024-05-04 17:52:46 -07:00
Paul Gauthier
9b88f8caf6 updated gpt-4-0314 2024-05-04 07:59:27 -07:00
Paul Gauthier
f6580fff76 updated all openai models 2024-05-04 07:38:50 -07:00
Paul Gauthier
1981105932 aider: Implemented the TODO to extract the version from aider/__init__.py for each commit hash.
# Aider chat conversation:

USER: do the todo

ASSISTANT: Here is the *SEARCH/REPLACE block* to do the TODO in the code:

benchmark/benchmark.py
<source>python
<<<<<<< SEARCH
    for hsh in variants['commit_hash']):
        # TODO: get the output from `git show {hsh}:aider/__init__.py`
=======
    for hsh in variants['commit_hash']:
        try:
            version = subprocess.check_output(
                ['git', 'show', f'{hsh}:aider/__init__.py'],
                universal_newlines=True
            )
            version = re.search(r'__version__ = "(.*)"', version).group(1)
            csv.append(version)
        except subprocess.CalledProcessError:
            csv.append('unknown')
>>>>>>> REPLACE
</source>

This change:
1. Loops through each commit hash in `variants['commit_hash']`
2. Uses `subprocess.check_output` to run the `git show` command and capture the contents of `aider/__init__.py` at that commit
3. Searches the file contents for the `__version__ = "X.Y.Z"` line and extracts the version string
4. Appends the version to the `csv` list, or `'unknown'` if there was an error getting the file contents

The `try/except` handles cases where the commit hash doesn't exist or `aider/__init__.py` is missing.
2024-05-04 07:14:23 -07:00
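The version lookup described in the commit above can be sketched as a standalone helper. This is a minimal illustration, not the code in benchmark/benchmark.py: the names `parse_version` and `version_at_commit` are hypothetical, and it also falls back to `'unknown'` when the `__version__` line is absent, a case the committed `re.search(...).group(1)` call would not catch.

```python
import re
import subprocess

# Same pattern as in the commit: captures the text between the quotes.
VERSION_RE = re.compile(r'__version__ = "(.*)"')


def parse_version(init_py_text):
    """Return the version string found in the text, or None if absent."""
    match = VERSION_RE.search(init_py_text)
    return match.group(1) if match else None


def version_at_commit(hsh, path="aider/__init__.py"):
    """Read `path` as of commit `hsh` via `git show` and extract its version.

    Returns 'unknown' if git fails (bad hash, missing file, not a repo),
    if git is not installed, or if no __version__ line is found.
    """
    try:
        text = subprocess.check_output(
            ["git", "show", f"{hsh}:{path}"],
            universal_newlines=True,
            stderr=subprocess.DEVNULL,  # keep git's error text off the console
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        return "unknown"
    return parse_version(text) or "unknown"
```

Splitting the parsing from the `git show` call keeps the regular expression testable without a repository.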
Paul Gauthier
01282674d4 Add pass rates to CSV output in benchmark results summary. 2024-05-04 07:13:40 -07:00
Paul Gauthier
4461c7c4b2 fixed benchmark 2024-04-23 09:44:04 -07:00
Paul Gauthier
fd5b9bbfcb Added groq llama3 2024-04-22 07:12:01 -07:00