Commit graph

247 commits

Author | SHA1 | Message | Date
Paul Gauthier (aider) | f9646ac47a | chore: Remove comment about practice subdirs | 2024-12-17 17:35:17 -08:00
Paul Gauthier | e8ed3b9e23 | chore: Add comment about copying practice subdirs | 2024-12-17 17:35:16 -08:00
Paul Gauthier (aider) | 6238a07c8f | style: Run linter on benchmark.py | 2024-12-17 17:33:28 -08:00
Paul Gauthier (aider) | 1fb33f0c47 | feat: Add language filter and multi-lang support | 2024-12-17 17:33:23 -08:00
Paul Gauthier (aider) | a842f41627 | style: Fix linting issues in benchmark.py | 2024-12-17 16:49:50 -08:00
Paul Gauthier (aider) | c4c135e678 | refactor: Use dict for test commands based on file extensions | 2024-12-17 16:49:46 -08:00
Paul Gauthier (aider) | f36f2fdea2 | style: Fix typo in test file extension check | 2024-12-17 16:48:37 -08:00
Paul Gauthier (aider) | e3f0a67584 | feat: Choose test command based on file extensions | 2024-12-17 16:48:32 -08:00
Paul Gauthier | f6f05fa0c6 | fix: Use cargo test for rust tests | 2024-12-17 16:48:31 -08:00
Paul Gauthier (aider) | cf5b38d4f5 | style: Fix linting issues in benchmark.py | 2024-12-17 16:35:20 -08:00
Paul Gauthier (aider) | b23669400f | fix: Correct syntax error in cleanup_test_output | 2024-12-17 16:35:16 -08:00
Paul Gauthier | aaacd00ecf | refactor: Use pytest instead of unittest for running tests | 2024-12-17 16:35:08 -08:00
Paul Gauthier (aider) | 03aa22ba84 | feat: Read config.json, copy solution/test files, no fallback | 2024-12-17 16:18:10 -08:00
Paul Gauthier | 1493b8703f | fix: Skip unparseable results files in real test | 2024-12-17 16:18:09 -08:00
Paul Gauthier | 4dc3b9072e | feat: increase retry timeout for benchmarking | 2024-12-11 14:26:28 -08:00
Paul Gauthier (aider) | fcb2bacd1e | style: format benchmark.py with black | 2024-12-11 13:09:52 -08:00
Paul Gauthier (aider) | a9401e921e | feat: add sleep option between tests in single-threaded mode | 2024-12-11 13:09:45 -08:00
Paul Gauthier (aider) | 6af71951af | style: fix whitespace in benchmark.py | 2024-11-28 14:01:50 -08:00
Paul Gauthier (aider) | 3eed45dc3e | fix: improve benchmark directory selection based on latest .md file timestamp | 2024-11-28 14:01:45 -08:00
Paul Gauthier (aider) | 320b059bc7 | perf: optimize benchmark dir search by filtering on timestamp first | 2024-11-28 14:00:12 -08:00
Paul Gauthier | a89ce06377 | fix: correct glob pattern for finding latest benchmark directory | 2024-11-28 14:00:10 -08:00
Paul Gauthier (aider) | 2ff3a23606 | fix: add num_ctx parameter to run_test_real function | 2024-11-25 19:21:08 -08:00
Paul Gauthier (aider) | c5ce57ea7f | style: fix linting issues in benchmark.py | 2024-11-25 19:20:49 -08:00
Paul Gauthier (aider) | 351b8e50f0 | feat: add --num-ctx flag to override model context window size | 2024-11-25 19:20:43 -08:00
fry69 | 667a58052e | feat: change edit format from "senior" to "architect" | 2024-09-27 09:03:42 +02:00
fry69 | e3e0d57512 | chore: update parameter names in args and benchmark | 2024-09-27 08:57:22 +02:00
Paul Gauthier | eb21cf2830 | architect/editor | 2024-09-26 16:10:19 -07:00
Paul Gauthier (aider) | 5a78e7d1b8 | chore: Run the linter | 2024-09-26 11:35:13 -07:00
Paul Gauthier (aider) | 1c05192b69 | fix: Only record junior_model and junior_edit_format in the results array if edit_format is "senior" | 2024-09-26 11:35:09 -07:00
Paul Gauthier | e682eb8669 | fix: Add junior model and junior edit format to benchmark results | 2024-09-25 16:31:40 -07:00
Paul Gauthier (aider) | ed7503dbbe | feat: optimize find_latest_benchmark_dir to check only .md files and limit to one file per subtree | 2024-09-25 12:20:45 -07:00
Paul Gauthier (aider) | e21cdafb15 | style: run linter and fix code formatting issues | 2024-09-25 12:18:43 -07:00
Paul Gauthier (aider) | 8d90df1ebc | feat: implement automatic selection of the most recently updated benchmark directory when using --stats without dirnames | 2024-09-25 12:18:39 -07:00
Paul Gauthier (aider) | 24c959af2d | feat: Add --junior-model and --junior-edit-format flags to the benchmark | 2024-09-25 11:44:34 -07:00
Paul Gauthier | 15cc709322 | feat: Improve senior coder's edit format handling | 2024-09-25 11:42:09 -07:00
Paul Gauthier | 65e57df7ea | feat: Implement changes to handle files content in Coder and prompts | 2024-09-25 09:54:16 -07:00
Paul Gauthier | 075bc828f6 | stand alone junior message | 2024-09-25 08:41:49 -07:00
Paul Gauthier | c912982747 | senior-junior | 2024-09-25 08:25:11 -07:00
Paul Gauthier | a9e9f9cdbe | Merge branch 'main' into ask-plan-simple | 2024-09-25 07:46:15 -07:00
Paul Gauthier | 412b8e7c3c | copy | 2024-09-21 10:09:26 -07:00
Paul Gauthier | 2753ac6b62 | feat: Add new benchmark test case for qwen-2.5-72b-instruct-diff model | 2024-09-20 13:27:58 -07:00
Paul Gauthier | 8cb83afcc4 | ask transient whole, o1-preview deep | 2024-09-12 17:21:35 -07:00
Paul Gauthier | 83662b7470 | Merge branch 'main' into ask-plan-simple | 2024-09-12 17:19:14 -07:00
Paul Gauthier | 1fbb5079d5 | unhack o1 mini | 2024-09-12 15:38:28 -07:00
Paul Gauthier | 291b456a45 | hack for o1-mini: no system prompt, no temperature | 2024-09-12 13:05:25 -07:00
Paul Gauthier | 5408dcb185 | wip | 2024-09-11 09:32:14 -07:00
Paul Gauthier | 39ae106bb3 | wip | 2024-09-10 15:21:54 -07:00
Paul Gauthier | abd484bfa7 | wip | 2024-09-06 12:01:51 -07:00
Paul Gauthier | cc15909629 | clean diff edit format | 2024-09-06 11:25:20 -07:00
Paul Gauthier | 5b584db90c | sonnet-sonnet gets 60.2/84.2 | 2024-09-06 09:49:01 -07:00