AJ (@techfren)
|
e7b2514c07
|
Merge 3a93da8f8d into 3caab85931
|
2025-05-13 15:03:58 -07:00 |
|
Paul Gauthier
|
7f30320566
|
chore: Disable pretty printing in benchmark I/O
|
2025-05-09 10:07:21 -07:00 |
|
MDW
|
40a5a88d56
|
style: remove or ignore unused imports
The following files had unused imports removed:
- `scripts/30k-image.py`
- `scripts/dl_icons.py`
- `scripts/redact-cast.py`
|
2025-05-08 23:29:18 +02:00 |
|
Paul Gauthier (aider)
|
5090f28151
|
feat: Track total tokens and use in benchmark stats
|
2025-05-07 21:08:29 -07:00 |
|
Paul Gauthier (aider)
|
a98b531bcc
|
feat: add prompt_tokens and completion_tokens to results summary
|
2025-05-07 21:02:00 -07:00 |
|
AJ
|
3a93da8f8d
|
Add architect mode information to benchmark README
|
2025-04-25 17:48:08 -07:00 |
|
AJ
|
cbd744df0e
|
Remove retry tracking and display from benchmark
|
2025-04-25 10:15:31 -07:00 |
|
AJ
|
d8e511ea2f
|
add ability to pause and resume benchmark
|
2025-04-25 09:59:48 -07:00 |
|
AJ
|
35fed777db
|
update language print
|
2025-04-24 19:53:06 -07:00 |
|
AJ
|
04abec4c10
|
Add enhanced benchmark metrics including API calls, retries, and language-specific pass rates
|
2025-04-24 10:06:31 -07:00 |
|
Paul Gauthier
|
c94340d493
|
less ram
|
2025-04-20 13:18:57 -07:00 |
|
Paul Gauthier (aider)
|
1a4d3927e7
|
feat: Add --thinking-tokens option to benchmark script
|
2025-04-20 11:29:33 -07:00 |
|
Paul Gauthier
|
622bf349c5
|
chore: Add num_ctx and sleep to run_test_threaded.gather arguments
|
2025-04-17 20:08:57 -07:00 |
|
Paul Gauthier (aider)
|
05eaf82b36
|
feat: Pass verbose flag to Model class for detailed output
|
2025-04-17 20:02:31 -07:00 |
|
Paul Gauthier (aider)
|
5c8150fd16
|
fix: Change reasoning_effort type to string in benchmark script
|
2025-04-17 20:02:09 -07:00 |
|
Paul Gauthier (aider)
|
ec9327dcb4
|
style: Apply linter to benchmark.py
|
2025-04-17 20:01:30 -07:00 |
|
Paul Gauthier (aider)
|
8e689d35af
|
Feat: Add --reasoning-effort switch to benchmark script
|
2025-04-17 20:01:26 -07:00 |
|
Paul Gauthier
|
2dec862ea6
|
copy
|
2025-04-01 17:08:27 +13:00 |
|
AJ (@techfren)
|
587186d96c
|
Update benchmark README.md to specify how to config other settings
|
2025-03-31 17:05:53 -07:00 |
|
Brian Exelbierd
|
a56dbdf502
|
Update Benchmark README.md
Use a consistent clone url to help those who don't use ssh with GitHub. This should not break for those who do.
|
2025-03-14 09:05:04 +01:00 |
|
Paul Gauthier (aider)
|
976722c129
|
refactor: Update problem_stats.py to use polyglot_leaderboard.yml
|
2025-02-27 08:56:54 -08:00 |
|
Paul Gauthier
|
60d11a6eba
|
use LONG_TIMEOUT
|
2025-02-24 13:51:21 -08:00 |
|
Paul Gauthier
|
739a88ed00
|
Add -DEXERCISM_RUN_ALL_TESTS to cpp tests
|
2025-02-06 16:41:59 -08:00 |
|
Paul Gauthier (aider)
|
38d4341e59
|
build: Add libboost-all-dev to Dockerfile for C++ support
|
2025-02-06 16:41:45 -08:00 |
|
Paul Gauthier
|
6118d91922
|
improve unit tests in benchmark
|
2025-02-06 16:27:29 -08:00 |
|
Paul Gauthier
|
550b9ebf4d
|
limit benchmark docker memory
|
2025-02-05 16:40:03 -08:00 |
|
Paul Gauthier
|
cd16e001f6
|
verbose
|
2025-01-28 11:52:07 -08:00 |
|
Paul Gauthier
|
8a3cc6041d
|
sync model settings
|
2025-01-28 10:49:21 -08:00 |
|
Paul Gauthier (aider)
|
0336a982ff
|
feat: Add model settings loading and registration to benchmark script
|
2025-01-28 09:39:39 -08:00 |
|
Paul Gauthier (aider)
|
aa18b63c16
|
refactor: Simplify model settings loading in benchmark script
|
2025-01-28 09:38:57 -08:00 |
|
Paul Gauthier (aider)
|
3f890551e7
|
fix: Add missing read_model_settings parameter to run_test_real function
|
2025-01-28 09:33:14 -08:00 |
|
Paul Gauthier (aider)
|
823127c87e
|
style: Apply linter formatting to benchmark.py
|
2025-01-28 09:32:55 -08:00 |
|
Paul Gauthier (aider)
|
cf2c9c6dc7
|
feat: Add --read-model-settings option to benchmark for loading model settings
|
2025-01-28 09:32:46 -08:00 |
|
Paul Gauthier
|
9b63b90ec4
|
refactor: Remove unnecessary blank line in benchmark.py
|
2025-01-28 09:32:35 -08:00 |
|
Paul Gauthier
|
b276d48ecf
|
copy
|
2025-01-24 18:36:01 -08:00 |
|
Paul Gauthier
|
dff544cd5d
|
refactor: Split summarize method and add model metadata handling
|
2025-01-20 09:38:45 -08:00 |
|
Paul Gauthier
|
a08326ab60
|
enable all java tests
|
2025-01-15 15:18:46 -08:00 |
|
Paul Gauthier
|
63cf99361d
|
ensure no loading of any other files
|
2025-01-15 13:57:54 -08:00 |
|
Paul Gauthier
|
1e54ca82b8
|
refactor: encapsulate rsync logic in function and add continuous sync loop
|
2025-01-13 15:47:49 -08:00 |
|
Nimesh Ghelani
|
ed9d70903d
|
Fix files not being excluded in benchmark.py
`.discard()` removes an item from the set. `.difference_update()` is the
correct call here.
|
2025-01-07 17:35:29 +00:00 |
|
Paul Gauthier (aider)
|
c5919f0c15
|
refactor: improve cleanup error handling and verbose logging
|
2025-01-04 10:55:11 -08:00 |
|
Paul Gauthier
|
ac160cac12
|
chore: Ignore exceptions during Rust target directory cleanup
|
2025-01-04 10:55:09 -08:00 |
|
Paul Gauthier (aider)
|
729354b038
|
chore: Add cleanup for node_modules directories in benchmark tests
|
2025-01-03 14:19:06 -05:00 |
|
Paul Gauthier (aider)
|
c0be857f37
|
chore: Add Java build directory cleanup to test runner
|
2025-01-03 14:16:51 -05:00 |
|
Paul Gauthier
|
98b0e88ace
|
refactor: simplify Rust target directory cleanup logic
|
2025-01-03 14:16:49 -05:00 |
|
Paul Gauthier (aider)
|
3d501df21f
|
chore: Clean up Rust target/debug directory after all test attempts
|
2025-01-03 14:14:44 -05:00 |
|
Paul Gauthier
|
1b4abb747d
|
style: Add blank line for readability in benchmark.py
|
2025-01-03 14:14:42 -05:00 |
|
Paul Gauthier
|
f292e01980
|
Merge branch 'main' of github.com:Aider-AI/aider
|
2024-12-30 14:37:27 -04:00 |
|
Josh Vera
|
e486243c06
|
Install ca-certificates before openjdk-21 to resolve cacerts error
|
2024-12-29 10:55:09 -08:00 |
|
Paul Gauthier (aider)
|
8eaefb57d3
|
feat: Add RevCumulative column to problem stats
|
2024-12-28 11:45:41 -04:00 |
|
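Commit ed9d70903d above notes that `.discard()` removes a single item from a set, while `.difference_update()` is the call that removes a whole collection of items. The sketch below illustrates that difference using Python's built-in `set` type only. The file names and the helper functions `exclude_with_discard` and `exclude_with_difference_update` are hypothetical and are not taken from `benchmark.py`.

```python
def exclude_with_discard(files, ignored):
    """Hypothetical helper: tries to exclude `ignored` using set.discard()."""
    remaining = set(files)
    # discard() treats its argument as ONE candidate element. The frozenset
    # of ignored names is not itself a member, so nothing is removed and
    # no error is raised.
    remaining.discard(frozenset(ignored))
    return remaining


def exclude_with_difference_update(files, ignored):
    """Hypothetical helper: excludes `ignored` using set.difference_update()."""
    remaining = set(files)
    # difference_update() removes every element found in the given iterable.
    remaining.difference_update(ignored)
    return remaining


# Illustrative file names only (assumed, not from the repository).
files = {"a.py", "b.py", "c.py", "d.py"}
ignored = {"b.py", "c.py"}

print(sorted(exclude_with_discard(files, ignored)))
print(sorted(exclude_with_difference_update(files, ignored)))
```

The first call leaves all four names in place because `discard()` silently ignores a missing element, which is why this kind of mistake can go unnoticed.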