| Paul Gauthier | 60d11a6eba | use LONG_TIMEOUT | 2025-02-24 13:51:21 -08:00 |
| Paul Gauthier | 6118d91922 | improve unit tests in benchmark | 2025-02-06 16:27:29 -08:00 |
| Paul Gauthier (aider) | 0336a982ff | feat: Add model settings loading and registration to benchmark script | 2025-01-28 09:39:39 -08:00 |
| Paul Gauthier (aider) | aa18b63c16 | refactor: Simplify model settings loading in benchmark script | 2025-01-28 09:38:57 -08:00 |
| Paul Gauthier (aider) | 3f890551e7 | fix: Add missing read_model_settings parameter to run_test_real function | 2025-01-28 09:33:14 -08:00 |
| Paul Gauthier (aider) | 823127c87e | style: Apply linter formatting to benchmark.py | 2025-01-28 09:32:55 -08:00 |
| Paul Gauthier (aider) | cf2c9c6dc7 | feat: Add --read-model-settings option to benchmark for loading model settings | 2025-01-28 09:32:46 -08:00 |
| Paul Gauthier | 9b63b90ec4 | refactor: Remove unnecessary blank line in benchmark.py | 2025-01-28 09:32:35 -08:00 |
| Paul Gauthier | dff544cd5d | refactor: Split summarize method and add model metadata handling | 2025-01-20 09:38:45 -08:00 |
| Paul Gauthier | a08326ab60 | enable all java tests | 2025-01-15 15:18:46 -08:00 |
| Paul Gauthier | 63cf99361d | ensure no loading of any other files | 2025-01-15 13:57:54 -08:00 |
| Nimesh Ghelani | ed9d70903d | Fix files not being excluded in benchmark.py. `.discard()` removes an item from the set. `.difference_update()` is the correct call here. | 2025-01-07 17:35:29 +00:00 |
| Paul Gauthier (aider) | c5919f0c15 | refactor: improve cleanup error handling and verbose logging | 2025-01-04 10:55:11 -08:00 |
| Paul Gauthier | ac160cac12 | chore: Ignore exceptions during Rust target directory cleanup | 2025-01-04 10:55:09 -08:00 |
| Paul Gauthier (aider) | 729354b038 | chore: Add cleanup for node_modules directories in benchmark tests | 2025-01-03 14:19:06 -05:00 |
| Paul Gauthier (aider) | c0be857f37 | chore: Add Java build directory cleanup to test runner | 2025-01-03 14:16:51 -05:00 |
| Paul Gauthier | 98b0e88ace | refactor: simplify Rust target directory cleanup logic | 2025-01-03 14:16:49 -05:00 |
| Paul Gauthier (aider) | 3d501df21f | chore: Clean up Rust target/debug directory after all test attempts | 2025-01-03 14:14:44 -05:00 |
| Paul Gauthier | 1b4abb747d | style: Add blank line for readability in benchmark.py | 2025-01-03 14:14:42 -05:00 |
| Paul Gauthier (aider) | f035c4c01a | fix: Remove max_apply_update_errors from threaded call | 2024-12-27 16:36:58 -04:00 |
| Paul Gauthier (aider) | 8fcdcecf36 | refactor: Remove deprecated max_apply_update_errors | 2024-12-27 16:36:47 -04:00 |
| Paul Gauthier | 3f9ee1ac2e | refactor: Remove deprecated max_apply_update_errors | 2024-12-27 16:36:46 -04:00 |
| Paul Gauthier | 188e1f788d | chore: Rename exercism dir to polyglot-benchmark | 2024-12-27 16:33:04 -04:00 |
| Paul Gauthier (aider) | a75507980a | fix: Pass stats_languages to summarize_results and show_stats | 2024-12-20 16:04:00 -08:00 |
| Paul Gauthier (aider) | 8d0decc17a | style: Apply linter formatting | 2024-12-20 16:03:44 -08:00 |
| Paul Gauthier (aider) | e334cbb5d4 | fix: Correct indentation in load_results function | 2024-12-20 16:03:40 -08:00 |
| Paul Gauthier (aider) | e3ac8ab19d | feat: Add --stats-languages option to filter results | 2024-12-20 16:03:19 -08:00 |
| Paul Gauthier | bddf6e9017 | fix: Handle missing attributes in show_stats and empty models | 2024-12-20 16:03:19 -08:00 |
| Paul Gauthier | 521841b447 | fix: Skip redoing tests if results exist | 2024-12-19 16:25:54 -08:00 |
| Paul Gauthier (aider) | c53cd336f9 | style: Fix linting issues | 2024-12-19 15:59:03 -08:00 |
| Paul Gauthier (aider) | a8226989c8 | feat: Remove @Disabled annotations from Java test files | 2024-12-19 15:58:59 -08:00 |
| Paul Gauthier | 114b156d74 | fix: Use relative paths for ignored files, remove redundant try | 2024-12-19 15:56:16 -08:00 |
| Paul Gauthier (aider) | 370b45bb35 | feat: Ignore files in .meta and .docs directories | 2024-12-19 07:23:28 -08:00 |
| Paul Gauthier | 616c4a9a53 | chore: Add comment about ignoring meta and docs files | 2024-12-19 07:23:27 -08:00 |
| Paul Gauthier | 821f7d6694 | fix: Use extra_body for reasoning_effort, fix test counts | 2024-12-19 07:10:20 -08:00 |
| Paul Gauthier | c36c06ab99 | fix: Retry tests on parse or timeout, add gpt-4o params | 2024-12-18 15:56:38 -08:00 |
| Paul Gauthier | a915c60999 | feat: Add pass_num to benchmark results, fix hard set percent | 2024-12-18 13:36:37 -08:00 |
| Paul Gauthier | 2aa4615c78 | feat: Add openrouter/openai/o1 model and update prompts | 2024-12-18 06:59:14 -08:00 |
| Paul Gauthier (aider) | 7dd1346878 | fix: Remove stray ] causing syntax error | 2024-12-17 20:34:33 -08:00 |
| Paul Gauthier (aider) | 31f8c7d9cb | fix: Handle JSON decode errors when loading results | 2024-12-17 20:34:21 -08:00 |
| Paul Gauthier | 914ce0b94d | feat: Add total_tests to summary, handle JSON decode errors | 2024-12-17 20:34:20 -08:00 |
| Paul Gauthier (aider) | 664f09111e | feat: Pass original_dname to tests, copy test files | 2024-12-17 20:11:58 -08:00 |
| Paul Gauthier (aider) | 6141f414fd | chore: Remove comment from run_unit_tests | 2024-12-17 20:11:29 -08:00 |
| Paul Gauthier | 8911f0f217 | fix: Correctly find benchmark markdown files | 2024-12-17 20:11:29 -08:00 |
| Paul Gauthier (aider) | 5af108ccee | style: Format benchmark code with black | 2024-12-17 20:01:45 -08:00 |
| Paul Gauthier (aider) | 94e4169445 | fix: Update stats to handle nested exercise directories | 2024-12-17 20:01:40 -08:00 |
| Paul Gauthier | 479b5b7064 | fix: Use shell=True for npm test and fix path | 2024-12-17 20:01:39 -08:00 |
| Paul Gauthier | 12491c4983 | wip | 2024-12-17 17:47:17 -08:00 |
| Paul Gauthier (aider) | 77d379c021 | refactor: Use full path for test names in benchmark | 2024-12-17 17:43:52 -08:00 |
| Paul Gauthier (aider) | 1a12a59e91 | chore: Remove comment about test_dnames | 2024-12-17 17:41:29 -08:00 |
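
Note on commit ed9d70903d: its message contrasts `.discard()` with `.difference_update()`. The sketch below illustrates that difference on a plain Python set. The file names and helper functions are hypothetical and are not taken from benchmark.py; the sketch only assumes the ignored files were held in a collection and passed to `.discard()` as a single argument.

```python
def exclude_with_discard(fnames, ignored):
    """Buggy variant: discard() looks for `ignored` itself as ONE element."""
    remaining = set(fnames)
    # No element of `remaining` equals the whole `ignored` set,
    # so this call silently removes nothing.
    remaining.discard(ignored)
    return remaining


def exclude_with_difference_update(fnames, ignored):
    """Fixed variant: difference_update() removes every member of `ignored`."""
    remaining = set(fnames)
    remaining.difference_update(ignored)
    return remaining


# Hypothetical file names, for illustration only.
fnames = {"solution.py", ".meta/example.py", ".docs/intro.md"}
ignored = {".meta/example.py", ".docs/intro.md"}

print(sorted(exclude_with_discard(fnames, ignored)))
# ['.docs/intro.md', '.meta/example.py', 'solution.py']
print(sorted(exclude_with_difference_update(fnames, ignored)))
# ['solution.py']
```

Because `.discard()` never raises when the element is absent, a mistake of this kind produces no error, which would be consistent with the "files not being excluded" symptom named in the commit subject.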