Commit graph

224 commits

Author
SHA1 Message Date
Paul Gauthier (aider)
e334cbb5d4 fix: Correct indentation in load_results function 2024-12-20 16:03:40 -08:00
Paul Gauthier (aider)
e3ac8ab19d feat: Add --stats-languages option to filter results 2024-12-20 16:03:19 -08:00
Paul Gauthier
bddf6e9017 fix: Handle missing attributes in show_stats and empty models 2024-12-20 16:03:19 -08:00
Paul Gauthier
521841b447 fix: Skip redoing tests if results exist 2024-12-19 16:25:54 -08:00
Paul Gauthier (aider)
c53cd336f9 style: Fix linting issues 2024-12-19 15:59:03 -08:00
Paul Gauthier (aider)
a8226989c8 feat: Remove @Disabled annotations from Java test files 2024-12-19 15:58:59 -08:00
Paul Gauthier
114b156d74 fix: Use relative paths for ignored files, remove redundant try 2024-12-19 15:56:16 -08:00
Paul Gauthier (aider)
370b45bb35 feat: Ignore files in .meta and .docs directories 2024-12-19 07:23:28 -08:00
Paul Gauthier
616c4a9a53 chore: Add comment about ignoring meta and docs files 2024-12-19 07:23:27 -08:00
Paul Gauthier
821f7d6694 fix: Use extra_body for reasoning_effort, fix test counts 2024-12-19 07:10:20 -08:00
Paul Gauthier
c36c06ab99 fix: Retry tests on parse or timeout, add gpt-4o params 2024-12-18 15:56:38 -08:00
Paul Gauthier
a915c60999 feat: Add pass_num to benchmark results, fix hard set percent 2024-12-18 13:36:37 -08:00
Paul Gauthier
2aa4615c78 feat: Add openrouter/openai/o1 model and update prompts 2024-12-18 06:59:14 -08:00
Paul Gauthier (aider)
7dd1346878 fix: Remove stray ] causing syntax error 2024-12-17 20:34:33 -08:00
Paul Gauthier (aider)
31f8c7d9cb fix: Handle JSON decode errors when loading results 2024-12-17 20:34:21 -08:00
Paul Gauthier
914ce0b94d feat: Add total_tests to summary, handle JSON decode errors 2024-12-17 20:34:20 -08:00
Paul Gauthier (aider)
664f09111e feat: Pass original_dname to tests, copy test files 2024-12-17 20:11:58 -08:00
Paul Gauthier (aider)
6141f414fd chore: Remove comment from run_unit_tests 2024-12-17 20:11:29 -08:00
Paul Gauthier
8911f0f217 fix: Correctly find benchmark markdown files 2024-12-17 20:11:29 -08:00
Paul Gauthier (aider)
5af108ccee style: Format benchmark code with black 2024-12-17 20:01:45 -08:00
Paul Gauthier (aider)
94e4169445 fix: Update stats to handle nested exercise directories 2024-12-17 20:01:40 -08:00
Paul Gauthier
479b5b7064 fix: Use shell=True for npm test and fix path 2024-12-17 20:01:39 -08:00
Paul Gauthier
12491c4983 wip 2024-12-17 17:47:17 -08:00
Paul Gauthier (aider)
77d379c021 refactor: Use full path for test names in benchmark 2024-12-17 17:43:52 -08:00
Paul Gauthier (aider)
1a12a59e91 chore: Remove comment about test_dnames 2024-12-17 17:41:29 -08:00
Paul Gauthier
0b970dd9c7 fix: Ensure test_dnames include full path 2024-12-17 17:41:27 -08:00
Paul Gauthier (aider)
93ac2bd53e feat: Copy only practice subdirs with exercises 2024-12-17 17:36:03 -08:00
Paul Gauthier (aider)
f9646ac47a chore: Remove comment about practice subdirs 2024-12-17 17:35:17 -08:00
Paul Gauthier
e8ed3b9e23 chore: Add comment about copying practice subdirs 2024-12-17 17:35:16 -08:00
Paul Gauthier (aider)
6238a07c8f style: Run linter on benchmark.py 2024-12-17 17:33:28 -08:00
Paul Gauthier (aider)
1fb33f0c47 feat: Add language filter and multi-lang support 2024-12-17 17:33:23 -08:00
Paul Gauthier (aider)
a842f41627 style: Fix linting issues in benchmark.py 2024-12-17 16:49:50 -08:00
Paul Gauthier (aider)
c4c135e678 refactor: Use dict for test commands based on file extensions 2024-12-17 16:49:46 -08:00
Paul Gauthier (aider)
f36f2fdea2 style: Fix typo in test file extension check 2024-12-17 16:48:37 -08:00
Paul Gauthier (aider)
e3f0a67584 feat: Choose test command based on file extensions 2024-12-17 16:48:32 -08:00
Paul Gauthier
f6f05fa0c6 fix: Use cargo test for rust tests 2024-12-17 16:48:31 -08:00
Paul Gauthier (aider)
cf5b38d4f5 style: Fix linting issues in benchmark.py 2024-12-17 16:35:20 -08:00
Paul Gauthier (aider)
b23669400f fix: Correct syntax error in cleanup_test_output 2024-12-17 16:35:16 -08:00
Paul Gauthier
aaacd00ecf refactor: Use pytest instead of unittest for running tests 2024-12-17 16:35:08 -08:00
Paul Gauthier (aider)
03aa22ba84 feat: Read config.json, copy solution/test files, no fallback 2024-12-17 16:18:10 -08:00
Paul Gauthier
1493b8703f fix: Skip unparseable results files in real test 2024-12-17 16:18:09 -08:00
Paul Gauthier
4dc3b9072e feat: increase retry timeout for benchmarking 2024-12-11 14:26:28 -08:00
Paul Gauthier (aider)
fcb2bacd1e style: format benchmark.py with black 2024-12-11 13:09:52 -08:00
Paul Gauthier (aider)
a9401e921e feat: add sleep option between tests in single-threaded mode 2024-12-11 13:09:45 -08:00
Paul Gauthier (aider)
6af71951af style: fix whitespace in benchmark.py 2024-11-28 14:01:50 -08:00
Paul Gauthier (aider)
3eed45dc3e fix: improve benchmark directory selection based on latest .md file timestamp 2024-11-28 14:01:45 -08:00
Paul Gauthier (aider)
320b059bc7 perf: optimize benchmark dir search by filtering on timestamp first 2024-11-28 14:00:12 -08:00
Paul Gauthier
a89ce06377 fix: correct glob pattern for finding latest benchmark directory 2024-11-28 14:00:10 -08:00
Paul Gauthier (aider)
2ff3a23606 fix: add num_ctx parameter to run_test_real function 2024-11-25 19:21:08 -08:00
Paul Gauthier (aider)
c5ce57ea7f style: fix linting issues in benchmark.py 2024-11-25 19:20:49 -08:00