Commit graph

531 commits

Author  SHA1  Message  Date
Paul Gauthier (aider)  77d379c021  refactor: Use full path for test names in benchmark  2024-12-17 17:43:52 -08:00
Paul Gauthier (aider)  1a12a59e91  chore: Remove comment about test_dnames  2024-12-17 17:41:29 -08:00
Paul Gauthier  0b970dd9c7  fix: Ensure test_dnames include full path  2024-12-17 17:41:27 -08:00
Paul Gauthier (aider)  93ac2bd53e  feat: Copy only practice subdirs with exercises  2024-12-17 17:36:03 -08:00
Paul Gauthier (aider)  f9646ac47a  chore: Remove comment about practice subdirs  2024-12-17 17:35:17 -08:00
Paul Gauthier  e8ed3b9e23  chore: Add comment about copying practice subdirs  2024-12-17 17:35:16 -08:00
Paul Gauthier (aider)  6238a07c8f  style: Run linter on benchmark.py  2024-12-17 17:33:28 -08:00
Paul Gauthier (aider)  1fb33f0c47  feat: Add language filter and multi-lang support  2024-12-17 17:33:23 -08:00
Paul Gauthier (aider)  a842f41627  style: Fix linting issues in benchmark.py  2024-12-17 16:49:50 -08:00
Paul Gauthier (aider)  c4c135e678  refactor: Use dict for test commands based on file extensions  2024-12-17 16:49:46 -08:00
Paul Gauthier (aider)  f36f2fdea2  style: Fix typo in test file extension check  2024-12-17 16:48:37 -08:00
Paul Gauthier (aider)  e3f0a67584  feat: Choose test command based on file extensions  2024-12-17 16:48:32 -08:00
Paul Gauthier  f6f05fa0c6  fix: Use cargo test for rust tests  2024-12-17 16:48:31 -08:00
Paul Gauthier (aider)  54ca7ceac8  feat: Use buildpack-deps, python3.12, and rust in Dockerfile  2024-12-17 16:39:30 -08:00
Paul Gauthier (aider)  cf5b38d4f5  style: Fix linting issues in benchmark.py  2024-12-17 16:35:20 -08:00
Paul Gauthier (aider)  b23669400f  fix: Correct syntax error in cleanup_test_output  2024-12-17 16:35:16 -08:00
Paul Gauthier  aaacd00ecf  refactor: Use pytest instead of unittest for running tests  2024-12-17 16:35:08 -08:00
Paul Gauthier (aider)  03aa22ba84  feat: Read config.json, copy solution/test files, no fallback  2024-12-17 16:18:10 -08:00
Paul Gauthier  1493b8703f  fix: Skip unparseable results files in real test  2024-12-17 16:18:09 -08:00
Paul Gauthier (aider)  59308c20c6  feat: Number exercises in the table  2024-12-17 14:15:40 -08:00
Paul Gauthier (aider)  cac5d8e716  style: Apply linter formatting  2024-12-17 14:15:06 -08:00
Paul Gauthier (aider)  7f16757bbe  fix: Handle missing results in topn leaderboard calculation  2024-12-17 14:15:02 -08:00
Paul Gauthier (aider)  674e3846e2  fix: Correctly sort leaderboard by pass rate  2024-12-17 14:13:43 -08:00
Paul Gauthier (aider)  3a0be0cca9  style: Apply linter formatting  2024-12-17 14:13:19 -08:00
Paul Gauthier (aider)  00d7c3a05a  feat: Add --topn argument to limit models by pass rate  2024-12-17 14:13:16 -08:00
Paul Gauthier (aider)  91f5fca5e9  feat: Include never solved exercises in stats  2024-12-17 14:10:47 -08:00
Paul Gauthier (aider)  1d7cb0c119  feat: Format problem stats output as a table with percentages  2024-12-17 14:10:00 -08:00
Paul Gauthier (aider)  24599aa64f  style: Run linter on problem_stats.py  2024-12-17 14:09:20 -08:00
Paul Gauthier (aider)  54c1553892  refactor: Remove distribution of solutions table  2024-12-17 14:09:17 -08:00
Paul Gauthier (aider)  0ae53ce1a1  feat: Output per-exercise stats, sort by solvers  2024-12-17 14:08:47 -08:00
Paul Gauthier  c69ffe02f8  chore: Make problem_stats.py executable  2024-12-17 14:08:46 -08:00
Paul Gauthier (aider)  7bfc2e0e74  style: Run linter on benchmark script  2024-12-17 14:06:56 -08:00
Paul Gauthier (aider)  9cc674c283  feat: Add script to analyze exercise solution stats  2024-12-17 14:06:53 -08:00
Paul Gauthier  66e597a05c  feat: Add problem stats benchmark  2024-12-17 14:06:52 -08:00
Paul Gauthier  4dc3b9072e  feat: increase retry timeout for benchmarking  2024-12-11 14:26:28 -08:00
Paul Gauthier (aider)  fcb2bacd1e  style: format benchmark.py with black  2024-12-11 13:09:52 -08:00
Paul Gauthier (aider)  a9401e921e  feat: add sleep option between tests in single-threaded mode  2024-12-11 13:09:45 -08:00
Paul Gauthier (aider)  6af71951af  style: fix whitespace in benchmark.py  2024-11-28 14:01:50 -08:00
Paul Gauthier (aider)  3eed45dc3e  fix: improve benchmark directory selection based on latest .md file timestamp  2024-11-28 14:01:45 -08:00
Paul Gauthier (aider)  320b059bc7  perf: optimize benchmark dir search by filtering on timestamp first  2024-11-28 14:00:12 -08:00
Paul Gauthier  a89ce06377  fix: correct glob pattern for finding latest benchmark directory  2024-11-28 14:00:10 -08:00
Paul Gauthier (aider)  2ff3a23606  fix: add num_ctx parameter to run_test_real function  2024-11-25 19:21:08 -08:00
Paul Gauthier (aider)  c5ce57ea7f  style: fix linting issues in benchmark.py  2024-11-25 19:20:49 -08:00
Paul Gauthier (aider)  351b8e50f0  feat: add --num-ctx flag to override model context window size  2024-11-25 19:20:43 -08:00
Paul Gauthier (aider)  6a0a97cb41  feat: Add host.docker.internal gateway to enable Ollama server access from container  2024-11-22 10:07:47 -08:00
Paul Gauthier (aider)  30ee89c7e9  style: Fix linting issues in over_time.py  2024-11-21 16:45:11 -08:00
Paul Gauthier (aider)  25bcea6aec  feat: Add print of model release dates and names in sorted order  2024-11-21 16:45:07 -08:00
Paul Gauthier (aider)  8fdcd92260  feat: Update plot save paths to website assets directory  2024-11-21 14:19:05 -08:00
Paul Gauthier  781a40df52  fix: Update Gemini Pro legend label to Gemini 1.5 Pro  2024-11-21 14:19:03 -08:00
Paul Gauthier (aider)  a7fc0f9d2e  feat: Add color and legend support for Gemini Pro models  2024-11-21 14:02:27 -08:00