Paul Gauthier (aider) | 77d379c021 | refactor: Use full path for test names in benchmark | 2024-12-17 17:43:52 -08:00
Paul Gauthier (aider) | 1a12a59e91 | chore: Remove comment about test_dnames | 2024-12-17 17:41:29 -08:00
Paul Gauthier | 0b970dd9c7 | fix: Ensure test_dnames include full path | 2024-12-17 17:41:27 -08:00
Paul Gauthier (aider) | 93ac2bd53e | feat: Copy only practice subdirs with exercises | 2024-12-17 17:36:03 -08:00
Paul Gauthier (aider) | f9646ac47a | chore: Remove comment about practice subdirs | 2024-12-17 17:35:17 -08:00
Paul Gauthier | e8ed3b9e23 | chore: Add comment about copying practice subdirs | 2024-12-17 17:35:16 -08:00
Paul Gauthier (aider) | 6238a07c8f | style: Run linter on benchmark.py | 2024-12-17 17:33:28 -08:00
Paul Gauthier (aider) | 1fb33f0c47 | feat: Add language filter and multi-lang support | 2024-12-17 17:33:23 -08:00
Paul Gauthier (aider) | a842f41627 | style: Fix linting issues in benchmark.py | 2024-12-17 16:49:50 -08:00
Paul Gauthier (aider) | c4c135e678 | refactor: Use dict for test commands based on file extensions | 2024-12-17 16:49:46 -08:00
Paul Gauthier (aider) | f36f2fdea2 | style: Fix typo in test file extension check | 2024-12-17 16:48:37 -08:00
Paul Gauthier (aider) | e3f0a67584 | feat: Choose test command based on file extensions | 2024-12-17 16:48:32 -08:00
Paul Gauthier | f6f05fa0c6 | fix: Use cargo test for rust tests | 2024-12-17 16:48:31 -08:00
Paul Gauthier (aider) | 54ca7ceac8 | feat: Use buildpack-deps, python3.12, and rust in Dockerfile | 2024-12-17 16:39:30 -08:00
Paul Gauthier (aider) | cf5b38d4f5 | style: Fix linting issues in benchmark.py | 2024-12-17 16:35:20 -08:00
Paul Gauthier (aider) | b23669400f | fix: Correct syntax error in cleanup_test_output | 2024-12-17 16:35:16 -08:00
Paul Gauthier | aaacd00ecf | refactor: Use pytest instead of unittest for running tests | 2024-12-17 16:35:08 -08:00
Paul Gauthier (aider) | 03aa22ba84 | feat: Read config.json, copy solution/test files, no fallback | 2024-12-17 16:18:10 -08:00
Paul Gauthier | 1493b8703f | fix: Skip unparseable results files in real test | 2024-12-17 16:18:09 -08:00
Paul Gauthier (aider) | 59308c20c6 | feat: Number exercises in the table | 2024-12-17 14:15:40 -08:00
Paul Gauthier (aider) | cac5d8e716 | style: Apply linter formatting | 2024-12-17 14:15:06 -08:00
Paul Gauthier (aider) | 7f16757bbe | fix: Handle missing results in topn leaderboard calculation | 2024-12-17 14:15:02 -08:00
Paul Gauthier (aider) | 674e3846e2 | fix: Correctly sort leaderboard by pass rate | 2024-12-17 14:13:43 -08:00
Paul Gauthier (aider) | 3a0be0cca9 | style: Apply linter formatting | 2024-12-17 14:13:19 -08:00
Paul Gauthier (aider) | 00d7c3a05a | feat: Add --topn argument to limit models by pass rate | 2024-12-17 14:13:16 -08:00
Paul Gauthier (aider) | 91f5fca5e9 | feat: Include never solved exercises in stats | 2024-12-17 14:10:47 -08:00
Paul Gauthier (aider) | 1d7cb0c119 | feat: Format problem stats output as a table with percentages | 2024-12-17 14:10:00 -08:00
Paul Gauthier (aider) | 24599aa64f | style: Run linter on problem_stats.py | 2024-12-17 14:09:20 -08:00
Paul Gauthier (aider) | 54c1553892 | refactor: Remove distribution of solutions table | 2024-12-17 14:09:17 -08:00
Paul Gauthier (aider) | 0ae53ce1a1 | feat: Output per-exercise stats, sort by solvers | 2024-12-17 14:08:47 -08:00
Paul Gauthier | c69ffe02f8 | chore: Make problem_stats.py executable | 2024-12-17 14:08:46 -08:00
Paul Gauthier (aider) | 7bfc2e0e74 | style: Run linter on benchmark script | 2024-12-17 14:06:56 -08:00
Paul Gauthier (aider) | 9cc674c283 | feat: Add script to analyze exercise solution stats | 2024-12-17 14:06:53 -08:00
Paul Gauthier | 66e597a05c | feat: Add problem stats benchmark | 2024-12-17 14:06:52 -08:00
Paul Gauthier | 4dc3b9072e | feat: increase retry timeout for benchmarking | 2024-12-11 14:26:28 -08:00
Paul Gauthier (aider) | fcb2bacd1e | style: format benchmark.py with black | 2024-12-11 13:09:52 -08:00
Paul Gauthier (aider) | a9401e921e | feat: add sleep option between tests in single-threaded mode | 2024-12-11 13:09:45 -08:00
Paul Gauthier (aider) | 6af71951af | style: fix whitespace in benchmark.py | 2024-11-28 14:01:50 -08:00
Paul Gauthier (aider) | 3eed45dc3e | fix: improve benchmark directory selection based on latest .md file timestamp | 2024-11-28 14:01:45 -08:00
Paul Gauthier (aider) | 320b059bc7 | perf: optimize benchmark dir search by filtering on timestamp first | 2024-11-28 14:00:12 -08:00
Paul Gauthier | a89ce06377 | fix: correct glob pattern for finding latest benchmark directory | 2024-11-28 14:00:10 -08:00
Paul Gauthier (aider) | 2ff3a23606 | fix: add num_ctx parameter to run_test_real function | 2024-11-25 19:21:08 -08:00
Paul Gauthier (aider) | c5ce57ea7f | style: fix linting issues in benchmark.py | 2024-11-25 19:20:49 -08:00
Paul Gauthier (aider) | 351b8e50f0 | feat: add --num-ctx flag to override model context window size | 2024-11-25 19:20:43 -08:00
Paul Gauthier (aider) | 6a0a97cb41 | feat: Add host.docker.internal gateway to enable Ollama server access from container | 2024-11-22 10:07:47 -08:00
Paul Gauthier (aider) | 30ee89c7e9 | style: Fix linting issues in over_time.py | 2024-11-21 16:45:11 -08:00
Paul Gauthier (aider) | 25bcea6aec | feat: Add print of model release dates and names in sorted order | 2024-11-21 16:45:07 -08:00
Paul Gauthier (aider) | 8fdcd92260 | feat: Update plot save paths to website assets directory | 2024-11-21 14:19:05 -08:00
Paul Gauthier | 781a40df52 | fix: Update Gemini Pro legend label to Gemini 1.5 Pro | 2024-11-21 14:19:03 -08:00
Paul Gauthier (aider) | a7fc0f9d2e | feat: Add color and legend support for Gemini Pro models | 2024-11-21 14:02:27 -08:00