Paul Gauthier (aider) | 674e3846e2 | fix: Correctly sort leaderboard by pass rate | 2024-12-17 14:13:43 -08:00 |
| Paul Gauthier (aider) | 3a0be0cca9 | style: Apply linter formatting | 2024-12-17 14:13:19 -08:00 |
| Paul Gauthier (aider) | 00d7c3a05a | feat: Add --topn argument to limit models by pass rate | 2024-12-17 14:13:16 -08:00 |
| Paul Gauthier (aider) | 91f5fca5e9 | feat: Include never solved exercises in stats | 2024-12-17 14:10:47 -08:00 |
| Paul Gauthier (aider) | 1d7cb0c119 | feat: Format problem stats output as a table with percentages | 2024-12-17 14:10:00 -08:00 |
| Paul Gauthier (aider) | 24599aa64f | style: Run linter on problem_stats.py | 2024-12-17 14:09:20 -08:00 |
| Paul Gauthier (aider) | 54c1553892 | refactor: Remove distribution of solutions table | 2024-12-17 14:09:17 -08:00 |
| Paul Gauthier (aider) | 0ae53ce1a1 | feat: Output per-exercise stats, sort by solvers | 2024-12-17 14:08:47 -08:00 |
| Paul Gauthier | c69ffe02f8 | chore: Make problem_stats.py executable | 2024-12-17 14:08:46 -08:00 |
| Paul Gauthier (aider) | 7bfc2e0e74 | style: Run linter on benchmark script | 2024-12-17 14:06:56 -08:00 |
| Paul Gauthier (aider) | 9cc674c283 | feat: Add script to analyze exercise solution stats | 2024-12-17 14:06:53 -08:00 |
| Paul Gauthier | 66e597a05c | feat: Add problem stats benchmark | 2024-12-17 14:06:52 -08:00 |
| Paul Gauthier | 4dc3b9072e | feat: increase retry timeout for benchmarking | 2024-12-11 14:26:28 -08:00 |
| Paul Gauthier (aider) | fcb2bacd1e | style: format benchmark.py with black | 2024-12-11 13:09:52 -08:00 |
| Paul Gauthier (aider) | a9401e921e | feat: add sleep option between tests in single-threaded mode | 2024-12-11 13:09:45 -08:00 |
| Paul Gauthier (aider) | 6af71951af | style: fix whitespace in benchmark.py | 2024-11-28 14:01:50 -08:00 |
| Paul Gauthier (aider) | 3eed45dc3e | fix: improve benchmark directory selection based on latest .md file timestamp | 2024-11-28 14:01:45 -08:00 |
| Paul Gauthier (aider) | 320b059bc7 | perf: optimize benchmark dir search by filtering on timestamp first | 2024-11-28 14:00:12 -08:00 |
| Paul Gauthier | a89ce06377 | fix: correct glob pattern for finding latest benchmark directory | 2024-11-28 14:00:10 -08:00 |
| Paul Gauthier (aider) | 2ff3a23606 | fix: add num_ctx parameter to run_test_real function | 2024-11-25 19:21:08 -08:00 |
| Paul Gauthier (aider) | c5ce57ea7f | style: fix linting issues in benchmark.py | 2024-11-25 19:20:49 -08:00 |
| Paul Gauthier (aider) | 351b8e50f0 | feat: add --num-ctx flag to override model context window size | 2024-11-25 19:20:43 -08:00 |
| Paul Gauthier (aider) | 6a0a97cb41 | feat: Add host.docker.internal gateway to enable Ollama server access from container | 2024-11-22 10:07:47 -08:00 |
| Paul Gauthier (aider) | 30ee89c7e9 | style: Fix linting issues in over_time.py | 2024-11-21 16:45:11 -08:00 |
| Paul Gauthier (aider) | 25bcea6aec | feat: Add print of model release dates and names in sorted order | 2024-11-21 16:45:07 -08:00 |
| Paul Gauthier (aider) | 8fdcd92260 | feat: Update plot save paths to website assets directory | 2024-11-21 14:19:05 -08:00 |
| Paul Gauthier | 781a40df52 | fix: Update Gemini Pro legend label to Gemini 1.5 Pro | 2024-11-21 14:19:03 -08:00 |
| Paul Gauthier (aider) | a7fc0f9d2e | feat: Add color and legend support for Gemini Pro models | 2024-11-21 14:02:27 -08:00 |
| Paul Gauthier (aider) | c189a52e5e | style: Organize imports and apply linter formatting | 2024-11-21 14:00:24 -08:00 |
| Paul Gauthier (aider) | 6d6d763dd3 | refactor: Restructure benchmark plotting script for improved maintainability | 2024-11-21 14:00:20 -08:00 |
| Paul Gauthier | 1f0d26e8c7 | better over time plot | 2024-11-20 20:19:44 -08:00 |
| Paul Gauthier | 8302e9d0dd | improved over time plot | 2024-11-20 20:16:35 -08:00 |
| Paul Gauthier (aider) | c797af020a | refactor: Update fontsize to use LABEL_FONT_SIZE constant in over_time.py | 2024-11-20 20:13:46 -08:00 |
| Paul Gauthier (aider) | 1c85afa320 | feat: Add LABEL_FONT_SIZE constant for dot label font size | 2024-11-20 20:13:33 -08:00 |
| Paul Gauthier | eb5317f8e5 | fix: Adjust annotation vertical offset for brown color in over_time plot | 2024-11-20 20:13:30 -08:00 |
| Paul Gauthier (aider) | 8b860615b8 | style: Increase font size for scatter plot dot labels | 2024-11-20 20:10:40 -08:00 |
| Paul Gauthier (aider) | c15ac341e2 | refactor: Remove Opus and Llama model variants from legend labels | 2024-11-20 20:07:52 -08:00 |
| Paul Gauthier (aider) | c2c7ee1047 | feat: Change Opus label to "Opus" in legend | 2024-11-20 20:06:48 -08:00 |
| Paul Gauthier (aider) | 72c46ccec6 | feat: Add labels for Claude 3 Opus, Sonnet, and O1 Preview models | 2024-11-20 20:06:04 -08:00 |
| Paul Gauthier (aider) | dd3bfaee01 | style: Format code with consistent indentation and line breaks | 2024-11-20 20:05:24 -08:00 |
| Paul Gauthier (aider) | 03206ad90e | feat: Add line labels directly on first points instead of using legend | 2024-11-20 20:05:18 -08:00 |
| Paul Gauthier (aider) | 2e00307190 | feat: Add color and legend label for o1-preview models | 2024-11-20 20:03:49 -08:00 |
| Paul Gauthier (aider) | b3e29ab20e | style: Apply linter formatting to benchmark code | 2024-11-20 20:02:52 -08:00 |
| Paul Gauthier (aider) | 5504ac535b | feat: Add simplified model names for legend labels | 2024-11-20 20:02:48 -08:00 |
| Paul Gauthier (aider) | 4b3dd7f4ea | style: Apply linter formatting to over_time.py | 2024-11-20 19:59:43 -08:00 |
| Paul Gauthier (aider) | 8edf9540d5 | feat: Add legend to plot and remove point labels | 2024-11-20 19:59:38 -08:00 |
| Paul Gauthier | 1c62ecd1b5 | style: Adjust x-axis label rotation angle for better readability | 2024-11-20 19:59:36 -08:00 |
| Paul Gauthier | 7cf3d9f3ce | style: Increase annotation font size in benchmark plot | 2024-11-20 19:45:42 -08:00 |
| Paul Gauthier | 9b5a703307 | updated models-over-time | 2024-11-20 19:40:59 -08:00 |
| Paul Gauthier (aider) | 370993cbed | style: Rotate point labels by 45 degrees in benchmark plot | 2024-11-20 18:47:30 -08:00 |
|