359 commits

Author SHA1 Message Date
Paul Gauthier (aider)
674e3846e2 fix: Correctly sort leaderboard by pass rate 2024-12-17 14:13:43 -08:00
Paul Gauthier (aider)
3a0be0cca9 style: Apply linter formatting 2024-12-17 14:13:19 -08:00
Paul Gauthier (aider)
00d7c3a05a feat: Add --topn argument to limit models by pass rate 2024-12-17 14:13:16 -08:00
Paul Gauthier (aider)
91f5fca5e9 feat: Include never solved exercises in stats 2024-12-17 14:10:47 -08:00
Paul Gauthier (aider)
1d7cb0c119 feat: Format problem stats output as a table with percentages 2024-12-17 14:10:00 -08:00
Paul Gauthier (aider)
24599aa64f style: Run linter on problem_stats.py 2024-12-17 14:09:20 -08:00
Paul Gauthier (aider)
54c1553892 refactor: Remove distribution of solutions table 2024-12-17 14:09:17 -08:00
Paul Gauthier (aider)
0ae53ce1a1 feat: Output per-exercise stats, sort by solvers 2024-12-17 14:08:47 -08:00
Paul Gauthier
c69ffe02f8 chore: Make problem_stats.py executable 2024-12-17 14:08:46 -08:00
Paul Gauthier (aider)
7bfc2e0e74 style: Run linter on benchmark script 2024-12-17 14:06:56 -08:00
Paul Gauthier (aider)
9cc674c283 feat: Add script to analyze exercise solution stats 2024-12-17 14:06:53 -08:00
Paul Gauthier
66e597a05c feat: Add problem stats benchmark 2024-12-17 14:06:52 -08:00
Paul Gauthier
4dc3b9072e feat: increase retry timeout for benchmarking 2024-12-11 14:26:28 -08:00
Paul Gauthier (aider)
fcb2bacd1e style: format benchmark.py with black 2024-12-11 13:09:52 -08:00
Paul Gauthier (aider)
a9401e921e feat: add sleep option between tests in single-threaded mode 2024-12-11 13:09:45 -08:00
Paul Gauthier (aider)
6af71951af style: fix whitespace in benchmark.py 2024-11-28 14:01:50 -08:00
Paul Gauthier (aider)
3eed45dc3e fix: improve benchmark directory selection based on latest .md file timestamp 2024-11-28 14:01:45 -08:00
Paul Gauthier (aider)
320b059bc7 perf: optimize benchmark dir search by filtering on timestamp first 2024-11-28 14:00:12 -08:00
Paul Gauthier
a89ce06377 fix: correct glob pattern for finding latest benchmark directory 2024-11-28 14:00:10 -08:00
Paul Gauthier (aider)
2ff3a23606 fix: add num_ctx parameter to run_test_real function 2024-11-25 19:21:08 -08:00
Paul Gauthier (aider)
c5ce57ea7f style: fix linting issues in benchmark.py 2024-11-25 19:20:49 -08:00
Paul Gauthier (aider)
351b8e50f0 feat: add --num-ctx flag to override model context window size 2024-11-25 19:20:43 -08:00
Paul Gauthier (aider)
6a0a97cb41 feat: Add host.docker.internal gateway to enable Ollama server access from container 2024-11-22 10:07:47 -08:00
Paul Gauthier (aider)
30ee89c7e9 style: Fix linting issues in over_time.py 2024-11-21 16:45:11 -08:00
Paul Gauthier (aider)
25bcea6aec feat: Add print of model release dates and names in sorted order 2024-11-21 16:45:07 -08:00
Paul Gauthier (aider)
8fdcd92260 feat: Update plot save paths to website assets directory 2024-11-21 14:19:05 -08:00
Paul Gauthier
781a40df52 fix: Update Gemini Pro legend label to Gemini 1.5 Pro 2024-11-21 14:19:03 -08:00
Paul Gauthier (aider)
a7fc0f9d2e feat: Add color and legend support for Gemini Pro models 2024-11-21 14:02:27 -08:00
Paul Gauthier (aider)
c189a52e5e style: Organize imports and apply linter formatting 2024-11-21 14:00:24 -08:00
Paul Gauthier (aider)
6d6d763dd3 refactor: Restructure benchmark plotting script for improved maintainability 2024-11-21 14:00:20 -08:00
Paul Gauthier
1f0d26e8c7 better over time plot 2024-11-20 20:19:44 -08:00
Paul Gauthier
8302e9d0dd improved over time plot 2024-11-20 20:16:35 -08:00
Paul Gauthier (aider)
c797af020a refactor: Update fontsize to use LABEL_FONT_SIZE constant in over_time.py 2024-11-20 20:13:46 -08:00
Paul Gauthier (aider)
1c85afa320 feat: Add LABEL_FONT_SIZE constant for dot label font size 2024-11-20 20:13:33 -08:00
Paul Gauthier
eb5317f8e5 fix: Adjust annotation vertical offset for brown color in over_time plot 2024-11-20 20:13:30 -08:00
Paul Gauthier (aider)
8b860615b8 style: Increase font size for scatter plot dot labels 2024-11-20 20:10:40 -08:00
Paul Gauthier (aider)
c15ac341e2 refactor: Remove Opus and Llama model variants from legend labels 2024-11-20 20:07:52 -08:00
Paul Gauthier (aider)
c2c7ee1047 feat: Change Opus label to "Opus" in legend 2024-11-20 20:06:48 -08:00
Paul Gauthier (aider)
72c46ccec6 feat: Add labels for Claude 3 Opus, Sonnet, and O1 Preview models 2024-11-20 20:06:04 -08:00
Paul Gauthier (aider)
dd3bfaee01 style: Format code with consistent indentation and line breaks 2024-11-20 20:05:24 -08:00
Paul Gauthier (aider)
03206ad90e feat: Add line labels directly on first points instead of using legend 2024-11-20 20:05:18 -08:00
Paul Gauthier (aider)
2e00307190 feat: Add color and legend label for o1-preview models 2024-11-20 20:03:49 -08:00
Paul Gauthier (aider)
b3e29ab20e style: Apply linter formatting to benchmark code 2024-11-20 20:02:52 -08:00
Paul Gauthier (aider)
5504ac535b feat: Add simplified model names for legend labels 2024-11-20 20:02:48 -08:00
Paul Gauthier (aider)
4b3dd7f4ea style: Apply linter formatting to over_time.py 2024-11-20 19:59:43 -08:00
Paul Gauthier (aider)
8edf9540d5 feat: Add legend to plot and remove point labels 2024-11-20 19:59:38 -08:00
Paul Gauthier
1c62ecd1b5 style: Adjust x-axis label rotation angle for better readability 2024-11-20 19:59:36 -08:00
Paul Gauthier
7cf3d9f3ce style: Increase annotation font size in benchmark plot 2024-11-20 19:45:42 -08:00
Paul Gauthier
9b5a703307 updated models-over-time 2024-11-20 19:40:59 -08:00
Paul Gauthier (aider)
370993cbed style: Rotate point labels by 45 degrees in benchmark plot 2024-11-20 18:47:30 -08:00