Commit graph

298 commits

Author SHA1 Message Date
fry69
667a58052e feat: change edit format from "senior" to "architect" 2024-09-27 09:03:42 +02:00
fry69
e3e0d57512 chore: update parameter names in args and benchmark 2024-09-27 08:57:22 +02:00
Paul Gauthier
eb21cf2830 architect/editor 2024-09-26 16:10:19 -07:00
Paul Gauthier (aider)
5a78e7d1b8 chore: Run the linter 2024-09-26 11:35:13 -07:00
Paul Gauthier (aider)
1c05192b69 fix: Only record junior_model and junior_edit_format in the results array if edit_format is "senior" 2024-09-26 11:35:09 -07:00
Paul Gauthier
e682eb8669 fix: Add junior model and junior edit format to benchmark results 2024-09-25 16:31:40 -07:00
Paul Gauthier (aider)
ed7503dbbe feat: optimize find_latest_benchmark_dir to check only .md files and limit to one file per subtree 2024-09-25 12:20:45 -07:00
Paul Gauthier (aider)
e21cdafb15 style: run linter and fix code formatting issues 2024-09-25 12:18:43 -07:00
Paul Gauthier (aider)
8d90df1ebc feat: implement automatic selection of the most recently updated benchmark directory when using --stats without dirnames 2024-09-25 12:18:39 -07:00
Paul Gauthier (aider)
24c959af2d feat: Add --junior-model and --junior-edit-format flags to the benchmark 2024-09-25 11:44:34 -07:00
Paul Gauthier
15cc709322 feat: Improve senior coder's edit format handling 2024-09-25 11:42:09 -07:00
Paul Gauthier
65e57df7ea feat: Implement changes to handle files content in Coder and prompts 2024-09-25 09:54:16 -07:00
Paul Gauthier
075bc828f6 stand alone junior message 2024-09-25 08:41:49 -07:00
Paul Gauthier
c912982747 senior-junior 2024-09-25 08:25:11 -07:00
Paul Gauthier
a9e9f9cdbe Merge branch 'main' into ask-plan-simple 2024-09-25 07:46:15 -07:00
Paul Gauthier
412b8e7c3c copy 2024-09-21 10:09:26 -07:00
Paul Gauthier
2753ac6b62 feat: Add new benchmark test case for qwen-2.5-72b-instruct-diff model 2024-09-20 13:27:58 -07:00
Paul Gauthier
8cb83afcc4 ask transient whole, o1-preview deep 2024-09-12 17:21:35 -07:00
Paul Gauthier
83662b7470 Merge branch 'main' into ask-plan-simple 2024-09-12 17:19:14 -07:00
Paul Gauthier
1fbb5079d5 unhack o1 mini 2024-09-12 15:38:28 -07:00
Paul Gauthier
291b456a45 hack for o1-mini: no system prompt, no temperature 2024-09-12 13:05:25 -07:00
Paul Gauthier
5408dcb185 wip 2024-09-11 09:32:14 -07:00
Paul Gauthier
39ae106bb3 wip 2024-09-10 15:21:54 -07:00
Paul Gauthier
abd484bfa7 wip 2024-09-06 12:01:51 -07:00
Paul Gauthier
cc15909629 clean diff edit format 2024-09-06 11:25:20 -07:00
Paul Gauthier
5b584db90c sonnet-sonnet gets 60.2/84.2 2024-09-06 09:49:01 -07:00
Paul Gauthier
1c73e7d43a turn off suggest shell commands during benchmarks 2024-09-05 14:35:34 -07:00
Paul Gauthier
05dcbeecac noop 2024-09-05 14:25:09 -07:00
Paul Gauthier
ff3a75413b sonnet+deep got 60.9/82.0 2024-09-05 13:30:25 -07:00
Paul Gauthier
1a3d8c4015 wip 2024-08-20 17:45:40 -07:00
Paul Gauthier
821eae16ae copy 2024-08-19 20:54:10 -07:00
Paul Gauthier
e0a9044118 copy 2024-08-19 20:53:42 -07:00
Paul Gauthier
730d6e0e94 copy 2024-08-19 20:51:03 -07:00
Paul Gauthier
86a7a17d47 copy 2024-08-19 20:47:52 -07:00
Paul Gauthier
2944445340 copy 2024-08-19 20:44:48 -07:00
Paul Gauthier
b61b5f4b74 cleanup before merge 2024-08-16 11:35:30 -07:00
Paul Gauthier
3a2ac02024 Merge branch 'main' into json-coders 2024-08-15 12:15:07 -07:00
Paul Gauthier
822a8ab671 remove gpt-4o-mini from the gpt-4 trendline 2024-08-15 09:52:21 -07:00
Paul Gauthier (aider)
5ccdebf2c0 refactor: Extract color assignment logic into a separate function 2024-08-15 09:50:50 -07:00
Paul Gauthier
bac04a2a3d no lint 2024-08-15 06:10:46 -07:00
Paul Gauthier (aider)
0a3c6bfbe7 feat: Change blue color to light blue in plot_over_time function 2024-08-14 06:29:48 -07:00
Paul Gauthier (aider)
d2b4846b95 feat: Replace orange color with purple for "-4o" models 2024-08-14 06:29:13 -07:00
Paul Gauthier (aider)
fb0b348bec fix: Remove unused blue_points variable 2024-08-14 06:28:28 -07:00
Paul Gauthier (aider)
a7290be843 style: Apply linter formatting changes 2024-08-14 06:27:51 -07:00
Paul Gauthier (aider)
1cdbc76974 feat: Connect model family lines in over_time plot 2024-08-14 06:27:48 -07:00
Paul Gauthier
714fd45f4d fix: Update color logic and font size in over_time.py 2024-08-14 06:27:47 -07:00
Paul Gauthier (aider)
1f6cadcc66 style: Refactor conditional logic in color assignment 2024-08-14 06:22:51 -07:00
Paul Gauthier (aider)
c4f70d81b7 feat: add new color for all "-4o-" models except "gpt-4o-mini" 2024-08-14 06:22:48 -07:00
Paul Gauthier (aider)
1f59687e9d style: Format code with linter 2024-08-14 06:21:48 -07:00
Paul Gauthier (aider)
d8c8c51156 The commit message for these changes would be:
feat: Improve graph visualization and add debugging

The changes made in this commit include:

1. Adjusting the y-axis limit to 100 to accommodate the higher pass rate values.
2. Rotating the x-axis labels for better readability.
3. Adding debug print statements to track the progress of figure generation and display.
4. Increasing the figure size for better visibility.
5. Adding additional debugging to ensure the data is being plotted correctly.

These improvements should help with the visualization and debugging of the graph generation process.
2024-08-14 06:21:45 -07:00