fry69
667a58052e
feat: change edit format from "senior" to "architect"
2024-09-27 09:03:42 +02:00
fry69
e3e0d57512
chore: update parameter names in args and benchmark
2024-09-27 08:57:22 +02:00
Paul Gauthier
eb21cf2830
architect/editor
2024-09-26 16:10:19 -07:00
Paul Gauthier (aider)
5a78e7d1b8
chore: Run the linter
2024-09-26 11:35:13 -07:00
Paul Gauthier (aider)
1c05192b69
fix: Only record junior_model and junior_edit_format in the results array if edit_format is "senior"
2024-09-26 11:35:09 -07:00
Paul Gauthier
e682eb8669
fix: Add junior model and junior edit format to benchmark results
2024-09-25 16:31:40 -07:00
Paul Gauthier (aider)
ed7503dbbe
feat: optimize find_latest_benchmark_dir to check only .md files and limit to one file per subtree
2024-09-25 12:20:45 -07:00
Paul Gauthier (aider)
e21cdafb15
style: run linter and fix code formatting issues
2024-09-25 12:18:43 -07:00
Paul Gauthier (aider)
8d90df1ebc
feat: implement automatic selection of the most recently updated benchmark directory when using --stats without dirnames
2024-09-25 12:18:39 -07:00
Paul Gauthier (aider)
24c959af2d
feat: Add --junior-model and --junior-edit-format flags to the benchmark
2024-09-25 11:44:34 -07:00
Paul Gauthier
15cc709322
feat: Improve senior coder's edit format handling
2024-09-25 11:42:09 -07:00
Paul Gauthier
65e57df7ea
feat: Implement changes to handle files content in Coder and prompts
2024-09-25 09:54:16 -07:00
Paul Gauthier
075bc828f6
stand alone junior message
2024-09-25 08:41:49 -07:00
Paul Gauthier
c912982747
senior-junior
2024-09-25 08:25:11 -07:00
Paul Gauthier
a9e9f9cdbe
Merge branch 'main' into ask-plan-simple
2024-09-25 07:46:15 -07:00
Paul Gauthier
412b8e7c3c
copy
2024-09-21 10:09:26 -07:00
Paul Gauthier
2753ac6b62
feat: Add new benchmark test case for qwen-2.5-72b-instruct-diff model
2024-09-20 13:27:58 -07:00
Paul Gauthier
8cb83afcc4
ask transient whole, o1-preview deep
2024-09-12 17:21:35 -07:00
Paul Gauthier
83662b7470
Merge branch 'main' into ask-plan-simple
2024-09-12 17:19:14 -07:00
Paul Gauthier
1fbb5079d5
unhack o1 mini
2024-09-12 15:38:28 -07:00
Paul Gauthier
291b456a45
hack for o1-mini: no system prompt, no temperature
2024-09-12 13:05:25 -07:00
Paul Gauthier
5408dcb185
wip
2024-09-11 09:32:14 -07:00
Paul Gauthier
39ae106bb3
wip
2024-09-10 15:21:54 -07:00
Paul Gauthier
abd484bfa7
wip
2024-09-06 12:01:51 -07:00
Paul Gauthier
cc15909629
clean diff edit format
2024-09-06 11:25:20 -07:00
Paul Gauthier
5b584db90c
sonnet-sonnet gets 60.2/84.2
2024-09-06 09:49:01 -07:00
Paul Gauthier
1c73e7d43a
turn off suggest shell commands during benchmarks
2024-09-05 14:35:34 -07:00
Paul Gauthier
05dcbeecac
noop
2024-09-05 14:25:09 -07:00
Paul Gauthier
ff3a75413b
sonnet+deep got 60.9/82.0
2024-09-05 13:30:25 -07:00
Paul Gauthier
1a3d8c4015
wip
2024-08-20 17:45:40 -07:00
Paul Gauthier
821eae16ae
copy
2024-08-19 20:54:10 -07:00
Paul Gauthier
e0a9044118
copy
2024-08-19 20:53:42 -07:00
Paul Gauthier
730d6e0e94
copy
2024-08-19 20:51:03 -07:00
Paul Gauthier
86a7a17d47
copy
2024-08-19 20:47:52 -07:00
Paul Gauthier
2944445340
copy
2024-08-19 20:44:48 -07:00
Paul Gauthier
b61b5f4b74
cleanup before merge
2024-08-16 11:35:30 -07:00
Paul Gauthier
3a2ac02024
Merge branch 'main' into json-coders
2024-08-15 12:15:07 -07:00
Paul Gauthier
822a8ab671
remove gpt-4o-mini from the gpt-4 trendline
2024-08-15 09:52:21 -07:00
Paul Gauthier (aider)
5ccdebf2c0
refactor: Extract color assignment logic into a separate function
2024-08-15 09:50:50 -07:00
Paul Gauthier
bac04a2a3d
no lint
2024-08-15 06:10:46 -07:00
Paul Gauthier (aider)
0a3c6bfbe7
feat: Change blue color to light blue in plot_over_time function
2024-08-14 06:29:48 -07:00
Paul Gauthier (aider)
d2b4846b95
feat: Replace orange color with purple for "-4o" models
2024-08-14 06:29:13 -07:00
Paul Gauthier (aider)
fb0b348bec
fix: Remove unused blue_points variable
2024-08-14 06:28:28 -07:00
Paul Gauthier (aider)
a7290be843
style: Apply linter formatting changes
2024-08-14 06:27:51 -07:00
Paul Gauthier (aider)
1cdbc76974
feat: Connect model family lines in over_time plot
2024-08-14 06:27:48 -07:00
Paul Gauthier
714fd45f4d
fix: Update color logic and font size in over_time.py
2024-08-14 06:27:47 -07:00
Paul Gauthier (aider)
1f6cadcc66
style: Refactor conditional logic in color assignment
2024-08-14 06:22:51 -07:00
Paul Gauthier (aider)
c4f70d81b7
feat: add new color for all "-4o-" models except "gpt-4o-mini"
2024-08-14 06:22:48 -07:00
Paul Gauthier (aider)
1f59687e9d
style: Format code with linter
2024-08-14 06:21:48 -07:00
Paul Gauthier (aider)
d8c8c51156
The commit message for these changes would be:
...
feat: Improve graph visualization and add debugging
The changes made in this commit include:
1. Adjusting the y-axis limit to 100 to accommodate the higher pass rate values.
2. Rotating the x-axis labels for better readability.
3. Adding debug print statements to track the progress of figure generation and display.
4. Increasing the figure size for better visibility.
5. Adding additional debugging to ensure the data is being plotted correctly.
These improvements should help with the visualization and debugging of the graph generation process.
2024-08-14 06:21:45 -07:00