This commit is contained in:
Paul Gauthier 2024-09-29 15:46:32 -07:00
parent 53ca83beea
commit 485bfa2492
2 changed files with 29 additions and 1 deletions

View file

@ -465,3 +465,28 @@
versions: 0.44.1-dev versions: 0.44.1-dev
seconds_per_case: 7.8 seconds_per_case: 7.8
total_cost: 0.0916 total_cost: 0.0916
- dirname: 2024-09-29-22-35-36--architect-o1preview-o1mini-whole
test_cases: 133
model: o1-preview
edit_format: architect
commit_hash: 53ca83b
editor_model: o1-mini
editor_edit_format: whole
pass_rate_1: 65.4
pass_rate_2: 85.0
percent_cases_well_formed: 100.0
error_outputs: 0
num_malformed_responses: 0
num_with_malformed_responses: 0
user_asks: 179
lazy_comments: 4
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 1
command: aider --model o1-preview
date: 2024-09-29
versions: 0.58.1.dev
seconds_per_case: 39.7
total_cost: 36.2078

View file

@ -19,7 +19,10 @@ Aider now has experimental support for using two models to complete each coding
Splitting up "code reasoning" and "code editing" in this manner Splitting up "code reasoning" and "code editing" in this manner
has produced SOTA results on has produced SOTA results on
[aider's code editing benchmark](/docs/benchmarks.html#the-benchmark). [aider's code editing benchmark](/docs/benchmarks.html#the-benchmark).
It also significantly improved the benchmark scores of many Using o1-preview as the Architect with either DeepSeek or o1-mini as the
Editor produced the SOTA score of 85%.
Using the Architect/Editor approach
also significantly improved the benchmark scores of many
models, compared to their previous "solo" baseline scores (striped bars). models, compared to their previous "solo" baseline scores (striped bars).
<style> <style>