mirror of
https://github.com/Aider-AI/aider.git
synced 2025-06-02 18:54:59 +00:00
copy
This commit is contained in:
parent
53ca83beea
commit
485bfa2492
2 changed files with 29 additions and 1 deletions
|
@ -465,3 +465,28 @@
|
||||||
versions: 0.44.1-dev
|
versions: 0.44.1-dev
|
||||||
seconds_per_case: 7.8
|
seconds_per_case: 7.8
|
||||||
total_cost: 0.0916
|
total_cost: 0.0916
|
||||||
|
|
||||||
|
- dirname: 2024-09-29-22-35-36--architect-o1preview-o1mini-whole
|
||||||
|
test_cases: 133
|
||||||
|
model: o1-preview
|
||||||
|
edit_format: architect
|
||||||
|
commit_hash: 53ca83b
|
||||||
|
editor_model: o1-mini
|
||||||
|
editor_edit_format: whole
|
||||||
|
pass_rate_1: 65.4
|
||||||
|
pass_rate_2: 85.0
|
||||||
|
percent_cases_well_formed: 100.0
|
||||||
|
error_outputs: 0
|
||||||
|
num_malformed_responses: 0
|
||||||
|
num_with_malformed_responses: 0
|
||||||
|
user_asks: 179
|
||||||
|
lazy_comments: 4
|
||||||
|
syntax_errors: 0
|
||||||
|
indentation_errors: 0
|
||||||
|
exhausted_context_windows: 0
|
||||||
|
test_timeouts: 1
|
||||||
|
command: aider --model o1-preview
|
||||||
|
date: 2024-09-29
|
||||||
|
versions: 0.58.1.dev
|
||||||
|
seconds_per_case: 39.7
|
||||||
|
total_cost: 36.2078
|
|
@ -19,7 +19,10 @@ Aider now has experimental support for using two models to complete each coding
|
||||||
Splitting up "code reasoning" and "code editing" in this manner
|
Splitting up "code reasoning" and "code editing" in this manner
|
||||||
has produced SOTA results on
|
has produced SOTA results on
|
||||||
[aider's code editing benchmark](/docs/benchmarks.html#the-benchmark).
|
[aider's code editing benchmark](/docs/benchmarks.html#the-benchmark).
|
||||||
It also significantly improved the benchmark scores of many
|
Using o1-preview as the Architect with either DeepSeek or o1-mini as the
|
||||||
|
Editor produced the SOTA score of 85%.
|
||||||
|
Using the Architect/Editor approach
|
||||||
|
also significantly improved the benchmark scores of many
|
||||||
models, compared to their previous "solo" baseline scores (striped bars).
|
models, compared to their previous "solo" baseline scores (striped bars).
|
||||||
|
|
||||||
<style>
|
<style>
|
||||||
|
|
Loading…
Add table
Add a link
Reference in a new issue