mirror of https://github.com/Aider-AI/aider.git
synced 2025-05-29 08:44:59 +00:00
parent 2944445340
commit 86a7a17d47
1 changed file with 21 additions and 18 deletions
@@ -101,27 +101,29 @@ collecting stats not executing unsafe python.
 
 The benchmark report is a yaml record with statistics about the run:
 
 ```yaml
-- dirname: 2024-08-15-13-26-38--json-no-lint-deepseek-coder-whole
+- dirname: 2024-07-04-14-32-08--claude-3.5-sonnet-diff-continue
   test_cases: 133
-  model: deepseek-coder V2 0724
-  edit_format: Markdown
-  commit_hash: bac04a2
-  pass_rate_1: 59.4
-  percent_cases_well_formed: 100.0
-  error_outputs: 2
-  num_malformed_responses: 0
-  num_with_malformed_responses: 0
+  model: claude-3.5-sonnet
+  edit_format: diff
+  commit_hash: 35f21b5
+  pass_rate_1: 57.1
+  pass_rate_2: 77.4
+  percent_cases_well_formed: 99.2
+  error_outputs: 23
+  released: 2024-06-20
+  num_malformed_responses: 4
+  num_with_malformed_responses: 1
   user_asks: 2
   lazy_comments: 0
-  syntax_errors: 0
+  syntax_errors: 1
   indentation_errors: 0
   exhausted_context_windows: 0
-  test_timeouts: 0
-  command: aider --model deepseek-coder
-  date: 2024-08-15
-  versions: 0.50.2-dev
-  seconds_per_case: 27.9
-  total_cost: 0.0438
+  test_timeouts: 1
+  command: aider --sonnet
+  date: 2024-07-04
+  versions: 0.42.1-dev
+  seconds_per_case: 17.6
+  total_cost: 3.6346
 ```
 
 The key statistics are the `pass_rate_#` entries, which report the
@@ -129,8 +131,9 @@ percent of the tasks which had all tests passing.
 There will be multiple of these pass rate stats,
 depending on the value of the `--tries` parameter.
 
-The yaml also includes all the settings which were in effect for the benchmark and
-the git hash of the repo. The `model`, `edit_format` and `commit_hash`
+The yaml also includes all the settings which were in effect for the benchmark run and
+the git hash of the repo used to run it.
+The `model`, `edit_format` and `commit_hash`
 should be enough to reliably reproduce any benchmark run.
 
 You can see examples of the benchmark report yaml in the
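
As a rough illustration of how the report fields in this diff might be consumed, the sketch below pulls every `pass_rate_#` entry out of one report record and collects the three fields the text says are enough to reproduce a run. It assumes the YAML has already been loaded into a plain Python dict (for example with a YAML library); the `pass_rates` and `repro_key` helpers are hypothetical names invented for this example and are not part of aider. The values are copied from the updated example in the diff.

```python
import re

# One report entry as it might look after loading the YAML into a dict.
# Values are copied from the updated example in the diff; only a subset
# of the fields is shown.
report = {
    "dirname": "2024-07-04-14-32-08--claude-3.5-sonnet-diff-continue",
    "test_cases": 133,
    "model": "claude-3.5-sonnet",
    "edit_format": "diff",
    "commit_hash": "35f21b5",
    "pass_rate_1": 57.1,
    "pass_rate_2": 77.4,
    "percent_cases_well_formed": 99.2,
}


def pass_rates(entry):
    """Hypothetical helper: map try number -> pass rate for each pass_rate_<N> key."""
    rates = {}
    for key, value in entry.items():
        match = re.fullmatch(r"pass_rate_(\d+)", key)
        if match:
            rates[int(match.group(1))] = value
    # Sort by try number so the result does not depend on key order in the record.
    return dict(sorted(rates.items()))


def repro_key(entry):
    """Hypothetical helper: the fields the docs name as sufficient to reproduce a run."""
    return (entry["model"], entry["edit_format"], entry["commit_hash"])


rates = pass_rates(report)
print(rates)              # one entry per try recorded in the report
print(rates[max(rates)])  # pass rate after the final try
print(repro_key(report))
```

Matching on the `pass_rate_<N>` pattern rather than hard-coding `pass_rate_1` and `pass_rate_2` keeps the sketch working when the number of entries changes with `--tries`.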