Mirror of https://github.com/Aider-AI/aider.git

commit 60c29b2839 (parent 7972f5f4bc)
2 changed files with 41 additions and 14 deletions
@@ -296,4 +296,27 @@
   date: 2024-11-24
   versions: 0.64.2.dev
   seconds_per_case: 28.5
   total_cost: 0.1390
+
+- dirname: 2024-11-26-03-15-06--ollama-qwen2.5-coder:32b-instruct-fp16-2kctx
+  test_cases: 132
+  model: "Ollama: fp16, 2k ctx"
+  edit_format: diff
+  commit_hash: 68be6c5-dirty, 554d274, 2ff3a23, 2ff3a23-dirty, 61759f9, dd48b74, 3ebd47d-dirty
+  pass_rate_1: 43.2
+  pass_rate_2: 51.9
+  percent_cases_well_formed: 46.2
+  error_outputs: 171
+  num_malformed_responses: 165
+  num_with_malformed_responses: 71
+  user_asks: 97
+  lazy_comments: 2
+  syntax_errors: 4
+  indentation_errors: 0
+  exhausted_context_windows: 0
+  test_timeouts: 0
+  command: "aider --model ollama/qwen2.5-coder:32b-instruct-fp16 # num_ctx: 2048"
+  date: 2024-11-26
+  versions: 0.64.2.dev,0.65.1.dev
+  seconds_per_case: 188.6
+  total_cost: 0.0000
@@ -30,26 +30,29 @@ served both locally and from a variety of cloud providers.
 
 - Results from individual providers served via OpenRouter and directly to their own APIs.
 - Ollama locally serving different quantizations from the [Ollama model library](https://ollama.com/library/qwen2.5-coder:32b-instruct-q4_K_M).
 
-The best versions of the model rival GPT-4o, while the worst performer
-is more like the older GPT-4 Turbo.
-Suboptimal choices in quantization and token limits can
-easily produce far worse results.
-
 This benchmarking effort highlighted a number of pitfalls and details which
 can have a significant impact on the model's ability to correctly edit code:
 
-- Quantization -- Open source models are often available at dozens of different quantizations.
-- Context window -- Cloud providers can decide how large a context window to accept,
+- **Quantization** -- Open source models are often available at dozens of different quantizations.
+- **Context window** -- Cloud providers can decide how large a context window to accept,
 and they often choose differently. Ollama defaults to a tiny 2k context window,
-and silently discards data that exceeds it.
-- Output token limits -- Open source models are often served with wildly
+and silently discards data that exceeds it. Such a small window has
+catastrophic effects on performance.
+- **Output token limits** -- Open source models are often served with wildly
 differing output token limits. This has a direct impact on how much code the
 model can write or edit in a response.
-- Buggy cloud providers -- Between Qwen and DeepSeep, there were
+- **Buggy cloud providers** -- Between Qwen 2.5 Coder 32B Instruct
+and DeepSeek V2.5, there were
 multiple cloud providers with broken or buggy API endpoints that seemed
 to be returning results different from expected based on the advertised
 quantization and context sizes.
 
+The best versions of the model rival GPT-4o, while the worst performing
+quantization is more like the older GPT-4 Turbo.
+Even an excellent fp16 quantization falls to GPT-3.5 Turbo levels of performance
+if run with Ollama's default 2k context window.
+
 ### Sections
 {: .no_toc }
@@ -134,9 +137,10 @@ a request that exceeds the context window.
 Instead, it just silently truncates the request by discarding the "oldest" messages
 in the chat to make it fit within the context window.
 
-All of the Ollama results above were collected with at least an 8k context window, which
-is large enough to attempt all the coding problems in the benchmark.
-Aider sets Ollama's context window to 8k by default.
+Except for the single 2k context result,
+all of the Ollama results above were collected with at least an 8k context window.
+An 8k window is large enough to attempt all the coding problems in the benchmark.
+Aider sets Ollama's context window to 8k by default, starting in aider v0.65.0.
 
 You can change the Ollama server's context window with a
 [`.aider.model.settings.yml` file](https://aider.chat/docs/config/adv-model-settings.html#model-settings)
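For reference, here is a hedged sketch of such a settings file, raising the window beyond aider's 8k default. The `extra_params` / `num_ctx` keys follow the linked model-settings docs; the model name is an example and should match your Ollama tag.

```yaml
# .aider.model.settings.yml -- illustrative sketch
- name: ollama/qwen2.5-coder:32b-instruct-fp16
  extra_params:
    num_ctx: 16384  # larger windows require more memory on the Ollama server
```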