Mirror of https://github.com/Aider-AI/aider.git
Commit 60c29b2839 (parent 7972f5f4bc): 2 changed files with 41 additions and 14 deletions
@@ -30,26 +30,29 @@ served both locally and from a variety of cloud providers.
- Results from individual providers served via OpenRouter and directly to their own APIs.
- Ollama locally serving different quantizations from the [Ollama model library](https://ollama.com/library/qwen2.5-coder:32b-instruct-q4_K_M).

This benchmarking effort highlighted a number of pitfalls and details which
can have a significant impact on the model's ability to correctly edit code:

- **Quantization** -- Open source models are often available at dozens of different quantizations.
- **Context window** -- Cloud providers can decide how large a context window to accept,
and they often choose differently. Ollama defaults to a tiny 2k context window,
and silently discards data that exceeds it (see the sketch below). Such a small window has
catastrophic effects on performance.
- **Output token limits** -- Open source models are often served with wildly
differing output token limits. This has a direct impact on how much code the
model can write or edit in a response.
- **Buggy cloud providers** -- Between Qwen 2.5 Coder 32B Instruct
and DeepSeek V2.5, there were
multiple cloud providers with broken or buggy API endpoints that seemed
to be returning results different from what was expected given the advertised
quantization and context sizes.

The best versions of the model rival GPT-4o, while the worst performing
quantization is more like the older GPT-4 Turbo.
Even an excellent fp16 quantization falls to GPT-3.5 Turbo levels of performance
if run with Ollama's default 2k context window.
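
As a concrete illustration of the context-window pitfall above, here is a minimal sketch, not taken from the original benchmark code, of one way to request a completion from a local Ollama server while passing an explicit `num_ctx`. The model tag and window size are illustrative assumptions; the request shape follows Ollama's standard `/api/chat` endpoint.

```python
# Hypothetical sketch: ask a local Ollama server for a completion with an
# explicit context window, rather than relying on the small default that
# silently discards the oldest tokens. Assumes Ollama is running on its
# default port and that the model tag below has already been pulled.
import json
import urllib.request

payload = {
    "model": "qwen2.5-coder:32b-instruct-q4_K_M",  # illustrative model tag
    "messages": [{"role": "user", "content": "Write hello world in Python."}],
    "stream": False,
    # Without this option the server falls back to its default window and
    # quietly truncates anything that does not fit.
    "options": {"num_ctx": 8192},
}

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["message"]["content"])
```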

### Sections
{: .no_toc }

@@ -134,9 +137,10 @@ a request that exceeds the context window.
Instead, it just silently truncates the request by discarding the "oldest" messages
in the chat to make it fit within the context window.
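
To make that failure mode tangible, here is a small hypothetical probe, again not from the post: bury a keyword at the start of a prompt longer than the context window and ask for it back. If the server clips the oldest tokens to fit, the model cannot recover the keyword. The endpoint, model tag, and sizes are assumptions for illustration.

```python
# Hypothetical probe for silent truncation on a local Ollama server.
import json
import urllib.request

# Roughly 17k characters (~4k tokens): over Ollama's 2k default window,
# but comfortably inside an 8k one.
prompt = (
    "The secret word is PLUM.\n"
    + "filler sentence. " * 1000
    + "\nWhat is the secret word?"
)

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "qwen2.5-coder:32b-instruct-q4_K_M",  # illustrative tag
        "prompt": prompt,
        "stream": False,
        # "options": {"num_ctx": 8192},  # uncomment and the keyword survives
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```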

Except for the single 2k context result,
all of the Ollama results above were collected with at least an 8k context window.
An 8k window is large enough to attempt all the coding problems in the benchmark.
Aider sets Ollama's context window to 8k by default, starting in aider v0.65.0.

You can change the Ollama server's context window with a
[`.aider.model.settings.yml` file](https://aider.chat/docs/config/adv-model-settings.html#model-settings)
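
For example, a settings file along these lines raises the window to 8k tokens. This is a sketch based on aider's documented `extra_params` mechanism; the model name is illustrative, and `num_ctx` is the Ollama option that controls the context window:

```yaml
- name: ollama/qwen2.5-coder:32b-instruct-fp16
  extra_params:
    num_ctx: 8192
```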