Mirror of https://github.com/Aider-AI/aider.git
parent 60c29b2839
commit ab3b50296c
1 changed file with 6 additions and 1 deletion
@@ -34,6 +34,8 @@ This benchmarking effort highlighted a number of pitfalls and details which
 can have a significant impact on the model's ability to correctly edit code:
 
 - **Quantization** -- Open source models are often available at dozens of different quantizations.
+Most seem to only modestly decrease code editing skill, but stronger quantizations
+do have a real impact.
 - **Context window** -- Cloud providers can decide how large a context window to accept,
 and they often choose differently. Ollama defaults to a tiny 2k context window,
 and silently discards data that exceeds it. Such a small window has
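The hunk above notes that Ollama defaults to a tiny 2k context window and silently drops anything beyond it. As a minimal sketch (not part of this commit), here is how a caller could request a larger window per call with the `ollama` Python client; the model tag and the 16k value are illustrative assumptions, not settings from the benchmark.

```python
# Sketch only: ask Ollama for a larger context window on each request.
# Assumes a local Ollama server and an already-pulled qwen2.5-coder model.
import ollama

response = ollama.chat(
    model="qwen2.5-coder:32b",
    messages=[{"role": "user", "content": "Refactor the function below ..."}],
    # Without num_ctx, Ollama falls back to its small default window and
    # silently truncates whatever does not fit.
    options={"num_ctx": 16384},
)
print(response["message"]["content"])
```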
@@ -43,9 +45,12 @@ differing output token limits. This has a direct impact on how much code the
 model can write or edit in a response.
 - **Buggy cloud providers** -- Between Qwen 2.5 Coder 32B Instruct
 and DeepSeek V2.5, there were
-multiple cloud providers with broken or buggy API endpoints that seemed
+multiple cloud providers with broken or buggy API endpoints.
+They seemed
 to be returning results different from expected based on the advertised
 quantization and context sizes.
+The harm caused to the code editing benchmark varied from serious
+to catastrophic.
 
 The best versions of the model rival GPT-4o, while the worst performing
 quantization is more like the older GPT-4 Turbo.
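The hunks above also touch on quantization choices and output token limits. A hedged sketch of how one could pin both when benchmarking a local model with the `ollama` client follows; the quantization tag and the numeric limits are examples, not the values used in the post.

```python
# Sketch only: pin an explicit quantization tag and an output-token cap so
# runs are comparable. Tag name and numbers are illustrative assumptions.
import ollama

MODEL = "qwen2.5-coder:32b-instruct-q4_K_M"  # explicit quant, not the default tag

ollama.pull(MODEL)                    # fetch exactly this quantization
print(ollama.show(MODEL)["details"])  # reports e.g. the quantization level

response = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "Write a unit test for ..."}],
    options={
        "num_ctx": 16384,     # context window, as in the sketch above
        "num_predict": 2048,  # max output tokens; providers differ widely here
    },
)
print(response["message"]["content"])
```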