Paul Gauthier 2024-11-26 14:12:16 -08:00
parent 60c29b2839
commit ab3b50296c

@@ -34,6 +34,8 @@ This benchmarking effort highlighted a number of pitfalls and details which
can have a significant impact on the model's ability to correctly edit code:
- **Quantization** -- Open source models are often available at dozens of different quantizations.
Most seem to only modestly decrease code editing skill, but stronger quantizations
do have a real impact.
- **Context window** -- Cloud providers can decide how large a context window to accept,
and they often choose differently. Ollama defaults to a tiny 2k context window,
and silently discards data that exceeds it. Such a small window has
@@ -43,9 +45,12 @@ differing output token limits. This has a direct impact on how much code the
model can write or edit in a response.
- **Buggy cloud providers** -- Between Qwen 2.5 Coder 32B Instruct
and DeepSeek V2.5, there were
multiple cloud providers with broken or buggy API endpoints.
They seemed
to be returning results different from those expected based on the advertised
quantization and context sizes.
The harm caused to the code editing benchmark varied from serious
to catastrophic.
The best versions of the model rival GPT-4o, while the worst performing
quantization is more like the older GPT-4 Turbo.
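
The context window pitfall called out in the diff above is easy to guard against locally. The sketch below is one way to do it, assuming a local Ollama server on its default port (11434) and a locally pulled `qwen2.5-coder:32b` model (both assumptions, not part of the original post): it passes Ollama's `num_ctx` option explicitly instead of relying on the 2k-token default, which silently truncates longer prompts.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint


def chat(messages, model="qwen2.5-coder:32b", num_ctx=16384):
    """Send a non-streaming chat request to a local Ollama server,
    requesting an explicit context window instead of the 2k default."""
    response = requests.post(
        OLLAMA_URL,
        json={
            "model": model,           # assumed model tag; use whatever is pulled locally
            "messages": messages,
            "stream": False,
            # Without num_ctx, Ollama falls back to its small default context
            # window and silently drops input that exceeds it.
            "options": {"num_ctx": num_ctx},
        },
        timeout=600,
    )
    response.raise_for_status()
    return response.json()["message"]["content"]


if __name__ == "__main__":
    reply = chat([{"role": "user", "content": "What does num_ctx control?"}])
    print(reply)
```

A benchmark harness built on this kind of wrapper avoids the silent-truncation failure mode; whether a given cloud provider honors an equivalent setting still has to be verified per provider.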