Mirror of https://github.com/Aider-AI/aider.git

Commit 60c29b2839 ("copy"), parent 7972f5f4bc
2 changed files with 41 additions and 14 deletions
@@ -296,4 +296,27 @@
   date: 2024-11-24
   versions: 0.64.2.dev
   seconds_per_case: 28.5
   total_cost: 0.1390
+
+- dirname: 2024-11-26-03-15-06--ollama-qwen2.5-coder:32b-instruct-fp16-2kctx
+  test_cases: 132
+  model: "Ollama: fp16, 2k ctx"
+  edit_format: diff
+  commit_hash: 68be6c5-dirty, 554d274, 2ff3a23, 2ff3a23-dirty, 61759f9, dd48b74, 3ebd47d-dirty
+  pass_rate_1: 43.2
+  pass_rate_2: 51.9
+  percent_cases_well_formed: 46.2
+  error_outputs: 171
+  num_malformed_responses: 165
+  num_with_malformed_responses: 71
+  user_asks: 97
+  lazy_comments: 2
+  syntax_errors: 4
+  indentation_errors: 0
+  exhausted_context_windows: 0
+  test_timeouts: 0
+  command: "aider --model ollama/qwen2.5-coder:32b-instruct-fp16 # num_ctx: 2048"
+  date: 2024-11-26
+  versions: 0.64.2.dev,0.65.1.dev
+  seconds_per_case: 188.6
+  total_cost: 0.0000
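The `command:` field in the new entry notes `# num_ctx: 2048`, meaning this run deliberately left Ollama at its small 2k context window. As a rough illustration only (the settings file actually used for the benchmark is not part of this commit), a per-model settings entry pinning that window might look like:

```yaml
# Illustrative .aider.model.settings.yml entry -- not taken from this commit.
# Assumes values in extra_params are forwarded to the Ollama server,
# where num_ctx controls the context window size.
- name: ollama/qwen2.5-coder:32b-instruct-fp16
  extra_params:
    num_ctx: 2048  # the deliberately small 2k window recorded for this leaderboard entry
```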
@@ -30,26 +30,29 @@ served both locally and from a variety of cloud providers.
 - Results from individual providers served via OpenRouter and directly to their own APIs.
 - Ollama locally serving different quantizations from the [Ollama model library](https://ollama.com/library/qwen2.5-coder:32b-instruct-q4_K_M).
 
-The best versions of the model rival GPT-4o, while the worst performer
-is more like the older GPT-4 Turbo.
-Suboptimal choices in quantization and token limits can
-easily produce far worse results.
 
 This benchmarking effort highlighted a number of pitfalls and details which
 can have a significant impact on the model's ability to correctly edit code:
 
-- Quantization -- Open source models are often available at dozens of different quantizations.
-- Context window -- Cloud providers can decide how large a context window to accept,
+- **Quantization** -- Open source models are often available at dozens of different quantizations.
+- **Context window** -- Cloud providers can decide how large a context window to accept,
 and they often choose differently. Ollama defaults to a tiny 2k context window,
-and silently discards data that exceeds it.
-- Output token limits -- Open source models are often served with wildly
+and silently discards data that exceeds it. Such a small window has
+catastrophic effects on performance.
+- **Output token limits** -- Open source models are often served with wildly
 differing output token limits. This has a direct impact on how much code the
 model can write or edit in a response.
-- Buggy cloud providers -- Between Qwen and DeepSeep, there were
+- **Buggy cloud providers** -- Between Qwen 2.5 Coder 32B Instruct
+and DeepSeek V2.5, there were
 multiple cloud providers with broken or buggy API endpoints that seemed
 to be returning results different from what was expected based on the advertised
 quantization and context sizes.
 
+The best versions of the model rival GPT-4o, while the worst performing
+quantization is more like the older GPT-4 Turbo.
+Even an excellent fp16 quantization falls to GPT-3.5 Turbo levels of performance
+if run with Ollama's default 2k context window.
+
 
 ### Sections
 {: .no_toc }
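To make the **Quantization** and **Output token limits** bullets concrete, here is a hedged sketch of per-model overrides: the quantization is selected by the Ollama model tag, and `max_tokens` is assumed to be passed through to the backend via `extra_params` (the values are examples, not recommendations):

```yaml
# Illustrative .aider.model.settings.yml entry -- example values only.
- name: ollama/qwen2.5-coder:32b-instruct-q4_K_M  # quantization is chosen via the model tag
  extra_params:
    max_tokens: 4096  # assumed: caps how much code the model can write or edit per response
```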
@@ -134,9 +137,10 @@ a request that exceeds the context window.
 Instead, it just silently truncates the request by discarding the "oldest" messages
 in the chat to make it fit within the context window.
 
-All of the Ollama results above were collected with at least an 8k context window, which
-is large enough to attempt all the coding problems in the benchmark.
-Aider sets Ollama's context window to 8k by default.
+Except for the single 2k context result,
+all of the Ollama results above were collected with at least an 8k context window.
+An 8k window is large enough to attempt all the coding problems in the benchmark.
+Aider sets Ollama's context window to 8k by default, starting in aider v0.65.0.
 
 You can change the Ollama server's context window with a
 [`.aider.model.settings.yml` file](https://aider.chat/docs/config/adv-model-settings.html#model-settings)
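For reference, the kind of settings entry the linked docs describe could look like the sketch below; it assumes `num_ctx` supplied via `extra_params` is honored by the Ollama server (see the linked page for the authoritative format):

```yaml
# Sketch of an .aider.model.settings.yml entry that raises Ollama's context window.
- name: ollama/qwen2.5-coder:32b-instruct-fp16
  extra_params:
    num_ctx: 8192  # assumed: sets the Ollama server's context window for this model
```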