From f9126416e84fbd6120ee170bf8e0f07d7a9eade1 Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Fri, 22 Nov 2024 16:38:02 -0800
Subject: [PATCH] copy

---
 .../website/_posts/2024-11-21-quantization.md | 24 ++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/aider/website/_posts/2024-11-21-quantization.md b/aider/website/_posts/2024-11-21-quantization.md
index c247af6de..3e4ba910c 100644
--- a/aider/website/_posts/2024-11-21-quantization.md
+++ b/aider/website/_posts/2024-11-21-quantization.md
@@ -30,7 +30,7 @@ served both locally and from cloud providers.
 - The [HuggingFace BF16 weights](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) served via [glhf.chat](https://glhf.chat).
 - Hyperbolic labs API for [qwen2-5-coder-32b-instruct](https://app.hyperbolic.xyz/models/qwen2-5-coder-32b-instruct), which is using BF16. This result is probably within the expected variance of the HF result.
 - The results from [OpenRouter's mix of providers](https://openrouter.ai/qwen/qwen-2.5-coder-32b-instruct/providers) which serve the model with different levels of quantization.
-- Ollama locally serving [qwen2.5-coder:32b-instruct-q4_K_M)](https://ollama.com/library/qwen2.5-coder:32b-instruct-q4_K_M), which has `Q4_K_M` quantization.
+- Ollama locally serving [qwen2.5-coder:32b-instruct-q4_K_M](https://ollama.com/library/qwen2.5-coder:32b-instruct-q4_K_M), which has `Q4_K_M` quantization, with Ollama's default 2k context window.
 
 The best version of the model rivals GPT-4o, while the worst performer is more like GPT-3.5 Turbo level.
 
@@ -99,6 +99,28 @@ document.getElementById('quantSearchInput').addEventListener('keyup', function()
 });
 
 
+## Setting the context window size
+
+[Ollama uses a 2k context window by default](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size),
+which is very small for working with aider.
+
+You can set the Ollama server's context window with a
+[`.aider.model.settings.yml` file](https://aider.chat/docs/config/adv-model-settings.html#model-settings)
+like this:
+
+```
+- name: aider/extra_params
+  extra_params:
+    num_ctx: 65536
+```
+
+That uses the special model name `aider/extra_params` to set it for *all* models. You should probably use a specific model name like:
+
+```
+- name: ollama/qwen2.5-coder:32b-instruct-fp16
+  extra_params:
+    num_ctx: 65536
+```
 
 ## Choosing providers with OpenRouter
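For readers who would rather generate the settings file than write it by hand, here is a minimal sketch that emits the same `.aider.model.settings.yml` content the patch adds. The `to_yaml` helper and the `settings` structure are purely illustrative and not part of aider; only the file shape (`name`, `extra_params`, `num_ctx`) comes from the aider docs linked in the patch.

```python
# Illustrative sketch (not part of aider): emit the
# .aider.model.settings.yml contents shown in the patch above.
settings = [
    {
        # Specific model name, as the patch recommends over aider/extra_params.
        "name": "ollama/qwen2.5-coder:32b-instruct-fp16",
        # num_ctx sets the Ollama server's context window (default is 2k).
        "extra_params": {"num_ctx": 65536},
    },
]


def to_yaml(entries):
    # Minimal emitter for this flat two-level structure;
    # avoids pulling in a PyYAML dependency for a tiny config file.
    lines = []
    for entry in entries:
        lines.append(f"- name: {entry['name']}")
        lines.append("  extra_params:")
        for key, value in entry["extra_params"].items():
            lines.append(f"    {key}: {value}")
    return "\n".join(lines) + "\n"


print(to_yaml(settings))
```

Writing the file this way makes it easy to stamp out the same `num_ctx` override for several Ollama models at once.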