---
title: Quantization matters
excerpt: Open source LLMs are becoming very powerful, but pay attention to how you (or your provider) is quantizing the model. It can strongly affect code editing skill.
highlight_image: /assets/quantization.jpg
draft: false
nav_exclude: true
---
{% if page.date %}
{{ page.date | date: "%B %d, %Y" }}
{% endif %}

# Quantization matters
Open source models like Qwen 2.5 32B Instruct are performing very well on aider's code editing benchmark, rivaling closed source frontier models. But pay attention to how your model is being quantized, as it can strongly impact code editing skill. Heavily quantized models are often used by cloud API providers and local model servers like Ollama.
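To see why quantization is attractive to providers, consider the memory footprint. A rough back-of-the-envelope sketch (the bits-per-weight figures below are approximations for the GGUF quantization formats, not exact values):

```python
# Approximate memory needed to hold the weights of a 32B-parameter model
# at different quantization levels. Bits-per-weight values are rough
# estimates for each format, and exclude KV cache and activations.
params = 32e9
bits_per_weight = {"BF16": 16, "Q8_0": 8.5, "Q4_K_M": 4.85}

for name, bits in bits_per_weight.items():
    gib = params * bits / 8 / 2**30
    print(f"{name}: ~{gib:.0f} GiB")
```

Cutting the weights from ~60 GiB down to ~18 GiB lets a provider serve the model on far cheaper hardware, which is why heavily quantized versions are so common.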
The graph above compares 4 different versions of the Qwen 2.5 Coder 32B Instruct model, served both locally and from cloud providers.
- The HuggingFace BF16 weights, served via glhf.chat.
- Hyperbolic labs' API for qwen2-5-coder-32b-instruct, which is using BF16. This result is probably within the expected variance of the HF result.
- OpenRouter's mix of providers, which serve the model with different levels of quantization.
- Ollama locally serving qwen2.5-coder:32b-instruct-q4_K_M, which has Q4_K_M quantization, with Ollama's default 2k context window.
The best version of the model rivals GPT-4o, while the worst performer is more like GPT-3.5 Turbo level.
{: .note }
This article is being updated as additional benchmark runs complete. The original version included incorrect Ollama models.
{% assign quant_sorted = site.data.quant | sort: 'pass_rate_2' | reverse %}

| Model | Percent completed correctly | Percent using correct edit format | Command | Edit format |
|---|---|---|---|---|
{% for row in quant_sorted %}| {{ row.model }} | {{ row.pass_rate_2 }}% | {{ row.percent_cases_well_formed }}% | {{ row.command }} | {{ row.edit_format }} |
{% endfor %}
## Setting the context window size
Ollama uses a 2k context window by default, which is very small for working with aider.
You can set the Ollama server's context window with a `.aider.model.settings.yml` file like this:
```yaml
- name: aider/extra_params
  extra_params:
    num_ctx: 65536
```
That uses the special model name `aider/extra_params` to set it for all models. You should probably use a specific model name like:
```yaml
- name: ollama/qwen2.5-coder:32b-instruct-fp16
  extra_params:
    num_ctx: 65536
```
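With that settings file in place, a typical invocation against a local Ollama server might look like this (the `OLLAMA_API_BASE` environment variable is how aider locates the Ollama server; the port shown is Ollama's default):

```shell
# Point aider at the local Ollama server
export OLLAMA_API_BASE=http://127.0.0.1:11434

# Use a specific Ollama model; aider will apply the matching
# .aider.model.settings.yml entry, including num_ctx
aider --model ollama/qwen2.5-coder:32b-instruct-fp16
```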
## Choosing providers with OpenRouter
OpenRouter allows you to ignore specific providers in your preferences. This is an effective way to exclude highly quantized or otherwise undesirable providers.
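Provider preferences can also be set per request via the `provider` field of the chat completions body. A minimal sketch, assuming OpenRouter's provider routing options; the provider name below is purely illustrative, so check OpenRouter's current docs for the exact option names and accepted values:

```python
# Sketch of an OpenRouter request body that skips named providers and
# restricts which quantization levels are acceptable. "ignore" and
# "quantizations" follow OpenRouter's provider routing options; the
# provider name listed is a made-up placeholder.
import json

payload = {
    "model": "qwen/qwen-2.5-coder-32b-instruct",
    "messages": [{"role": "user", "content": "Write hello world in Python."}],
    "provider": {
        "ignore": ["SomeHeavilyQuantizedProvider"],  # providers to skip
        "quantizations": ["bf16", "fp16"],           # acceptable precisions
    },
}

# This JSON body would be POSTed to the chat completions endpoint
print(json.dumps(payload, indent=2))
```

Setting this per request is useful when different projects need different precision/cost trade-offs from the same OpenRouter account.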