
---
title: Quantization matters
excerpt: Open source LLMs are becoming very powerful, but pay attention to how you (or your provider) is quantizing the model. It can affect code editing skill.
highlight_image: /assets/quantization.jpg
draft: false
nav_exclude: true
---

{% if page.date %}

{{ page.date | date: "%B %d, %Y" }}

{% endif %}

# Quantization matters

Open source models like Qwen 2.5 32B Instruct are performing very well on aider's code editing benchmark, rivaling closed source frontier models. But pay attention to how your model is being quantized, as it can impact code editing skill. Heavily quantized models are often used by cloud API providers and local model servers like Ollama or MLX.

The graph above compares different versions of the Qwen 2.5 Coder 32B Instruct model, served both locally and from cloud providers.

The best version of the model rivals GPT-4o, while the worst performer is more like GPT-4 Turbo level.

{: .note }
This article is being updated as additional benchmark runs complete.

<table>
  <thead>
    <tr>
      <th>Model</th>
      <th>Percent completed correctly</th>
      <th>Percent using correct edit format</th>
      <th>Command</th>
      <th>Edit format</th>
    </tr>
  </thead>
  <tbody>
    {% assign quant_sorted = site.data.quant | sort: 'pass_rate_2' | reverse %}
    {% for row in quant_sorted %}
    <tr>
      <td>{{ row.model }}</td>
      <td>{{ row.pass_rate_2 }}%</td>
      <td>{{ row.percent_cases_well_formed }}%</td>
      <td><code>{{ row.command }}</code></td>
      <td>{{ row.edit_format }}</td>
    </tr>
    {% endfor %}
  </tbody>
</table>

## Setting Ollama's context window size

Ollama uses a 2k context window by default, which is very small for working with aider.

All of the Ollama results above were collected with at least an 8k context window, which is large enough to attempt all the coding problems in the benchmark.

You can set the Ollama server's context window with a `.aider.model.settings.yml` file like this:

```yaml
- name: aider/extra_params
  extra_params:
    num_ctx: 8192
```

That uses the special model name `aider/extra_params` to set it for all models. You should probably use a specific model name like:

```yaml
- name: ollama/qwen2.5-coder:32b-instruct-fp16
  extra_params:
    num_ctx: 8192
```
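With the settings file in place, you can point aider at a local Ollama server from the shell. This is a minimal sketch: the address below is Ollama's default listen address, so adjust it if your server runs elsewhere.

```shell
# Tell aider where the local Ollama server is listening
# (http://127.0.0.1:11434 is Ollama's default address)
export OLLAMA_API_BASE=http://127.0.0.1:11434
echo "$OLLAMA_API_BASE"

# Then launch aider with the same model name used in the settings file:
#   aider --model ollama/qwen2.5-coder:32b-instruct-fp16
```

Aider reads `OLLAMA_API_BASE` to find the server, so export it before launching.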

## Choosing providers with OpenRouter

OpenRouter allows you to ignore specific providers in your preferences. This can be an effective way to exclude highly quantized or otherwise undesirable providers.
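One per-model way to express such a preference is to pass OpenRouter's provider routing options through aider's `extra_params`. The sketch below is an assumption about how the pieces fit together, not a confirmed recipe: it relies on OpenRouter's `provider.ignore` request field being forwarded via `extra_body`, and `SomeProvider` is a hypothetical placeholder name.

```yaml
# Hypothetical sketch: forward OpenRouter provider preferences via extra_params.
# "SomeProvider" is a placeholder, not a real provider recommendation.
- name: openrouter/qwen/qwen-2.5-coder-32b-instruct
  extra_params:
    extra_body:
      provider:
        ignore:
          - SomeProvider
```

Setting the preference once in your OpenRouter account settings accomplishes the same thing for all requests, without any per-model configuration.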

{: .note }
Earlier versions of this article included incorrect Ollama models, and also included some Ollama results collected with the too-small default 2k context window.