mirror of https://github.com/Aider-AI/aider.git synced 2025-06-03 19:24:59 +00:00

Paul Gauthier 8448eff1eb copy

2024-11-21 11:38:41 -08:00

1.6 KiB

Raw Blame History

title	excerpt	highlight_image	draft	nav_exclude
Quantization matters	Open source LLMs are becoming very powerful, but pay attention to how you (or your) provider is quantizing the model. It strongly affects code editing skill.	/assets/quantization.jpg	false	true

{% if page.date %}

{% endif %}

Quantization matters

Open source models like Qwen 2.5 32B are performing very well on aider's code editing benchmark, rivaling closed source frontier models. But pay attention to how your model is being quantized, as it can strongly impact code editing skill. Heavily quantized models are often used by cloud API providers and local model servers like Ollama.

The graph above compares 4 different versions of the Qwen 2.5 32B model, served both locally and from cloud providers.

The HuggingFace weights served via glhf.chat.
The results from OpenRouter's mix of providers.
Two Ollama models run locally.

The best version of the model rivals GPT-4o, while the worst performer is more like GPT-3.5 Turbo.

Choosing providers with OpenRouter

OpenRouter allows you to ignore specific providers in your preferences. This can be effective to exclude highly quantized or otherwise undesirable providers.

1.6 KiB Raw Blame History

Quantization matters

Choosing providers with OpenRouter

1.6 KiB

Raw Blame History