From dc8761763da4ff9cf3dfbad14c372279d5ace9e0 Mon Sep 17 00:00:00 2001
From: Paul Gauthier
Date: Sun, 24 Nov 2024 07:56:12 -0800
Subject: [PATCH] copy

---
 .../website/_posts/2024-11-21-quantization.md | 12 ++++++++++--
 aider/website/assets/quantization.jpg         | Bin 121393 -> 151649 bytes
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/aider/website/_posts/2024-11-21-quantization.md b/aider/website/_posts/2024-11-21-quantization.md
index 6b65658cf..617c3fb9c 100644
--- a/aider/website/_posts/2024-11-21-quantization.md
+++ b/aider/website/_posts/2024-11-21-quantization.md
@@ -10,6 +10,7 @@ nav_exclude: true
 {% endif %}
 
 # Quantization matters
+{: .no_toc }
 
 Open source models like Qwen 2.5 32B Instruct are performing very well on
 aider's code editing benchmark, rivaling closed source frontier models.
@@ -18,8 +19,7 @@ can impact code editing skill.
 Heavily quantized models are often used by cloud API providers
 and local model servers like Ollama or MLX.
-
-The graph above compares different versions of the Qwen 2.5 Coder 32B Instruct model,
+The graph and table below compare different versions of the Qwen 2.5 Coder 32B Instruct model,
 served both locally and from cloud providers.
 
 - The [HuggingFace BF16 weights](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct)
 served via [glhf.chat](https://glhf.chat).
@@ -38,9 +38,17 @@ It's unclear why this is happening to just this provider.
 The other providers available through OpenRouter perform similarly
 when their API is accessed directly.
 
+### Sections
+{: .no_toc }
+
+- TOC
+{:toc}
+
 {: .note }
 This article is being updated as additional benchmark runs complete.
 
+## Benchmark results
+
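For context on why the patch's distinction between BF16 weights and heavily quantized variants matters, here is a rough back-of-the-envelope sketch of weight memory footprint at different quantization levels. The bits-per-weight figures for the quantized formats are approximate assumptions (real GGUF quants mix precisions), and real deployments add overhead for embeddings, activations, and KV cache; this is illustrative only, not part of the benchmark.

```python
# Approximate weight storage for a ~32B-parameter model at several
# quantization levels. Bits-per-weight values below are rough assumptions
# (mixed-precision quant formats vary); treat the results as ballpark figures.

PARAMS = 32e9  # ~32 billion parameters, as in Qwen 2.5 Coder 32B Instruct


def weight_size_gb(bits_per_param: float, params: float = PARAMS) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


for name, bits in [("BF16", 16), ("~8-bit", 8.5), ("~4-bit", 4.85)]:
    print(f"{name:>7}: ~{weight_size_gb(bits):.0f} GB")
```

Under these assumptions, BF16 weights come to roughly 64 GB while a ~4-bit quant is closer to 19 GB, which is why local servers and some cloud providers reach for aggressive quantization despite the hit to code editing skill.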