diff --git a/aider/website/_data/quant.yml b/aider/website/_data/quant.yml
index 536ae25af..59b70c2e1 100644
--- a/aider/website/_data/quant.yml
+++ b/aider/website/_data/quant.yml
@@ -67,26 +67,3 @@
   versions: 0.64.2.dev
   seconds_per_case: 86.7
   total_cost: 0.0000
-
-- dirname: 2024-11-22-03-33-30--ollama-qwen25-coder-krith-instruct
-  test_cases: 133
-  model: ollama/krith/qwen2.5-coder-32b-instruct:IQ2_M
-  edit_format: diff
-  commit_hash: fbadfcf-dirty
-  pass_rate_1: 16.5
-  pass_rate_2: 21.1
-  percent_cases_well_formed: 60.9
-  error_outputs: 1169
-  num_malformed_responses: 148
-  num_with_malformed_responses: 52
-  user_asks: 58
-  lazy_comments: 0
-  syntax_errors: 3
-  indentation_errors: 1
-  exhausted_context_windows: 0
-  test_timeouts: 4
-  command: aider --model ollama/krith/qwen2.5-coder-32b-instruct:IQ2_M
-  date: 2024-11-22
-  versions: 0.64.2.dev
-  seconds_per_case: 169.7
-  total_cost: 0.00
\ No newline at end of file
diff --git a/aider/website/_posts/2024-11-21-quantization.md b/aider/website/_posts/2024-11-21-quantization.md
index 42e2831a4..ca675009f 100644
--- a/aider/website/_posts/2024-11-21-quantization.md
+++ b/aider/website/_posts/2024-11-21-quantization.md
@@ -11,7 +11,7 @@ nav_exclude: true
 
 # Quantization matters
 
-Open source models like Qwen 2.5 32B are performing very well on
+Open source models like Qwen 2.5 32B Instruct are performing very well on
 aider's code editing benchmark, rivaling closed source frontier models.
 But pay attention to how your model is being quantized, as it can
 strongly impact code editing skill.
@@ -24,16 +24,15 @@ and local model servers like Ollama.
 
 {% include quant-chart.js %}
 
-The graph above compares 3 different versions of the Qwen 2.5 Coder 32B model,
+The graph above compares 3 different versions of the Qwen 2.5 Coder 32B Instruct model,
 served both locally and from cloud providers.
 
 - The [HuggingFace weights](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) served via [glhf.chat](https://glhf.chat).
 - The results from [OpenRouter's mix of providers](https://openrouter.ai/qwen/qwen-2.5-coder-32b-instruct/providers) which serve the model with different levels of quantization.
 - Ollama locally serving [qwen2.5-coder:32b-instruct-q4_K_M)](https://ollama.com/library/qwen2.5-coder:32b-instruct-q4_K_M), which has `Q4_K_M` quantization.
-- Ollama locally serving [krith/qwen2.5-coder-32b-instruct:IQ2_M](https://ollama.com/krith/qwen2.5-coder-32b-instruct), which has IQ2_M quantization.
 
-The best version of the model rivals GPT-4o, while the worst performers
-are more like GPT-3.5 Turbo level to completely useless.
+The best version of the model rivals GPT-4o, while the worst performer
+is more like GPT-3.5 Turbo level.
 
 ## Choosing providers with OpenRouter
 
@@ -44,4 +43,4 @@ undesirable providers.
 
 {: .note }
 The original version of this article included incorrect Ollama models
-that were not Qwen 2.5 Coder 32B.
+that were not Qwen 2.5 Coder 32B Instruct.
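
The "Choosing providers with OpenRouter" section touched by the second hunk recommends excluding highly quantized or otherwise undesirable providers. As an illustrative sketch only, and not part of this diff, such filtering might be expressed as an aider model settings entry. This assumes aider's `.aider.model.settings.yml` forwards `extra_params` into the OpenRouter request body, and that OpenRouter's provider routing honors the `quantizations` and `allow_fallbacks` fields shown here:

```yaml
# Hypothetical .aider.model.settings.yml entry (a sketch, not part of this diff).
# Assumes extra_params are passed through to the OpenRouter request body, and
# that OpenRouter's provider routing accepts the fields below.
- name: openrouter/qwen/qwen-2.5-coder-32b-instruct
  edit_format: diff
  extra_params:
    provider:
      # Prefer full-precision or lightly quantized serving; skip heavy quants.
      quantizations: ["bf16", "fp16", "fp8"]
      allow_fallbacks: false
```

Locally, the equivalent lever is simply the Ollama tag you pull, e.g. `aider --model ollama/qwen2.5-coder:32b-instruct-q4_K_M`, as the `command:` field of the removed benchmark entry illustrates for the IQ2_M variant.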