fix ollama models included in quant blog

This commit is contained in:
Paul Gauthier 2024-11-22 06:01:01 -08:00
parent dbd7f51f5c
commit ebba8f5110
2 changed files with 5 additions and 29 deletions


@@ -67,26 +67,3 @@
   versions: 0.64.2.dev
   seconds_per_case: 86.7
   total_cost: 0.0000
-- dirname: 2024-11-22-03-33-30--ollama-qwen25-coder-krith-instruct
-  test_cases: 133
-  model: ollama/krith/qwen2.5-coder-32b-instruct:IQ2_M
-  edit_format: diff
-  commit_hash: fbadfcf-dirty
-  pass_rate_1: 16.5
-  pass_rate_2: 21.1
-  percent_cases_well_formed: 60.9
-  error_outputs: 1169
-  num_malformed_responses: 148
-  num_with_malformed_responses: 52
-  user_asks: 58
-  lazy_comments: 0
-  syntax_errors: 3
-  indentation_errors: 1
-  exhausted_context_windows: 0
-  test_timeouts: 4
-  command: aider --model ollama/krith/qwen2.5-coder-32b-instruct:IQ2_M
-  date: 2024-11-22
-  versions: 0.64.2.dev
-  seconds_per_case: 169.7
-  total_cost: 0.00
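For context, the `command` field in the removed row records how that benchmark run was invoked. A minimal sketch of reproducing such a run locally, assuming Ollama is installed and serving on its default port (the `OLLAMA_API_BASE` setting follows aider's Ollama docs and is not part of this commit):

    # Pull the community IQ2_M quant named in the removed row
    ollama pull krith/qwen2.5-coder-32b-instruct:IQ2_M

    # Point aider at the local Ollama server (11434 is Ollama's default port)
    export OLLAMA_API_BASE=http://127.0.0.1:11434

    # The exact command recorded in the removed benchmark row
    aider --model ollama/krith/qwen2.5-coder-32b-instruct:IQ2_M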


@@ -11,7 +11,7 @@ nav_exclude: true
 # Quantization matters
-Open source models like Qwen 2.5 32B are performing very well on
+Open source models like Qwen 2.5 32B Instruct are performing very well on
 aider's code editing benchmark, rivaling closed source frontier models.
 But pay attention to how your model is being quantized, as it
 can strongly impact code editing skill.
@@ -24,16 +24,15 @@ and local model servers like Ollama.
 {% include quant-chart.js %}
 </script>
-The graph above compares 3 different versions of the Qwen 2.5 Coder 32B model,
+The graph above compares 3 different versions of the Qwen 2.5 Coder 32B Instruct model,
 served both locally and from cloud providers.
 - The [HuggingFace weights](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) served via [glhf.chat](https://glhf.chat).
 - The results from [OpenRouter's mix of providers](https://openrouter.ai/qwen/qwen-2.5-coder-32b-instruct/providers) which serve the model with different levels of quantization.
 - Ollama locally serving [qwen2.5-coder:32b-instruct-q4_K_M)](https://ollama.com/library/qwen2.5-coder:32b-instruct-q4_K_M), which has `Q4_K_M` quantization.
-- Ollama locally serving [krith/qwen2.5-coder-32b-instruct:IQ2_M](https://ollama.com/krith/qwen2.5-coder-32b-instruct), which has IQ2_M quantization.
-The best version of the model rivals GPT-4o, while the worst performers
-are more like GPT-3.5 Turbo level to completely useless.
+The best version of the model rivals GPT-4o, while the worst performer
+is more like GPT-3.5 Turbo level.
 ## Choosing providers with OpenRouter
@@ -44,4 +43,4 @@ undesirable providers.
 {: .note }
 The original version of this article included incorrect Ollama models
-that were not Qwen 2.5 Coder 32B.
+that were not Qwen 2.5 Coder 32B Instruct.
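Since the post's takeaway is that quantization strongly affects code editing skill, it helps to confirm which quant a local model is actually using. A hedged sketch, assuming a stock Ollama install; `ollama show` prints model metadata including the quantization level:

    # Inspect the local model; the output includes a quantization field (e.g. Q4_K_M)
    ollama show qwen2.5-coder:32b-instruct-q4_K_M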