commit 92579243c5
parent 8d0ba40d67
Paul Gauthier, 2024-11-23 18:13:58 -08:00
2 changed files with 34 additions and 27 deletions

Changed file 1 (benchmark results data):

@@ -24,7 +24,7 @@
 - dirname: 2024-11-22-18-56-13--ollama-qwen2.5-coder:32b-instruct-fp16
   test_cases: 132
-  model: ollama/qwen2.5-coder:32b-instruct-fp16 (64k context)
+  model: Ollama fp16
   edit_format: diff
   commit_hash: f06452c-dirty, 6a0a97c-dirty, 4e9ae16-dirty, 5506d0f-dirty
   pass_rate_1: 58.3
@@ -70,7 +70,7 @@
 - dirname: 2024-11-22-17-53-35--qwen25-coder-32b-Instruct-4bit
   test_cases: 133
-  model: mlx-community/Qwen2.5-Coder-32B-Instruct-4bit
+  model: mlx-community 4bit
   edit_format: diff
   commit_hash: a16dcab-dirty
   pass_rate_1: 60.2
@@ -93,7 +93,7 @@
 - dirname: 2024-11-23-15-07-20--qwen25-coder-32b-Instruct-8bit
   test_cases: 133
-  model: mlx-community/Qwen2.5-Coder-32B-Instruct-8bit
+  model: mlx-community 8bit
   edit_format: diff
   commit_hash: a16dcab-dirty
   pass_rate_1: 59.4
@@ -137,26 +137,25 @@
   seconds_per_case: 40.7
   total_cost: 0.1497
-- dirname: 2024-11-21-23-33-47--ollama-qwen25-coder
+- dirname: 2024-11-23-21-08-53--ollama-qwen2.5-coder:32b-instruct-q4_K_M-8kctx
   test_cases: 133
-  model: Ollama Q4_K_M
+  model: Ollama q4_K_M
   edit_format: diff
-  commit_hash: 488c88d-dirty
-  pass_rate_1: 44.4
-  pass_rate_2: 53.4
-  percent_cases_well_formed: 44.4
-  error_outputs: 231
-  num_malformed_responses: 183
-  num_with_malformed_responses: 74
-  user_asks: 79
+  commit_hash: baa1335-dirty, e63df83-dirty, ff8c1aa-dirty
+  pass_rate_1: 54.9
+  pass_rate_2: 66.9
+  percent_cases_well_formed: 94.0
+  error_outputs: 21
+  num_malformed_responses: 21
+  num_with_malformed_responses: 8
+  user_asks: 5
   lazy_comments: 0
-  syntax_errors: 2
+  syntax_errors: 0
   indentation_errors: 0
   exhausted_context_windows: 0
-  test_timeouts: 2
+  test_timeouts: 3
   command: aider --model ollama/qwen2.5-coder:32b-instruct-q4_K_M
-  date: 2024-11-21
+  date: 2024-11-23
   versions: 0.64.2.dev
-  seconds_per_case: 86.7
+  seconds_per_case: 35.7
   total_cost: 0.0000
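
The `-8kctx` suffix on the new dirname indicates this run used an enlarged Ollama context window. The settings file behind that run is not part of this commit, so the following is only a sketch, assuming the `.aider.model.settings.yml` mechanism described in the blog post below:

```
# Assumed .aider.model.settings.yml for the q4_K_M run above
# (the actual file is not included in this commit).
- name: ollama/qwen2.5-coder:32b-instruct-q4_K_M
  extra_params:
    num_ctx: 8192  # raise Ollama's 2k default so benchmark problems fit
```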

Changed file 2 (the quantization blog post):

@@ -18,7 +18,7 @@ can strongly impact code editing skill.
 Heavily quantized models are often used by cloud API providers
 and local model servers like Ollama or MLX.
 
-<canvas id="quantChart" width="800" height="600" style="margin: 20px 0"></canvas>
+<canvas id="quantChart" width="800" height="500" style="margin: 20px 0"></canvas>
 <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
 <script>
 {% include quant-chart.js %}
@@ -29,16 +29,16 @@ served both locally and from cloud providers.
 - The [HuggingFace BF16 weights](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) served via [glhf.chat](https://glhf.chat).
 - Hyperbolic labs API for [qwen2-5-coder-32b-instruct](https://app.hyperbolic.xyz/models/qwen2-5-coder-32b-instruct), which is using BF16. This result is probably within the expected variance of the HF result.
-- A [4bit quant for mlx](https://t.co/cwX3DYX35D).
+- [4bit and 8bit quants for mlx](https://t.co/cwX3DYX35D).
 - The results from [OpenRouter's mix of providers](https://openrouter.ai/qwen/qwen-2.5-coder-32b-instruct/providers) which serve the model with different levels of quantization.
-- Ollama locally serving [qwen2.5-coder:32b-instruct-q4_K_M](https://ollama.com/library/qwen2.5-coder:32b-instruct-q4_K_M), which has `Q4_K_M` quantization, with Ollama's default 2k context window.
+- Ollama locally serving different quantizations from the [Ollama model library](https://ollama.com/library/qwen2.5-coder:32b-instruct-q4_K_M).
 
 The best version of the model rivals GPT-4o, while the worst performer
-is more like GPT-3.5 Turbo level.
+is more like GPT-4 level.
 
 {: .note }
 This article is being updated as additional benchmark runs complete.
-The original version included incorrect Ollama models.
 
 <input type="text" id="quantSearchInput" placeholder="Search..." style="width: 100%; max-width: 800px; margin: 10px auto; padding: 8px; display: block; border: 1px solid #ddd; border-radius: 4px;">
@@ -100,11 +100,14 @@ document.getElementById('quantSearchInput').addEventListener('keyup', function()
 });
 </script>
 
-## Setting the context window size
+## Setting Ollama's context window size
 
 [Ollama uses a 2k context window by default](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size),
 which is very small for working with aider.
+All of the Ollama results above were collected with at least an 8k context window, which
+is large enough to attempt all the coding problems in the benchmark.
+
 You can set the Ollama server's context window with a
 [`.aider.model.settings.yml` file](https://aider.chat/docs/config/adv-model-settings.html#model-settings)
 like this:
@@ -112,7 +115,7 @@ like this:
 ```
 - name: aider/extra_params
   extra_params:
-    num_ctx: 65536
+    num_ctx: 8192
 ```
 
 That uses the special model name `aider/extra_params` to set it for *all* models. You should probably use a specific model name like:
@@ -120,7 +123,7 @@ That uses the special model name `aider/extra_params` to set it for *all* models
 ```
 - name: ollama/qwen2.5-coder:32b-instruct-fp16
   extra_params:
-    num_ctx: 65536
+    num_ctx: 8192
 ```
 
 ## Choosing providers with OpenRouter
@@ -130,3 +133,8 @@ OpenRouter allows you to ignore specific providers in your
 This can be effective to exclude highly quantized or otherwise
 undesirable providers.
+
+{: .note }
+Earlier versions of this article included incorrect Ollama models,
+and also included some Ollama results run with the too-small default
+2k context window.
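
OpenRouter also accepts provider routing preferences on individual API requests. Below is a minimal sketch of passing an ignore list through aider's per-model settings; it assumes `extra_body` is forwarded to the API unchanged, and the provider name is a placeholder (consult OpenRouter's provider routing docs for the current field names):

```
# Hypothetical .aider.model.settings.yml entry; provider name is a placeholder.
- name: openrouter/qwen/qwen-2.5-coder-32b-instruct
  extra_params:
    extra_body:
      provider:
        ignore:
          - "SomeProvider"  # a provider you want to skip
```

Account-level preferences set on the OpenRouter website achieve the same effect without any local configuration.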