2025-05-29 16:54:59 +00:00 · 2024-11-24 07:14:09 -08:00 · 2024-11-24 07:14:09 -08:00 · aee94a0584
commit aee94a0584
parent c550422168
2 changed files with 36 additions and 6 deletions
--- a/aider/website/_data/quant.yml
+++ b/aider/website/_data/quant.yml
@@ -274,3 +274,26 @@
  versions: 0.64.2.dev
  seconds_per_case: 110.0
  total_cost: 0.1763
+- dirname: 2024-11-24-15-00-50--qwen25-32b-or-deepinfra
+  test_cases: 133
+  model: "Deepinfra via OpenRouter: BF16"
+  edit_format: diff
+  commit_hash: c2f184f
+  pass_rate_1: 57.1
+  pass_rate_2: 69.9
+  percent_cases_well_formed: 89.5
+  error_outputs: 35
+  num_malformed_responses: 31
+  num_with_malformed_responses: 14
+  user_asks: 11
+  lazy_comments: 0
+  syntax_errors: 1
+  indentation_errors: 1
+  exhausted_context_windows: 4
+  test_timeouts: 1
+  command: aider --model openrouter/qwen/qwen-2.5-coder-32b-instruct
+  date: 2024-11-24
+  versions: 0.64.2.dev
+  seconds_per_case: 28.5
+  total_cost: 0.1390
--- a/aider/website/_posts/2024-11-21-quantization.md
+++ b/aider/website/_posts/2024-11-21-quantization.md
@@ -18,11 +18,6 @@ can impact code editing skill.
 Heavily quantized models are often used by cloud API providers
 and local model servers like Ollama or MLX.
-<canvas id="quantChart" width="800" height="500" style="margin: 20px 0"></canvas>
-<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
-<script>
-{% include quant-chart.js %}
-</script>
 The graph above compares different versions of the Qwen 2.5 Coder 32B Instruct model,
 served both locally and from cloud providers.
@@ -34,11 +29,23 @@ served both locally and from cloud providers.
 - Other API providers.
 The best version of the model rivals GPT-4o, while the worst performer
-is more like GPT-4 Turbo level.
+is worse than GPT-3.5 Turbo.
 Hyperbolic via OpenRouter in particular is confusing.
 Their direct API produces excellent results, but the performance
 through OpenRouter is very poor.
 It's unclear why this is happening to just this provider.
 The other providers available through OpenRouter perform similarly
 when their API is accessed directly.
+{: .note }
+This article is being updated as additional benchmark runs complete.
+<canvas id="quantChart" width="800" height="600" style="margin: 20px 0"></canvas>
+<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
+<script>
+{% include quant-chart.js %}
+</script>
+<input type="text" id="quantSearchInput" placeholder="Search..." style="width: 100%; max-width: 800px; margin: 10px auto; padding: 8px; display: block; border: 1px solid #ddd; border-radius: 4px;">