This commit is contained in:
Paul Gauthier 2024-12-04 06:38:38 -08:00
parent f26ccfa3e9
commit 0d983d504b
2 changed files with 66 additions and 14 deletions

View file

@@ -120,4 +120,51 @@
date: 2024-12-04
versions: 0.66.1.dev
seconds_per_case: 414.3
total_cost: 0.0000
- dirname: 2024-09-12-19-57-35--o1-mini-whole
test_cases: 133
model: o1-mini
edit_format: whole
commit_hash: 36fa773-dirty, 291b456
pass_rate_1: 49.6
pass_rate_2: 70.7
percent_cases_well_formed: 90.0
error_outputs: 0
num_malformed_responses: 0
num_with_malformed_responses: 0
user_asks: 17
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 1
command: aider --model o1-mini
date: 2024-09-12
versions: 0.56.1.dev
seconds_per_case: 103.0
total_cost: 5.3725
- dirname: 2024-09-21-16-45-11--o1-preview-flex-sr-markers
test_cases: 133
model: o1-preview
_released: 2024-09-12
edit_format: diff
commit_hash: 5493654-dirty
pass_rate_1: 57.9
pass_rate_2: 79.7
percent_cases_well_formed: 93.2
error_outputs: 11
num_malformed_responses: 11
num_with_malformed_responses: 9
user_asks: 3
lazy_comments: 0
syntax_errors: 10
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 1
command: aider --model o1-preview
date: 2024-09-21
versions: 0.56.1.dev
seconds_per_case: 80.9
total_cost: 63.9190

View file

@@ -16,21 +16,26 @@ nav_exclude: true

QwQ 32B Preview is a "reasoning" model, which spends a lot of tokens thinking before
rendering a final response.
This is similar to OpenAI's o1 models, which are most effective with aider
[when paired as an architect with a traditional LLM as an editor](https://aider.chat/2024/09/26/architect.html).
In this mode, the reasoning model acts as an "architect" to propose a solution to the
coding problem without regard for how to actually make edits to the source files.
The "editor" model receives that proposal, and focuses solely on how to
edit the existing source code to implement it.

Used alone without being paired with an editor,
QwQ was unable to comply with even the simplest editing format.
It was not able to reliably edit source code files.
As a result, QwQ's solo score on the benchmark was quite underwhelming
(and far worse than the o1 models performing solo).

QwQ is based on
Qwen 2.5 Coder 32B Instruct,
and does better when paired with it as an architect + editor combo.
Though this provides only a modest benchmark improvement over just using Qwen alone,
it comes with a fairly high cost in terms of latency.
Each request must wait for QwQ to return all its thinking text
and the final solution proposal.
And then one must wait for Qwen to turn that large
response into actual file edits.
@@ -38,7 +43,7 @@ Pairing QwQ with other sensible editor models performed the same or worse than
just using Qwen 2.5 Coder 32B Instruct alone.

QwQ+Qwen seems to be the best way to use QwQ, achieving a score of 74%.
That is well below the
SOTA results for this benchmark: Sonnet alone scores 84%, and
o1-preview + o1-mini as architect + editor scores 85%.
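The architect + editor pairing described above can be sketched as an aider invocation using its `--architect` and `--editor-model` options; the exact model identifiers below are assumptions and depend on how your provider names these models:

```shell
# Sketch: QwQ proposes solutions (architect), Qwen applies the edits (editor).
# Model names are placeholders -- substitute your provider's identifiers.
aider --architect \
      --model openrouter/qwen/qwq-32b-preview \
      --editor-model openrouter/qwen/qwen-2.5-coder-32b-instruct
```

Each request then incurs QwQ's full thinking output before Qwen begins turning the proposal into file edits, which accounts for the latency noted above.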