commit 0d983d504b
parent f26ccfa3e9
2 changed files with 66 additions and 14 deletions
@@ -16,21 +16,26 @@ nav_exclude: true
 QwQ 32B Preview is a "reasoning" model, which spends a lot of tokens thinking before
 rendering a final response.
-In this way, it is similar to OpenAI's o1 models which are best used by
-[pairing the reasoning model as an architect with a traditional LLM as an editor](https://aider.chat/2024/09/26/architect.html).
+This is similar to OpenAI's o1 models, which are most effective with aider
+[when paired as an architect with a traditional LLM as an editor](https://aider.chat/2024/09/26/architect.html).
+In this mode, the reasoning model acts as an "architect" to propose a solution to the
+coding problem without regard for how to actually make edits to the source files.
+The "editor" model receives that proposal, and focuses solely on how to
+edit the existing source code to implement it.
 
-Used alone, QwQ was unable to comply with even the simplest editing format.
-So it was not very successful at editing source code files.
-QwQ's solo score on the benchmark was underwhelming,
-far worse than the o1 models performing solo.
+Used alone without being paired with an editor,
+QwQ was unable to comply with even the simplest editing format.
+It was not able to reliably edit source code files.
+As a result, QwQ's solo score on the benchmark was quite underwhelming
+(and far worse than the o1 models performing solo).
 
-QwQ can perform better than the
-Qwen 2.5 Coder 32B Instruct model that it is based on
-when they are paired as architect + editor.
-This provides only a modest benefit,
-but results in a fairly slow overall response time.
+QwQ is based on
+Qwen 2.5 Coder 32B Instruct,
+and does better when paired with it as an architect + editor combo.
+Though this provided only a modest benchmark improvement over just using Qwen alone,
+and comes with a fairly high cost in terms of latency.
 Each request must wait for QwQ to return all its thinking text
-and the ultimate solution.
+and the final solution proposal.
 And then one must wait for Qwen to turn that large
 response into actual file edits.
 
 
@ -38,7 +43,7 @@ Pairing QwQ with other sensible editor models performed the same or worse than
|
|||
just using Qwen 2.5 Coder 32B Instruct alone.
|
||||
|
||||
QwQ+Qwen seems to be the best way to use QwQ, achieving a score of 74%.
|
||||
That is well off the
|
||||
That is well below the
|
||||
SOTA results for this benchmark: Sonnet alone scores 84%, and
|
||||
o1-preview + o1-mini as architect + editor scores 85%.
|
||||
|
||||
|
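For reference, the architect + editor pairing described in the post can be launched from the aider command line roughly as follows. This is a sketch, not a command taken from the commit: `--architect`, `--model`, and `--editor-model` are real aider options, but the OpenRouter model identifiers shown here are assumptions and may need adjusting for your provider.

```bash
# Sketch: run QwQ as the "architect" and Qwen 2.5 Coder 32B Instruct as the "editor".
# The OpenRouter model names below are assumptions; substitute the identifiers
# your provider actually exposes.
export OPENROUTER_API_KEY=your-key-here

aider --architect \
      --model openrouter/qwen/qwq-32b-preview \
      --editor-model openrouter/qwen/qwen-2.5-coder-32b-instruct
```

In this setup QwQ first streams its full reasoning plus a proposed solution, and Qwen then turns that proposal into concrete file edits, which is where the latency cost described above comes from.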