commit 0d983d504b
parent f26ccfa3e9
2 changed files with 66 additions and 14 deletions
@@ -16,21 +16,26 @@ nav_exclude: true
 QwQ 32B Preview is a "reasoning" model, which spends a lot of tokens thinking before
 rendering a final response.
-In this way, it is similar to OpenAI's o1 models which are best used by
-[pairing the reasoning model as an architect with a traditional LLM as an editor](https://aider.chat/2024/09/26/architect.html).
+This is similar to OpenAI's o1 models, which are most effective with aider
+[when paired as an architect with a traditional LLM as an editor](https://aider.chat/2024/09/26/architect.html).
+In this mode, the reasoning model acts as an "architect" to propose a solution to the
+coding problem without regard for how to actually make edits to the source files.
+The "editor" model receives that proposal, and focuses solely on how to
+edit the existing source code to implement it.
 
-Used alone, QwQ was unable to comply with even the simplest editing format.
-So it was not very successful at editing source code files.
-QwQ's solo score on the benchmark was underwhelming,
-far worse than the o1 models performing solo.
+Used alone without being paired with an editor,
+QwQ was unable to comply with even the simplest editing format.
+It was not able to reliably edit source code files.
+As a result, QwQ's solo score on the benchmark was quite underwhelming
+(and far worse than the o1 models performing solo).
 
-QwQ can perform better than the
-Qwen 2.5 Coder 32B Instruct model that it is based on
-when they are paired as architect + editor.
-This provides only a modest benefit,
-but results in a fairly slow overall response time.
+QwQ is based on
+Qwen 2.5 Coder 32B Instruct,
+and does better when paired with it as an architect + editor combo.
+Though this provided only a modest benchmark improvement over just using Qwen alone,
+and comes with a fairly high cost in terms of latency.
 Each request must wait for QwQ to return all its thinking text
-and the ultimate solution.
+and the final solution proposal.
 And then one must wait for Qwen to turn that large
 response into actual file edits.
 
 
@ -38,7 +43,7 @@ Pairing QwQ with other sensible editor models performed the same or worse than
|
|||
just using Qwen 2.5 Coder 32B Instruct alone.
|
||||
|
||||
QwQ+Qwen seems to be the best way to use QwQ, achieving a score of 74%.
|
||||
That is well off the
|
||||
That is well below the
|
||||
SOTA results for this benchmark: Sonnet alone scores 84%, and
|
||||
o1-preview + o1-mini as architect + editor scores 85%.
|
||||
|
||||
|
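For reference, the architect + editor pairing described in the post can be launched from the aider command line roughly as follows. This is a sketch, not a command taken from the commit: `--architect`, `--model`, and `--editor-model` are real aider options, but the OpenRouter model identifiers shown here are assumptions and may need adjusting for your provider.

```bash
# Sketch: run QwQ as the "architect" and Qwen 2.5 Coder 32B Instruct as the "editor".
# The OpenRouter model names below are assumptions; substitute the identifiers
# your provider actually exposes.
export OPENROUTER_API_KEY=your-key-here

aider --architect \
      --model openrouter/qwen/qwq-32b-preview \
      --editor-model openrouter/qwen/qwen-2.5-coder-32b-instruct
```

In this setup QwQ first streams its full reasoning plus a proposed solution, and Qwen then turns that proposal into concrete file edits, which is where the latency cost described above comes from.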