Paul Gauthier 2024-12-04 07:22:09 -08:00
parent a4dcde2fc2
commit a0101c14a6

@@ -55,8 +55,8 @@ I spent some time experimenting with a variety of custom editing formats
 for QwQ.
 In particular, I tried to parse the QwQ response and discard the long
 sections of "thinking" and retain only the "final" solution.
-While I was able to successfully tease these sections apart,
-it did not translate to any significant improvement in the benchmarking results.
+None of this custom work seemed to translate
+into any significant improvement in the benchmark results.
 ## Results
@@ -128,12 +128,13 @@ document.getElementById('qwqSearchInput').addEventListener('keyup', function() {
 As discussed in a recent blog post,
 [details matter with open source models](https://aider.chat/2024/11/21/quantization.html).
-For clarity, I benchmarked against OpenRouter's endpoints for
+For clarity, new benchmark runs for this article were
+performed against OpenRouter's endpoints for
 QwQ 32B Preview and Qwen 2.5 Coder 32B Instruct.
-For the other models, I went direct to their provider's APIs.
+For the other models, the benchmark was direct to their providers' APIs.
-Having recently done extensive testing of OpenRouter's Qwen 2.5 Coder 32B Instruct,
-I feel comfortable using it. I blocked the provider Mancer due to small
-context window.
+Having recently done extensive testing of OpenRouter's Qwen 2.5 Coder 32B Instruct endpoint,
+it seems reliable.
+The provider Mancer was blocked due to the small context window it provides.
-For QwQ 32B Preview, I blocked Fireworks because of its small context window.
+For QwQ 32B Preview, Fireworks was blocked because of its small context window.
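The provider blocking described above maps onto OpenRouter's provider-routing preferences, which let a request exclude specific upstream providers. A minimal sketch of such a request body, as an illustration only: the `provider.ignore` field comes from OpenRouter's public API docs, and the model slug and provider name here are assumptions, not details confirmed by this post.

```json
{
  "model": "qwen/qwq-32b-preview",
  "messages": [
    {"role": "user", "content": "Write a hello world program."}
  ],
  "provider": {
    "ignore": ["Fireworks"]
  }
}
```

Sent to OpenRouter's chat completions endpoint, a body like this would route the request while skipping the ignored provider, which is one way to avoid endpoints with truncated context windows during a benchmark run.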