chore(model-gallery): add more quants for popular models (#3365)

* models(gallery): add higher quants for some llama and hermes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* models(gallery): vllm: specify a reasonable max_tokens

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-06-01 16:34:59 +00:00 · 2024-08-24 00:29:24 +02:00
commit 84d6e5a987
parent ac5f6f210b
3 changed files with 53 additions and 0 deletions
--- a/gallery/hermes-vllm.yaml
+++ b/gallery/hermes-vllm.yaml
@@ -3,6 +3,8 @@ name: "hermes-vllm"

 config_file: |
    backend: vllm
+    parameters:
+      max_tokens: 8192
    context_size: 8192
    stopwords:
    - "<|im_end|>"