chore(model-gallery): add more quants for popular models (#3365)

* models(gallery): add higher quants for some llama and hermes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* models(gallery): vllm: specify a reasonable max_tokens

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Ettore Di Giacinto 2024-08-24 00:29:24 +02:00 committed by GitHub
parent ac5f6f210b
commit 84d6e5a987
GPG key ID: B5690EEEBB952194
3 changed files with 53 additions and 0 deletions

@@ -3,6 +3,8 @@ name: "hermes-vllm"
config_file: |
  backend: vllm
  parameters:
    max_tokens: 8192
  context_size: 8192
  stopwords:
  - "<|im_end|>"