feat(speculative-sampling): allow specifying a draft model in the model config (#1052)

**Description**

This PR fixes #1013. It adds `draft_model` and `n_draft` to the model YAML config so that models can be loaded with speculative sampling. It should also be compatible with grammars.

Example:

```yaml
backend: llama
context_size: 1024
name: my-model-name
parameters:
  model: foo-bar
n_draft: 16
draft_model: model-name
```

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
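For readers unfamiliar with speculative sampling, the sketch below illustrates what a draft model and a draft length do in general. It is a minimal, hedged illustration of *greedy* speculative decoding with toy "models" (plain Python callables that map a token prefix to the next token). It is not LocalAI's or the llama backend's implementation. The names `speculative_decode`, `greedy_decode`, and `Model` are hypothetical, and the assumption that the config's `n_draft` caps the number of speculated tokens per round is labeled as such.

```python
from typing import Callable, List

# Hypothetical toy "model": maps a token prefix to the token it would pick
# greedily. Real backends work on logits/probabilities, not a single token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], max_new: int) -> List[int]:
    """Reference output: plain greedy decoding with a single model."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, n_draft: int) -> List[int]:
    """Greedy speculative decoding (sketch).

    Assumption: n_draft plays the role of the config's `n_draft`, i.e. the
    maximum number of tokens the draft model proposes per round.
    """
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1. The cheap draft model proposes up to n_draft tokens.
        k = min(n_draft, limit - len(out))
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target model verifies each proposed token. A real
        #    implementation scores all positions in one batched pass;
        #    here it is called per position for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                # First mismatch: keep the target's token, end this round.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # All draft tokens accepted: the target adds one more token.
            if len(out) + len(accepted) < limit:
                accepted.append(target(out + accepted))
        out.extend(accepted)
    return out
```

Because every kept token equals the target model's own greedy choice, the output matches plain greedy decoding with the target model alone; the draft model only affects speed. A draft that agrees with the target more often lets each verification round advance more tokens, which is the trade-off a larger draft length is meant to exploit.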
Date: 2023-09-14 17:44:16 +02:00
commit 8ccf5b2044
parent 247d85b523
12 changed files with 485 additions and 427 deletions
--- a/extra/grpc/autogptq/backend_pb2.py
+++ b/extra/grpc/autogptq/backend_pb2.py
--- a/extra/grpc/bark/backend_pb2.py
+++ b/extra/grpc/bark/backend_pb2.py
--- a/extra/grpc/diffusers/backend_pb2.py
+++ b/extra/grpc/diffusers/backend_pb2.py
--- a/extra/grpc/exllama/backend_pb2.py
+++ b/extra/grpc/exllama/backend_pb2.py
--- a/extra/grpc/huggingface/backend_pb2.py
+++ b/extra/grpc/huggingface/backend_pb2.py
--- a/extra/grpc/vall-e-x/backend_pb2.py
+++ b/extra/grpc/vall-e-x/backend_pb2.py
--- a/extra/grpc/vllm/backend_pb2.py
+++ b/extra/grpc/vllm/backend_pb2.py