mirror of https://github.com/mudler/LocalAI.git
Bump vLLM version + more options when loading models in vLLM (#1782)
* Bump vLLM version to 0.3.2
* Add vLLM model loading options
* Remove transformers-exllama
* Fix install exllama
parent 1c312685aa
commit 939411300a
28 changed files with 736 additions and 641 deletions
````diff
@@ -245,8 +245,18 @@ backend: vllm
 parameters:
   model: "facebook/opt-125m"
 
-# Decomment to specify a quantization method (optional)
+# Uncomment to specify a quantization method (optional)
 # quantization: "awq"
+# Uncomment to limit the GPU memory utilization (vLLM default is 0.9 for 90%)
+# gpu_memory_utilization: 0.5
+# Uncomment to trust remote code from huggingface
+# trust_remote_code: true
+# Uncomment to enable eager execution
+# enforce_eager: true
+# Uncomment to specify the size of the CPU swap space per GPU (in GiB)
+# swap_space: 2
+# Uncomment to specify the maximum length of a sequence (including prompt and output)
+# max_model_len: 32768
 ```
 
 The backend will automatically download the required files in order to run the model.
````
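For reference, here is a minimal sketch of how the YAML options above correspond to vLLM's own engine arguments. This is not LocalAI's backend implementation; it only illustrates what each option controls, assuming vLLM 0.3.2 or later, and reuses the example values from the config above.

```python
# Minimal sketch, NOT LocalAI's backend code: it only shows how the YAML options
# above map onto vLLM's engine arguments (assumes vLLM >= 0.3.2 is installed).
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-125m",
    # quantization="awq",          # only for checkpoints quantized with AWQ
    gpu_memory_utilization=0.5,    # fraction of GPU memory vLLM may reserve (default 0.9)
    trust_remote_code=True,        # allow custom model code from Hugging Face
    enforce_eager=True,            # skip CUDA graph capture, run eagerly
    swap_space=2,                  # CPU swap space per GPU, in GiB
    # max_model_len=32768,         # max sequence length (prompt + output); must not
                                   # exceed the model's context window (2048 for opt-125m)
)

# Quick smoke test: generate a short completion with the loaded model.
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```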