docs: Initial import from localai-website (#1312)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
parent 763f94ca80
commit c5c77d2b0d
66 changed files with 6111 additions and 0 deletions
docs/content/model-compatibility/vllm.md (new file, +39 lines)
+++
disableToc = false
title = "🆕 vLLM"
weight = 4
+++

[vLLM](https://github.com/vllm-project/vllm) is a fast and easy-to-use library for LLM inference.

LocalAI has a built-in integration with vLLM that can be used to run models. You can check out vLLM's performance [here](https://github.com/vllm-project/vllm#performance).

## Setup

Create a YAML file for the model you want to use with `vllm`.

To set up a model, you only need to specify the model name in the YAML config file:
```yaml
name: vllm
backend: vllm
parameters:
  model: "facebook/opt-125m"

# Uncomment to specify a quantization method (optional)
# quantization: "awq"
```
The backend will automatically download the required files in order to run the model.
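Once the YAML file is in your models directory, start LocalAI pointing at that directory. The following is a minimal sketch assuming a Docker setup; the image tag is an assumption and should be replaced with one that ships the vLLM backend for your LocalAI version (typically a CUDA-enabled image):

```bash
# Assumption: the config above is saved as models/vllm.yaml and the chosen
# image includes the vLLM backend (adjust the tag to your release/GPU setup).
docker run -p 8080:8080 \
  -v $PWD/models:/models \
  -ti --rm quay.io/go-skynet/local-ai:latest \
  --models-path /models
```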
## Usage
Use the `completions` endpoint by specifying the `vllm` backend:
```
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
   "model": "vllm",
   "prompt": "Hello, my name is",
   "temperature": 0.1, "top_p": 0.1
}'
```
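The same model can also be queried through the OpenAI-compatible chat endpoint. A minimal sketch follows; the response will depend on the model and on any prompt template you configure:

```
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "vllm",
   "messages": [{"role": "user", "content": "How are you?"}],
   "temperature": 0.1
}'
```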