Mirror of https://github.com/mudler/LocalAI.git (synced 2025-05-27 22:15:00 +00:00)
docs: Initial import from localai-website (#1312)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
This commit is contained in: parent 763f94ca80, commit c5c77d2b0d
66 changed files with 6111 additions and 0 deletions
docs/content/model-compatibility/_index.en.md (new file, 80 lines)

+++
disableToc = false
title = "Model compatibility"
weight = 4
+++

LocalAI is compatible with the models supported by [llama.cpp](https://github.com/ggerganov/llama.cpp), and also supports [GPT4ALL-J](https://github.com/nomic-ai/gpt4all) and [cerebras-GPT with ggml](https://huggingface.co/lxe/Cerebras-GPT-2.7B-Alpaca-SP-ggml).

{{% notice note %}}

LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file. See [the advanced section]({{%relref "advanced" %}}) for more details.

{{% /notice %}}

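For example, a minimal model configuration that pins a model to a specific backend might look like the sketch below (the model name and file are placeholders, not files shipped with LocalAI):

```yaml
# models/my-model.yaml (illustrative example)
name: my-model       # the model name used in API requests
backend: llama       # any backend name from the table below
parameters:
  # model file, relative to the models path
  model: my-model-file.gguf
```
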
### Hardware requirements

Depending on the model you are attempting to run, you might need more RAM or CPU resources. See also the [memory/disk requirements](https://github.com/ggerganov/llama.cpp#memorydisk-requirements) for `ggml`-based backends. `rwkv` is less demanding on resources.

### Model compatibility table

Besides llama-based models, LocalAI is also compatible with other architectures. The table below lists all the compatible model families and the associated binding repositories.

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| [llama.cpp]({{%relref "model-compatibility/llama-cpp" %}}) | Vicuna, Alpaca, LLaMa | yes | GPT and Functions | yes** | yes | CUDA, openCL, cuBLAS, Metal |
| [gpt4all-llama](https://github.com/nomic-ai/gpt4all) | Vicuna, Alpaca, LLaMa | yes | GPT | no | yes | N/A |
| [gpt4all-mpt](https://github.com/nomic-ai/gpt4all) | MPT | yes | GPT | no | yes | N/A |
| [gpt4all-j](https://github.com/nomic-ai/gpt4all) | GPT4ALL-J | yes | GPT | no | yes | N/A |
| [falcon-ggml](https://github.com/ggerganov/ggml) ([binding](https://github.com/go-skynet/go-ggml-transformers.cpp)) | Falcon (*) | yes | GPT | no | no | N/A |
| [gpt2](https://github.com/ggerganov/ggml) ([binding](https://github.com/go-skynet/go-ggml-transformers.cpp)) | GPT2, Cerebras | yes | GPT | no | no | N/A |
| [dolly](https://github.com/ggerganov/ggml) ([binding](https://github.com/go-skynet/go-ggml-transformers.cpp)) | Dolly | yes | GPT | no | no | N/A |
| [gptj](https://github.com/ggerganov/ggml) ([binding](https://github.com/go-skynet/go-ggml-transformers.cpp)) | GPTJ | yes | GPT | no | no | N/A |
| [mpt](https://github.com/ggerganov/ggml) ([binding](https://github.com/go-skynet/go-ggml-transformers.cpp)) | MPT | yes | GPT | no | no | N/A |
| [replit](https://github.com/ggerganov/ggml) ([binding](https://github.com/go-skynet/go-ggml-transformers.cpp)) | Replit | yes | GPT | no | no | N/A |
| [gptneox](https://github.com/ggerganov/ggml) ([binding](https://github.com/go-skynet/go-ggml-transformers.cpp)) | GPT NeoX, RedPajama, StableLM | yes | GPT | no | no | N/A |
| [starcoder](https://github.com/ggerganov/ggml) ([binding](https://github.com/go-skynet/go-ggml-transformers.cpp)) | Starcoder | yes | GPT | no | no | N/A |
| [bloomz](https://github.com/NouamaneTazi/bloomz.cpp) ([binding](https://github.com/go-skynet/bloomz.cpp)) | Bloom | yes | GPT | no | no | N/A |
| [rwkv](https://github.com/saharNooby/rwkv.cpp) ([binding](https://github.com/donomii/go-rwkv.cpp)) | rwkv | yes | GPT | no | yes | N/A |
| [bert](https://github.com/skeskinen/bert.cpp) ([binding](https://github.com/go-skynet/go-bert.cpp)) | bert | no | Embeddings only | yes | no | N/A |
| [whisper](https://github.com/ggerganov/whisper.cpp) | whisper | no | Audio | no | no | N/A |
| [stablediffusion](https://github.com/EdVince/Stable-Diffusion-NCNN) ([binding](https://github.com/mudler/go-stable-diffusion)) | stablediffusion | no | Image | no | no | N/A |
| [langchain-huggingface](https://github.com/tmc/langchaingo) | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
| [piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper)) | Any piper onnx model | no | Text to voice | no | no | N/A |
| [falcon](https://github.com/cmp-nct/ggllm.cpp/tree/c12b2d65f732a0d8846db2244e070f0f3e73505c) ([binding](https://github.com/mudler/go-ggllm.cpp)) | Falcon *** | yes | GPT | no | yes | CUDA |
| `huggingface-embeddings` [sentence-transformers](https://github.com/UKPLab/sentence-transformers) | BERT | no | Embeddings only | yes | no | N/A |
| `bark` | bark | no | Audio generation | no | no | yes |
| `AutoGPTQ` | GPTQ | yes | GPT | yes | no | N/A |
| `exllama` | GPTQ | yes | GPT only | no | no | N/A |
| `diffusers` | SD,... | no | Image generation | no | no | N/A |
| `vall-e-x` | Vall-E | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| `vllm` | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |

Note: any backend name listed above can be used in the `backend` field of the model configuration file (see [the advanced section]({{%relref "advanced" %}})).

- \* 7b ONLY
- ** doesn't seem to be accurate
- *** 7b and 40b with the `ggccv` format, for instance: https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-GGML

Tested with:

- [X] Automatically by CI with OpenLLAMA and GPT4ALL.
- [X] LLaMA 🦙
- [X] [Vicuna](https://github.com/ggerganov/llama.cpp/discussions/643#discussioncomment-5533894)
- [X] [Alpaca](https://github.com/ggerganov/llama.cpp#instruction-mode-with-alpaca)
- [X] [GPT4ALL](https://gpt4all.io) (see also [using GPT4All](https://github.com/ggerganov/llama.cpp#using-gpt4all))
- [X] [GPT4ALL-J](https://gpt4all.io/models/ggml-gpt4all-j.bin) (no changes required)
- [X] [Koala](https://bair.berkeley.edu/blog/2023/04/03/koala/) 🐨
- [X] Cerebras-GPT
- [X] [WizardLM](https://github.com/nlpxucan/WizardLM)
- [X] [RWKV](https://github.com/BlinkDL/RWKV-LM) models with [rwkv.cpp](https://github.com/saharNooby/rwkv.cpp)
- [X] [bloom.cpp](https://github.com/NouamaneTazi/bloomz.cpp)
- [X] [Chinese LLaMA / Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca)
- [X] [Vigogne (French)](https://github.com/bofenghuang/vigogne)
- [X] [OpenBuddy 🐶 (Multilingual)](https://github.com/OpenBuddy/OpenBuddy)
- [X] [Pygmalion 7B / Metharme 7B](https://github.com/ggerganov/llama.cpp#using-pygmalion-7b--metharme-7b)
- [X] [HuggingFace Inference](https://huggingface.co/inference-api) models available through API
- [X] Falcon

Note: you might need to convert some older models to the new format; for instructions, see for instance [the README in llama.cpp](https://github.com/ggerganov/llama.cpp#using-gpt4all) on how to run `gpt4all` models.

docs/content/model-compatibility/autogptq.md (new file, 38 lines)

+++
disableToc = false
title = "🦙 AutoGPTQ"
weight = 3
+++

[AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) is an easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

## Prerequisites

This is an extra backend: it is already available in the container images, and there is nothing to do to set it up.

If you are building LocalAI locally, you need to install [AutoGPTQ manually](https://github.com/PanQiWei/AutoGPTQ#quick-installation).

## Model setup

The models are automatically downloaded from `huggingface` the first time they are used, if not already present. You can define models via a `YAML` config file, or just query the endpoint with the `huggingface` repository model name. For example, create a `YAML` config file in `models/`:

```yaml
name: orca
backend: autogptq
model_base_name: "orca_mini_v2_13b-GPTQ-4bit-128g.no-act.order"
parameters:
  model: "TheBloke/orca_mini_v2_13b-GPTQ"
# ...
```

Test with:

```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "orca",
   "messages": [{"role": "user", "content": "How are you?"}],
   "temperature": 0.1
}'
```

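As noted above, you can also skip the `YAML` file and query the endpoint with the `huggingface` repository name directly; a hedged sketch of what that request could look like (using the same repository name as in the config above):

```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "TheBloke/orca_mini_v2_13b-GPTQ",
   "messages": [{"role": "user", "content": "How are you?"}],
   "temperature": 0.1
}'
```
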
docs/content/model-compatibility/bark.md (new file, 39 lines)

+++
disableToc = false
title = "🐶 Bark"
weight = 4
+++

[Bark](https://github.com/suno-ai/bark) allows you to generate audio from text prompts.

## Setup

This is an extra backend: it is already available in the container images, and there is nothing to do to set it up.

## Model setup

There is nothing to be done for the model setup: you can start using Bark right away, and the models will be downloaded the first time you use the backend.

## Usage

Use the `tts` endpoint by specifying the `bark` backend:

```bash
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "backend": "bark",
  "input": "Hello!"
}' | aplay
```

To specify a voice from https://github.com/suno-ai/bark#-voice-presets ( https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c ), use the `model` parameter:

```bash
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "backend": "bark",
  "input": "Hello!",
  "model": "v2/en_speaker_4"
}' | aplay
```

docs/content/model-compatibility/diffusers.md (new file, 170 lines)

+++
disableToc = false
title = "🧨 Diffusers"
weight = 4
+++

[Diffusers](https://huggingface.co/docs/diffusers/index) is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. LocalAI has a diffusers backend which allows image generation using the `diffusers` library.

![anime_girl](https://github.com/go-skynet/LocalAI/assets/2420543/8aa5eceb-94a4-4f19-bdc5-2b52e1e6570a)
(Generated with [AnimagineXL](https://huggingface.co/Linaqruf/animagine-xl))

Note: currently only image generation is supported. The backend is experimental, so you might encounter issues with models that haven't been tested yet.

## Setup

This is an extra backend: it is already available in the container images, and there is nothing to do to set it up.

## Model setup

The models will be downloaded automatically from `huggingface` the first time you use the backend.

Create a model configuration file in the `models` directory, for instance to use `Linaqruf/animagine-xl` with CPU:

```yaml
name: animagine-xl
parameters:
  model: Linaqruf/animagine-xl
backend: diffusers

# Force CPU usage - set to true for GPU
f16: false
diffusers:
  pipeline_type: StableDiffusionXLPipeline
  cuda: false # Enable for GPU usage (CUDA)
  scheduler_type: euler_a
```

## Local models

You can also use local models, or modify parameters such as `clip_skip` and `scheduler_type`, for instance:

```yaml
name: stablediffusion
parameters:
  model: toonyou_beta6.safetensors
backend: diffusers
step: 30
f16: true
diffusers:
  pipeline_type: StableDiffusionPipeline
  cuda: true
  enable_parameters: "negative_prompt,num_inference_steps,clip_skip"
  scheduler_type: "k_dpmpp_sde"
  cfg_scale: 8
  clip_skip: 11
```

## Configuration parameters

The following parameters are available in the configuration file:

| Parameter | Description | Default |
| --- | --- | --- |
| `f16` | Force the usage of `float16` instead of `float32` | `false` |
| `step` | Number of steps to run the model for | `30` |
| `cuda` | Enable CUDA acceleration | `false` |
| `enable_parameters` | Parameters to enable for the model | `negative_prompt,num_inference_steps,clip_skip` |
| `scheduler_type` | Scheduler type | `k_dpmpp_sde` |
| `cfg_scale` | Configuration scale | `8` |
| `clip_skip` | Clip skip | None |
| `pipeline_type` | Pipeline type | `StableDiffusionPipeline` |

The following scheduler types are available:

| Scheduler | Description |
| --- | --- |
| `ddim` | DDIM |
| `pndm` | PNDM |
| `heun` | Heun |
| `unipc` | UniPC |
| `euler` | Euler |
| `euler_a` | Euler a |
| `lms` | LMS |
| `k_lms` | LMS Karras |
| `dpm_2` | DPM2 |
| `k_dpm_2` | DPM2 Karras |
| `dpm_2_a` | DPM2 a |
| `k_dpm_2_a` | DPM2 a Karras |
| `dpmpp_2m` | DPM++ 2M |
| `k_dpmpp_2m` | DPM++ 2M Karras |
| `dpmpp_sde` | DPM++ SDE |
| `k_dpmpp_sde` | DPM++ SDE Karras |
| `dpmpp_2m_sde` | DPM++ 2M SDE |
| `k_dpmpp_2m_sde` | DPM++ 2M SDE Karras |

The following pipeline types are available:

| Pipeline type | Description |
| --- | --- |
| `StableDiffusionPipeline` | Stable diffusion pipeline |
| `StableDiffusionImg2ImgPipeline` | Stable diffusion image to image pipeline |
| `StableDiffusionDepth2ImgPipeline` | Stable diffusion depth to image pipeline |
| `DiffusionPipeline` | Diffusion pipeline |
| `StableDiffusionXLPipeline` | Stable diffusion XL pipeline |

## Usage

### Text to Image

Use the `image` generation endpoint with the `model` name from the configuration file:

```bash
curl http://localhost:8080/v1/images/generations \
    -H "Content-Type: application/json" \
    -d '{
      "prompt": "<positive prompt>|<negative prompt>",
      "model": "animagine-xl",
      "step": 51,
      "size": "1024x1024"
    }'
```

### Image to Image

https://huggingface.co/docs/diffusers/using-diffusers/img2img

An example model (GPU):

```yaml
name: stablediffusion-edit
parameters:
  model: nitrosocke/Ghibli-Diffusion
backend: diffusers
step: 25

f16: true
diffusers:
  pipeline_type: StableDiffusionImg2ImgPipeline
  cuda: true
  enable_parameters: "negative_prompt,num_inference_steps,image"
```

```bash
IMAGE_PATH=/path/to/your/image
(echo -n '{"image": "'; base64 $IMAGE_PATH; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-edit"}') |
curl -H "Content-Type: application/json" -d @- http://localhost:8080/v1/images/generations
```

### Depth to Image

https://huggingface.co/docs/diffusers/using-diffusers/depth2img

An example model (GPU):

```yaml
name: stablediffusion-depth
parameters:
  model: stabilityai/stable-diffusion-2-depth
backend: diffusers
step: 50
f16: true
diffusers:
  pipeline_type: StableDiffusionDepth2ImgPipeline
  cuda: true
  enable_parameters: "negative_prompt,num_inference_steps,image"
  cfg_scale: 6
```

```bash
(echo -n '{"image": "'; base64 ~/path/to/image.jpeg; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-depth"}') |
curl -H "Content-Type: application/json" -d @- http://localhost:8080/v1/images/generations
```

docs/content/model-compatibility/exllama.md (new file, 42 lines)

+++
disableToc = false
title = "🦙 Exllama"
weight = 2
+++

[Exllama](https://github.com/turboderp/exllama) is "a more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights".

## Prerequisites

This is an extra backend: it is already available in the container images, and there is nothing to do to set it up.

If you are building LocalAI locally, you need to install [exllama manually](https://github.com/jllllll/exllama#this-is-a-python-module-version-of-exllama) first.

## Model setup

Download the model as a folder inside the `models` directory and create a YAML file specifying the `exllama` backend. For instance, with the `TheBloke/WizardLM-7B-uncensored-GPTQ` model:

```bash
$ git lfs install
$ cd models && git clone https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GPTQ
$ ls models/
.keep  WizardLM-7B-uncensored-GPTQ/  exllama.yaml
$ cat models/exllama.yaml
name: exllama
parameters:
  model: WizardLM-7B-uncensored-GPTQ
backend: exllama
# ...
```

Test with:

```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "exllama",
   "messages": [{"role": "user", "content": "How are you?"}],
   "temperature": 0.1
}'
```

docs/content/model-compatibility/llama-cpp.md (new file, 81 lines)

+++
disableToc = false
title = "🦙 llama.cpp"
weight = 1
+++

[llama.cpp](https://github.com/ggerganov/llama.cpp) is a popular port of Facebook's LLaMA model in C/C++.

{{% notice note %}}

The `ggml` file format has been deprecated. If you are using `ggml` models and you are configuring your model with a YAML file, specify the `llama-stable` backend instead. If you are relying on automatic detection of the model, you should be fine. For `gguf` models, use the `llama` backend.

{{% /notice %}}

## Features

The `llama.cpp` backend supports the following features:

- [📖 Text generation (GPT)]({{%relref "features/text-generation" %}})
- [🧠 Embeddings]({{%relref "features/embeddings" %}})
- [🔥 OpenAI functions]({{%relref "features/openai-functions" %}})
- [✍️ Constrained grammars]({{%relref "features/constrained_grammars" %}})

## Setup

LocalAI supports `llama.cpp` models out of the box. You can use `llama.cpp` models in the same way as any other model.

### Manual setup

It is sufficient to copy the `ggml` or `gguf` model files into the `models` folder. You can then refer to the model with the `model` parameter in the API calls.

[You can optionally create an associated YAML]({{%relref "advanced" %}}) model config file to tune the model's parameters or apply a template to the prompt, as in the sketch below.

Prompt templates are useful for models that are fine-tuned towards a specific prompt format.

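For instance, a hedged sketch of a config with a prompt template, assuming a model file `wizardlm.gguf` and a template file `wizardlm-chat.tmpl` are both placed in the `models` folder (names are illustrative; see [the advanced section]({{%relref "advanced" %}}) for the full set of options):

```yaml
# models/wizardlm.yaml (illustrative example)
name: wizardlm
backend: llama
parameters:
  model: wizardlm.gguf
template:
  # refers to models/wizardlm-chat.tmpl (without the extension)
  chat: wizardlm-chat
```
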
### Automatic setup

LocalAI supports model galleries, which are indexes of models. For instance, the huggingface gallery contains a large curated index of models from the huggingface model hub for `ggml` or `gguf` models.

If you have the galleries enabled, you can start chatting with huggingface models just by running:

```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "TheBloke/WizardLM-13B-V1.2-GGML/wizardlm-13b-v1.2.ggmlv3.q2_K.bin",
   "messages": [{"role": "user", "content": "Say this is a test!"}],
   "temperature": 0.1
}'
```

LocalAI will automatically download and configure the model in the `models` directory.

Models can also be preloaded or downloaded on demand; a sketch of a preload request is shown below. To learn about model galleries, check out the [model gallery documentation]({{%relref "models" %}}).

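For example, assuming your deployment exposes the gallery endpoints described in the model gallery documentation, a model can be preloaded by applying a gallery definition (the URL below is illustrative):

```bash
# Hypothetical example: apply a model definition from a gallery repository
curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
   "url": "github:go-skynet/model-gallery/gpt4all-j.yaml"
}'
```
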
### YAML configuration

To use the `llama.cpp` backend, specify `llama` as the backend in the YAML file:

```yaml
name: llama
backend: llama
parameters:
  # Relative to the models path
  model: file.gguf.bin
```

In the example above we specify `llama` as the backend to restrict loading to `gguf` models only.

For instance, to use the `llama-stable` backend for `ggml` models:

```yaml
name: llama
backend: llama-stable
parameters:
  # Relative to the models path
  model: file.ggml.bin
```

### Reference

- [llama](https://github.com/ggerganov/llama.cpp)
- [binding](https://github.com/go-skynet/go-llama.cpp)

docs/content/model-compatibility/rwkv.md (new file, 15 lines)

+++
disableToc = false
title = "RWKV"
weight = 1
+++

A full example of how to run a rwkv model is in the [examples](https://github.com/go-skynet/LocalAI/tree/master/examples/rwkv).

Note: rwkv models need to specify the `rwkv` backend in the YAML config file, and an associated tokenizer must be provided alongside the model file:

```
36464540 -rw-r--r--  1 mudler mudler 1.2G May  3 10:51 rwkv_small
36464543 -rw-r--r--  1 mudler mudler 2.4M May  3 10:51 rwkv_small.tokenizer.json
```
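For the listing above, a minimal configuration sketch might look like the following (the tokenizer is assumed to be picked up from the `rwkv_small.tokenizer.json` file sitting next to the model; check the linked example for a complete config):

```yaml
# models/rwkv_small.yaml (illustrative sketch)
name: rwkv_small
backend: rwkv
parameters:
  # Model file relative to the models path;
  # rwkv_small.tokenizer.json is expected alongside it
  model: rwkv_small
```
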
docs/content/model-compatibility/vall-e-x.md (new file, 50 lines)

+++
disableToc = false
title = "🆕 Vall-E-X"
weight = 4
+++

[VALL-E-X](https://github.com/Plachtaa/VALL-E-X) is an open source implementation of Microsoft's VALL-E X zero-shot TTS model.

## Setup

The backend will automatically download the required files in order to run the model.

This is an extra backend: it is already available in the container images, and there is nothing to do to set it up. If you are building LocalAI manually, you need to install Vall-E-X first.

## Usage

Use the `tts` endpoint by specifying the `vall-e-x` backend:

```bash
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "backend": "vall-e-x",
  "input": "Hello!"
}' | aplay
```

## Voice cloning

In order to use the voice cloning capabilities, you must create a `YAML` configuration file to set up a model:

```yaml
name: cloned-voice
backend: vall-e-x
parameters:
  model: "cloned-voice"
vall-e:
  # The path to the audio file to be cloned,
  # relative to the models directory
  audio_path: "path-to-wav-source.wav"
```

Then you can specify the model name in the requests:

```bash
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "backend": "vall-e-x",
  "model": "cloned-voice",
  "input": "Hello!"
}' | aplay
```

docs/content/model-compatibility/vllm.md (new file, 39 lines)

+++
disableToc = false
title = "🆕 vLLM"
weight = 4
+++

[vLLM](https://github.com/vllm-project/vllm) is a fast and easy-to-use library for LLM inference.

LocalAI has a built-in integration with vLLM, and it can be used to run models. You can check out `vllm` performance [here](https://github.com/vllm-project/vllm#performance).

## Setup

Create a YAML file for the model you want to use with `vllm`.

To set up a model, you just need to specify the model name in the YAML config file:

```yaml
name: vllm
backend: vllm
parameters:
  model: "facebook/opt-125m"

# Uncomment to specify a quantization method (optional)
# quantization: "awq"
```

The backend will automatically download the required files in order to run the model.

## Usage

Use the `completions` endpoint by specifying the `vllm` backend:

```bash
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
   "model": "vllm",
   "prompt": "Hello, my name is",
   "temperature": 0.1, "top_p": 0.1
}'
```