docs/examples: enhancements (#1572)

* docs: re-order sections

* fix references

* Add mixtral-instruct, tinyllama-chat, dolphin-2.5-mixtral-8x7b

* Fix link

* Minor corrections

* fix: models is a StringSlice, not a String

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP: switch docs theme

* content

* Fix GH link

* enhancements

* enhancements

* Fixed how to link

Signed-off-by: lunamidori5 <118759930+lunamidori5@users.noreply.github.com>

* fixups

* logo fix

* more fixups

* final touches

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: lunamidori5 <118759930+lunamidori5@users.noreply.github.com>
Co-authored-by: lunamidori5 <118759930+lunamidori5@users.noreply.github.com>
This commit is contained in:
Ettore Di Giacinto 2024-01-18 19:41:08 +01:00 committed by GitHub
parent b5c93f176a
commit 6ca4d38a01
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
79 changed files with 1826 additions and 3546 deletions

View file

@ -0,0 +1,109 @@
+++
disableToc = false
title = "⚡ GPU acceleration"
weight = 9
+++
{{% alert context="warning" %}}
Section under construction
{{% /alert %}}
This section contains instruction on how to use LocalAI with GPU acceleration.
{{% alert icon="⚡" context="warning" %}}
For accelleration for AMD or Metal HW there are no specific container images, see the [build]({{%relref "docs/getting-started/build#Acceleration" %}})
{{% /alert %}}
### CUDA(NVIDIA) acceleration
#### Requirements
Requirement: nvidia-container-toolkit (installation instructions [1](https://www.server-world.info/en/note?os=Ubuntu_22.04&p=nvidia&f=2) [2](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html))
To check what CUDA version do you need, you can either run `nvidia-smi` or `nvcc --version`.
Alternatively, you can also check nvidia-smi with docker:
```
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
```
To use CUDA, use the images with the `cublas` tag, for example.
The image list is on [quay](https://quay.io/repository/go-skynet/local-ai?tab=tags):
- CUDA `11` tags: `master-cublas-cuda11`, `v1.40.0-cublas-cuda11`, ...
- CUDA `12` tags: `master-cublas-cuda12`, `v1.40.0-cublas-cuda12`, ...
- CUDA `11` + FFmpeg tags: `master-cublas-cuda11-ffmpeg`, `v1.40.0-cublas-cuda11-ffmpeg`, ...
- CUDA `12` + FFmpeg tags: `master-cublas-cuda12-ffmpeg`, `v1.40.0-cublas-cuda12-ffmpeg`, ...
In addition to the commands to run LocalAI normally, you need to specify `--gpus all` to docker, for example:
```bash
docker run --rm -ti --gpus all -p 8080:8080 -e DEBUG=true -e MODELS_PATH=/models -e THREADS=1 -v $PWD/models:/models quay.io/go-skynet/local-ai:v1.40.0-cublas-cuda12
```
If the GPU inferencing is working, you should be able to see something like:
```
5:22PM DBG Loading model in memory from file: /models/open-llama-7b-q4_0.bin
ggml_init_cublas: found 1 CUDA devices:
Device 0: Tesla T4
llama.cpp: loading model from /models/open-llama-7b-q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 1024
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.07 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required = 4321.77 MB (+ 1026.00 MB per state)
llama_model_load_internal: allocating batch_size x 1 MB = 512 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 10 repeating layers to GPU
llama_model_load_internal: offloaded 10/35 layers to GPU
llama_model_load_internal: total VRAM used: 1598 MB
...................................................................................................
llama_init_from_file: kv self size = 512.00 MB
```
#### Model configuration
Depending on the model architecture and backend used, there might be different ways to enable GPU acceleration. It is required to configure the model you intend to use with a YAML config file. For example, for `llama.cpp` workloads a configuration file might look like this (where `gpu_layers` is the number of layers to offload to the GPU):
```yaml
name: my-model-name
# Default model parameters
parameters:
# Relative to the models path
model: llama.cpp-model.ggmlv3.q5_K_M.bin
context_size: 1024
threads: 1
f16: true # enable with GPU acceleration
gpu_layers: 22 # GPU Layers (only used when built with cublas)
```
For diffusers instead, it might look like this instead:
```yaml
name: stablediffusion
parameters:
model: toonyou_beta6.safetensors
backend: diffusers
step: 30
f16: true
diffusers:
pipeline_type: StableDiffusionPipeline
cuda: true
enable_parameters: "negative_prompt,num_inference_steps,clip_skip"
scheduler_type: "k_dpmpp_sde"
```

View file

@ -0,0 +1,7 @@
+++
disableToc = false
title = "Features"
weight = 8
icon = "feature_search"
+++

View file

@ -0,0 +1,43 @@
+++
disableToc = false
title = "🔈 Audio to text"
weight = 16
+++
Audio to text models are models that can generate text from an audio file.
The transcription endpoint allows to convert audio files to text. The endpoint is based on [whisper.cpp](https://github.com/ggerganov/whisper.cpp), a C++ library for audio transcription. The endpoint input supports all the audio formats supported by `ffmpeg`.
## Usage
Once LocalAI is started and whisper models are installed, you can use the `/v1/audio/transcriptions` API endpoint.
For instance, with cURL:
```bash
curl http://localhost:8080/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F file="@<FILE_PATH>" -F model="<MODEL_NAME>"
```
## Example
Download one of the models from [here](https://huggingface.co/ggerganov/whisper.cpp/tree/main) in the `models` folder, and create a YAML file for your model:
```yaml
name: whisper-1
backend: whisper
parameters:
model: whisper-en
```
The transcriptions endpoint then can be tested like so:
```bash
## Get an example audio file
wget --quiet --show-progress -O gb1.ogg https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg
## Send the example audio file to the transcriptions endpoint
curl http://localhost:8080/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F file="@$PWD/gb1.ogg" -F model="whisper-1"
## Result
{"text":"My fellow Americans, this day has brought terrible news and great sadness to our country.At nine o'clock this morning, Mission Control in Houston lost contact with our Space ShuttleColumbia.A short time later, debris was seen falling from the skies above Texas.The Columbia's lost.There are no survivors.One board was a crew of seven.Colonel Rick Husband, Lieutenant Colonel Michael Anderson, Commander Laurel Clark, Captain DavidBrown, Commander William McCool, Dr. Kultna Shavla, and Elon Ramon, a colonel in the IsraeliAir Force.These men and women assumed great risk in the service to all humanity.In an age when spaceflight has come to seem almost routine, it is easy to overlook thedangers of travel by rocket and the difficulties of navigating the fierce outer atmosphere ofthe Earth.These astronauts knew the dangers, and they faced them willingly, knowing they had a highand noble purpose in life.Because of their courage and daring and idealism, we will miss them all the more.All Americans today are thinking as well of the families of these men and women who havebeen given this sudden shock and grief.You're not alone.Our entire nation agrees with you, and those you loved will always have the respect andgratitude of this country.The cause in which they died will continue.Mankind has led into the darkness beyond our world by the inspiration of discovery andthe longing to understand.Our journey into space will go on.In the skies today, we saw destruction and tragedy.As farther than we can see, there is comfort and hope.In the words of the prophet Isaiah, \"Lift your eyes and look to the heavens who createdall these, he who brings out the starry hosts one by one and calls them each by name.\"Because of his great power and mighty strength, not one of them is missing.The same creator who names the stars also knows the names of the seven souls we mourntoday.The crew of the shuttle Columbia did not return safely to Earth yet we can pray that all aresafely home.May God bless the grieving families and may God continue to bless America.[BLANK_AUDIO]"}
```

View file

@ -0,0 +1,30 @@
+++
disableToc = false
title = "✍️ Constrained grammars"
weight = 15
+++
The chat endpoint accepts an additional `grammar` parameter which takes a [BNF defined grammar](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form).
This allows the LLM to constrain the output to a user-defined schema, allowing to generate `JSON`, `YAML`, and everything that can be defined with a BNF grammar.
{{% alert note %}}
This feature works only with models compatible with the [llama.cpp](https://github.com/ggerganov/llama.cpp) backend (see also [Model compatibility]({{%relref "docs/reference/compatibility-table" %}})). For details on how it works, see the upstream PRs: https://github.com/ggerganov/llama.cpp/pull/1773, https://github.com/ggerganov/llama.cpp/pull/1887
{{% /alert %}}
## Setup
Follow the setup instructions from the [LocalAI functions]({{%relref "docs/features/openai-functions" %}}) page.
## 💡 Usage example
For example, to constrain the output to either `yes`, `no`:
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Do you like apples?"}],
"grammar": "root ::= (\"yes\" | \"no\")"
}'
```

View file

@ -0,0 +1,102 @@
+++
disableToc = false
title = "🧠 Embeddings"
weight = 13
+++
LocalAI supports generating embeddings for text or list of tokens.
For the API documentation you can refer to the OpenAI docs: https://platform.openai.com/docs/api-reference/embeddings
## Model compatibility
The embedding endpoint is compatible with `llama.cpp` models, `bert.cpp` models and sentence-transformers models available in huggingface.
## Manual Setup
Create a `YAML` config file in the `models` directory. Specify the `backend` and the model file.
```yaml
name: text-embedding-ada-002 # The model name used in the API
parameters:
model: <model_file>
backend: "<backend>"
embeddings: true
# .. other parameters
```
## Bert embeddings
To use `bert.cpp` models you can use the `bert` embedding backend.
An example model config file:
```yaml
name: text-embedding-ada-002
parameters:
model: bert
backend: bert-embeddings
embeddings: true
# .. other parameters
```
The `bert` backend uses [bert.cpp](https://github.com/skeskinen/bert.cpp) and uses `ggml` models.
For instance you can download the `ggml` quantized version of `all-MiniLM-L6-v2` from https://huggingface.co/skeskinen/ggml:
```bash
wget https://huggingface.co/skeskinen/ggml/resolve/main/all-MiniLM-L6-v2/ggml-model-q4_0.bin -O models/bert
```
To test locally (LocalAI server running on `localhost`),
you can use `curl` (and `jq` at the end to prettify):
```bash
curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
"input": "Your text string goes here",
"model": "text-embedding-ada-002"
}' | jq "."
```
## Huggingface embeddings
To use `sentence-transformers` and models in `huggingface` you can use the `sentencetransformers` embedding backend.
```yaml
name: text-embedding-ada-002
backend: sentencetransformers
embeddings: true
parameters:
model: all-MiniLM-L6-v2
```
The `sentencetransformers` backend uses Python [sentence-transformers](https://github.com/UKPLab/sentence-transformers). For a list of all pre-trained models available see here: https://github.com/UKPLab/sentence-transformers#pre-trained-models
{{% alert note %}}
- The `sentencetransformers` backend is an optional backend of LocalAI and uses Python. If you are running `LocalAI` from the containers you are good to go and should be already configured for use.
- If you are running `LocalAI` manually you must install the python dependencies (`make prepare-extra-conda-environments`). This requires `conda` to be installed.
- For local execution, you also have to specify the extra backend in the `EXTERNAL_GRPC_BACKENDS` environment variable.
- Example: `EXTERNAL_GRPC_BACKENDS="sentencetransformers:/path/to/LocalAI/backend/python/sentencetransformers/sentencetransformers.py"`
- The `sentencetransformers` backend does support only embeddings of text, and not of tokens. If you need to embed tokens you can use the `bert` backend or `llama.cpp`.
- No models are required to be downloaded before using the `sentencetransformers` backend. The models will be downloaded automatically the first time the API is used.
{{% /alert %}}
## Llama.cpp embeddings
Embeddings with `llama.cpp` are supported with the `llama` backend.
```yaml
name: my-awesome-model
backend: llama
embeddings: true
parameters:
model: ggml-file.bin
# ...
```
## 💡 Examples
- Example that uses LLamaIndex and LocalAI as embedding: [here](https://github.com/go-skynet/LocalAI/tree/master/examples/query_data/).

View file

@ -0,0 +1,30 @@
+++
disableToc = false
title = "🆕 GPT Vision"
weight = 14
+++
{{% alert note %}}
Available only on `master` builds
{{% /alert %}}
LocalAI supports understanding images by using [LLaVA](https://llava.hliu.cc/), and implements the [GPT Vision API](https://platform.openai.com/docs/guides/vision) from OpenAI.
![llava](https://github.com/mudler/LocalAI/assets/2420543/cb0a0897-3b58-4350-af66-e6f4387b58d3)
## Usage
OpenAI docs: https://platform.openai.com/docs/guides/vision
To let LocalAI understand and reply with what sees in the image, use the `/v1/chat/completions` endpoint, for example with curl:
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llava",
"messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}'
```
### Setup
To setup the LLaVa models, follow the full example in the [configuration examples](https://github.com/mudler/LocalAI/blob/master/examples/configurations/README.md#llava).

View file

@ -0,0 +1,352 @@
+++
disableToc = false
title = "🎨 Image generation"
weight = 12
+++
![anime_girl](https://github.com/go-skynet/LocalAI/assets/2420543/8aaca62a-e864-4011-98ae-dcc708103928)
(Generated with [AnimagineXL](https://huggingface.co/Linaqruf/animagine-xl))
LocalAI supports generating images with Stable diffusion, running on CPU using C++ and Python implementations.
## Usage
OpenAI docs: https://platform.openai.com/docs/api-reference/images/create
To generate an image you can send a POST request to the `/v1/images/generations` endpoint with the instruction as the request body:
```bash
# 512x512 is supported too
curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
"prompt": "A cute baby sea otter",
"size": "256x256"
}'
```
Available additional parameters: `mode`, `step`.
Note: To set a negative prompt, you can split the prompt with `|`, for instance: `a cute baby sea otter|malformed`.
```bash
curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
"prompt": "floating hair, portrait, ((loli)), ((one girl)), cute face, hidden hands, asymmetrical bangs, beautiful detailed eyes, eye shadow, hair ornament, ribbons, bowties, buttons, pleated skirt, (((masterpiece))), ((best quality)), colorful|((part of the head)), ((((mutated hands and fingers)))), deformed, blurry, bad anatomy, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, poorly drawn hands, missing limb, blurry, floating limbs, disconnected limbs, malformed hands, blur, out of focus, long neck, long body, Octane renderer, lowres, bad anatomy, bad hands, text",
"size": "256x256"
}'
```
## Backends
### stablediffusion-cpp
| mode=0 | mode=1 (winograd/sgemm) |
|------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
| ![test](https://github.com/go-skynet/LocalAI/assets/2420543/7145bdee-4134-45bb-84d4-f11cb08a5638) | ![b643343452981](https://github.com/go-skynet/LocalAI/assets/2420543/abf14de1-4f50-4715-aaa4-411d703a942a) |
| ![b6441997879](https://github.com/go-skynet/LocalAI/assets/2420543/d50af51c-51b7-4f39-b6c2-bf04c403894c) | ![winograd2](https://github.com/go-skynet/LocalAI/assets/2420543/1935a69a-ecce-4afc-a099-1ac28cb649b3) |
| ![winograd](https://github.com/go-skynet/LocalAI/assets/2420543/1979a8c4-a70d-4602-95ed-642f382f6c6a) | ![winograd3](https://github.com/go-skynet/LocalAI/assets/2420543/e6d184d4-5002-408f-b564-163986e1bdfb) |
Note: image generator supports images up to 512x512. You can use other tools however to upscale the image, for instance: https://github.com/upscayl/upscayl.
#### Setup
Note: In order to use the `images/generation` endpoint with the `stablediffusion` C++ backend, you need to build LocalAI with `GO_TAGS=stablediffusion`. If you are using the container images, it is already enabled.
{{< tabs >}}
{{% tab name="Prepare the model in runtime" %}}
While the API is running, you can install the model by using the `/models/apply` endpoint and point it to the `stablediffusion` model in the [models-gallery](https://github.com/go-skynet/model-gallery#image-generation-stable-diffusion):
```bash
curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
"url": "github:go-skynet/model-gallery/stablediffusion.yaml"
}'
```
{{% /tab %}}
{{% tab name="Automatically prepare the model before start" %}}
You can set the `PRELOAD_MODELS` environment variable:
```bash
PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]
```
or as arg:
```bash
local-ai --preload-models '[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]'
```
or in a YAML file:
```bash
local-ai --preload-models-config "/path/to/yaml"
```
YAML:
```yaml
- url: github:go-skynet/model-gallery/stablediffusion.yaml
```
{{% /tab %}}
{{% tab name="Install manually" %}}
1. Create a model file `stablediffusion.yaml` in the models folder:
```yaml
name: stablediffusion
backend: stablediffusion
parameters:
model: stablediffusion_assets
```
2. Create a `stablediffusion_assets` directory inside your `models` directory
3. Download the ncnn assets from https://github.com/EdVince/Stable-Diffusion-NCNN#out-of-box and place them in `stablediffusion_assets`.
The models directory should look like the following:
```bash
models
├── stablediffusion_assets
│   ├── AutoencoderKL-256-256-fp16-opt.param
│   ├── AutoencoderKL-512-512-fp16-opt.param
│   ├── AutoencoderKL-base-fp16.param
│   ├── AutoencoderKL-encoder-512-512-fp16.bin
│   ├── AutoencoderKL-fp16.bin
│   ├── FrozenCLIPEmbedder-fp16.bin
│   ├── FrozenCLIPEmbedder-fp16.param
│   ├── log_sigmas.bin
│   ├── tmp-AutoencoderKL-encoder-256-256-fp16.param
│   ├── UNetModel-256-256-MHA-fp16-opt.param
│   ├── UNetModel-512-512-MHA-fp16-opt.param
│   ├── UNetModel-base-MHA-fp16.param
│   ├── UNetModel-MHA-fp16.bin
│   └── vocab.txt
└── stablediffusion.yaml
```
{{% /tab %}}
{{< /tabs >}}
### Diffusers
[Diffusers](https://huggingface.co/docs/diffusers/index) is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. LocalAI has a diffusers backend which allows image generation using the `diffusers` library.
![anime_girl](https://github.com/go-skynet/LocalAI/assets/2420543/8aaca62a-e864-4011-98ae-dcc708103928)
(Generated with [AnimagineXL](https://huggingface.co/Linaqruf/animagine-xl))
#### Model setup
The models will be downloaded the first time you use the backend from `huggingface` automatically.
Create a model configuration file in the `models` directory, for instance to use `Linaqruf/animagine-xl` with CPU:
```yaml
name: animagine-xl
parameters:
model: Linaqruf/animagine-xl
backend: diffusers
# Force CPU usage - set to true for GPU
f16: false
diffusers:
cuda: false # Enable for GPU usage (CUDA)
scheduler_type: euler_a
```
#### Dependencies
This is an extra backend - in the container is already available and there is nothing to do for the setup. Do not use *core* images (ending with `-core`). If you are building manually, see the [build instructions]({{%relref "docs/getting-started/build" %}}).
#### Model setup
The models will be downloaded the first time you use the backend from `huggingface` automatically.
Create a model configuration file in the `models` directory, for instance to use `Linaqruf/animagine-xl` with CPU:
```yaml
name: animagine-xl
parameters:
model: Linaqruf/animagine-xl
backend: diffusers
cuda: true
f16: true
diffusers:
scheduler_type: euler_a
```
#### Local models
You can also use local models, or modify some parameters like `clip_skip`, `scheduler_type`, for instance:
```yaml
name: stablediffusion
parameters:
model: toonyou_beta6.safetensors
backend: diffusers
step: 30
f16: true
cuda: true
diffusers:
pipeline_type: StableDiffusionPipeline
enable_parameters: "negative_prompt,num_inference_steps,clip_skip"
scheduler_type: "k_dpmpp_sde"
cfg_scale: 8
clip_skip: 11
```
#### Configuration parameters
The following parameters are available in the configuration file:
| Parameter | Description | Default |
| --- | --- | --- |
| `f16` | Force the usage of `float16` instead of `float32` | `false` |
| `step` | Number of steps to run the model for | `30` |
| `cuda` | Enable CUDA acceleration | `false` |
| `enable_parameters` | Parameters to enable for the model | `negative_prompt,num_inference_steps,clip_skip` |
| `scheduler_type` | Scheduler type | `k_dpp_sde` |
| `cfg_scale` | Configuration scale | `8` |
| `clip_skip` | Clip skip | None |
| `pipeline_type` | Pipeline type | `AutoPipelineForText2Image` |
There are available several types of schedulers:
| Scheduler | Description |
| --- | --- |
| `ddim` | DDIM |
| `pndm` | PNDM |
| `heun` | Heun |
| `unipc` | UniPC |
| `euler` | Euler |
| `euler_a` | Euler a |
| `lms` | LMS |
| `k_lms` | LMS Karras |
| `dpm_2` | DPM2 |
| `k_dpm_2` | DPM2 Karras |
| `dpm_2_a` | DPM2 a |
| `k_dpm_2_a` | DPM2 a Karras |
| `dpmpp_2m` | DPM++ 2M |
| `k_dpmpp_2m` | DPM++ 2M Karras |
| `dpmpp_sde` | DPM++ SDE |
| `k_dpmpp_sde` | DPM++ SDE Karras |
| `dpmpp_2m_sde` | DPM++ 2M SDE |
| `k_dpmpp_2m_sde` | DPM++ 2M SDE Karras |
Pipelines types available:
| Pipeline type | Description |
| --- | --- |
| `StableDiffusionPipeline` | Stable diffusion pipeline |
| `StableDiffusionImg2ImgPipeline` | Stable diffusion image to image pipeline |
| `StableDiffusionDepth2ImgPipeline` | Stable diffusion depth to image pipeline |
| `DiffusionPipeline` | Diffusion pipeline |
| `StableDiffusionXLPipeline` | Stable diffusion XL pipeline |
#### Usage
#### Text to Image
Use the `image` generation endpoint with the `model` name from the configuration file:
```bash
curl http://localhost:8080/v1/images/generations \
-H "Content-Type: application/json" \
-d '{
"prompt": "<positive prompt>|<negative prompt>",
"model": "animagine-xl",
"step": 51,
"size": "1024x1024"
}'
```
#### Image to Image
https://huggingface.co/docs/diffusers/using-diffusers/img2img
An example model (GPU):
```yaml
name: stablediffusion-edit
parameters:
model: nitrosocke/Ghibli-Diffusion
backend: diffusers
step: 25
cuda: true
f16: true
diffusers:
pipeline_type: StableDiffusionImg2ImgPipeline
enable_parameters: "negative_prompt,num_inference_steps,image"
```
```bash
IMAGE_PATH=/path/to/your/image
(echo -n '{"file": "'; base64 $IMAGE_PATH; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-edit"}') |
curl -H "Content-Type: application/json" -d @- http://localhost:8080/v1/images/generations
```
#### Depth to Image
https://huggingface.co/docs/diffusers/using-diffusers/depth2img
```yaml
name: stablediffusion-depth
parameters:
model: stabilityai/stable-diffusion-2-depth
backend: diffusers
step: 50
# Force CPU usage
f16: true
cuda: true
diffusers:
pipeline_type: StableDiffusionDepth2ImgPipeline
enable_parameters: "negative_prompt,num_inference_steps,image"
cfg_scale: 6
```
```bash
(echo -n '{"file": "'; base64 ~/path/to/image.jpeg; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-depth"}') |
curl -H "Content-Type: application/json" -d @- http://localhost:8080/v1/images/generations
```
#### img2vid
```yaml
name: img2vid
parameters:
model: stabilityai/stable-video-diffusion-img2vid
backend: diffusers
step: 25
# Force CPU usage
f16: true
cuda: true
diffusers:
pipeline_type: StableVideoDiffusionPipeline
```
```bash
(echo -n '{"file": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true","size": "512x512","model":"img2vid"}') |
curl -H "Content-Type: application/json" -X POST -d @- http://localhost:8080/v1/images/generations
```
#### txt2vid
```yaml
name: txt2vid
parameters:
model: damo-vilab/text-to-video-ms-1.7b
backend: diffusers
step: 25
# Force CPU usage
f16: true
cuda: true
diffusers:
pipeline_type: VideoDiffusionPipeline
cuda: true
```
```bash
(echo -n '{"prompt": "spiderman surfing","size": "512x512","model":"txt2vid"}') |
curl -H "Content-Type: application/json" -X POST -d @- http://localhost:8080/v1/images/generations
```

View file

@ -0,0 +1,508 @@
+++
disableToc = false
title = "🖼️ Model gallery"
weight = 18
url = '/models'
+++
<h1 align="center">
<br>
<img height="300" src="https://github.com/go-skynet/model-gallery/assets/2420543/7a6a8183-6d0a-4dc4-8e1d-f2672fab354e"> <br>
<br>
</h1>
The model gallery is a (experimental!) collection of models configurations for [LocalAI](https://github.com/go-skynet/LocalAI).
LocalAI to ease out installations of models provide a way to preload models on start and downloading and installing them in runtime. You can install models manually by copying them over the `models` directory, or use the API to configure, download and verify the model assets for you. As the UI is still a work in progress, you will find here the documentation about the API Endpoints.
{{% alert note %}}
The models in this gallery are not directly maintained by LocalAI. If you find a model that is not working, please open an issue on the model gallery repository.
{{% /alert %}}
{{% alert note %}}
GPT and text generation models might have a license which is not permissive for commercial use or might be questionable or without any license at all. Please check the model license before using it. The official gallery contains only open licensed models.
{{% /alert %}}
## Useful Links and resources
- [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) - here you can find a list of the most performing models on the Open LLM benchmark. Keep in mind models compatible with LocalAI must be quantized in the `gguf` format.
## Model repositories
You can install a model in runtime, while the API is running and it is started already, or before starting the API by preloading the models.
To install a model in runtime you will need to use the `/models/apply` LocalAI API endpoint.
To enable the `model-gallery` repository you need to start `local-ai` with the `GALLERIES` environment variable:
```
GALLERIES=[{"name":"<GALLERY_NAME>", "url":"<GALLERY_URL"}]
```
For example, to enable the `model-gallery` repository, start `local-ai` with:
```
GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}]
```
where `github:go-skynet/model-gallery/index.yaml` will be expanded automatically to `https://raw.githubusercontent.com/go-skynet/model-gallery/main/index.yaml`.
{{% alert note %}}
As this feature is experimental, you need to run `local-ai` with a list of `GALLERIES`. Currently there are two galleries:
- An official one, containing only definitions and models with a clear LICENSE to avoid any dmca infringment. As I'm not sure what's the best action to do in this case, I'm not going to include any model that is not clearly licensed in this repository which is offically linked to LocalAI.
- A "community" one that contains an index of `huggingface` models that are compatible with the `ggml` format and lives in the `localai-huggingface-zoo` repository.
To enable the two repositories, start `LocalAI` with the `GALLERIES` environment variable:
```bash
GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]
```
If running with `docker-compose`, simply edit the `.env` file and uncomment the `GALLERIES` variable, and add the one you want to use.
{{% /alert %}}
{{% alert note %}}
You might not find all the models in this gallery. Automated CI updates the gallery automatically. You can find however most of the models on huggingface (https://huggingface.co/), generally it should be available `~24h` after upload.
By under any circumstances LocalAI and any developer is not responsible for the models in this gallery, as CI is just indexing them and providing a convenient way to install with an automatic configuration with a consistent API. Don't install models from authors you don't trust, and, check the appropriate license for your use case. Models are automatically indexed and hosted on huggingface (https://huggingface.co/). For any issue with the models, please open an issue on the model gallery repository if it's a LocalAI misconfiguration, otherwise refer to the huggingface repository. If you think a model should not be listed, please reach to us and we will remove it from the gallery.
{{% /alert %}}
{{% alert note %}}
There is no documentation yet on how to build a gallery or a repository - but you can find an example in the [model-gallery](https://github.com/go-skynet/model-gallery) repository.
{{% /alert %}}
### List Models
To list all the available models, use the `/models/available` endpoint:
```bash
curl http://localhost:8080/models/available
```
To search for a model, you can use `jq`:
```bash
# Get all information about models with a name that contains "replit"
curl http://localhost:8080/models/available | jq '.[] | select(.name | contains("replit"))'
# Get the binary name of all local models (not hosted on Hugging Face)
curl http://localhost:8080/models/available | jq '.[] | .name | select(contains("localmodels"))'
# Get all of the model URLs that contains "orca"
curl http://localhost:8080/models/available | jq '.[] | .urls | select(. != null) | add | select(contains("orca"))'
```
### How to install a model from the repositories
Models can be installed by passing the full URL of the YAML config file, or either an identifier of the model in the gallery. The gallery is a repository of models that can be installed by passing the model name.
To install a model from the gallery repository, you can pass the model name in the `id` field. For instance, to install the `bert-embeddings` model, you can use the following command:
```bash
LOCALAI=http://localhost:8080
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
"id": "model-gallery@bert-embeddings"
}'
```
where:
- `model-gallery` is the repository. It is optional and can be omitted. If the repository is omitted LocalAI will search the model by name in all the repositories. In the case the same model name is present in both galleries the first match wins.
- `bert-embeddings` is the model name in the gallery
(read its [config here](https://github.com/go-skynet/model-gallery/blob/main/bert-embeddings.yaml)).
{{% alert note %}}
If the `huggingface` model gallery is enabled (it's enabled by default),
and the model has an entry in the model gallery's associated YAML config
(for `huggingface`, see [`model-gallery/huggingface.yaml`](https://github.com/go-skynet/model-gallery/blob/main/huggingface.yaml)),
you can install models by specifying directly the model's `id`.
For example, to install wizardlm superhot:
```bash
LOCALAI=http://localhost:8080
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
"id": "huggingface@TheBloke/WizardLM-13B-V1-0-Uncensored-SuperHOT-8K-GGML/wizardlm-13b-v1.0-superhot-8k.ggmlv3.q4_K_M.bin"
}'
```
Note that the `id` can be used similarly when pre-loading models at start.
{{% /alert %}}
## How to install a model (without a gallery)
If you don't want to set any gallery repository, you can still install models by loading a model configuration file.
In the body of the request you must specify the model configuration file URL (`url`), optionally a name to install the model (`name`), extra files to install (`files`), and configuration overrides (`overrides`). When calling the API endpoint, LocalAI will download the models files and write the configuration to the folder used to store models.
```bash
LOCALAI=http://localhost:8080
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
"url": "<MODEL_CONFIG_FILE>"
}'
# or if from a repository
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
"id": "<GALLERY>@<MODEL_NAME>"
}'
```
An example that installs openllama can be:
```bash
LOCALAI=http://localhost:8080
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
"url": "https://github.com/go-skynet/model-gallery/blob/main/openllama_3b.yaml"
}'
```
The API will return a job `uuid` that you can use to track the job progress:
```
{"uuid":"1059474d-f4f9-11ed-8d99-c4cbe106d571","status":"http://localhost:8080/models/jobs/1059474d-f4f9-11ed-8d99-c4cbe106d571"}
```
For instance, a small example bash script that waits a job to complete can be (requires `jq`):
```bash
response=$(curl -s http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{"url": "$model_url"}')
job_id=$(echo "$response" | jq -r '.uuid')
while [ "$(curl -s http://localhost:8080/models/jobs/"$job_id" | jq -r '.processed')" != "true" ]; do
sleep 1
done
echo "Job completed"
```
To preload models on start instead you can use the `PRELOAD_MODELS` environment variable.
<details>
To preload models on start, use the `PRELOAD_MODELS` environment variable by setting it to a JSON array of model uri:
```bash
PRELOAD_MODELS='[{"url": "<MODEL_URL>"}]'
```
Note: `url` or `id` must be specified. `url` is used to a url to a model gallery configuration, while an `id` is used to refer to models inside repositories. If both are specified, the `id` will be used.
For example:
```bash
PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]
```
or as arg:
```bash
local-ai --preload-models '[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]'
```
or in a YAML file:
```bash
local-ai --preload-models-config "/path/to/yaml"
```
YAML:
```yaml
- url: github:go-skynet/model-gallery/stablediffusion.yaml
```
</details>
{{% alert note %}}
You can find already some open licensed models in the [model gallery](https://github.com/go-skynet/model-gallery).
If you don't find the model in the gallery you can try to use the "base" model and provide an URL to LocalAI:
<details>
```
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
"url": "github:go-skynet/model-gallery/base.yaml",
"name": "model-name",
"files": [
{
"uri": "<URL>",
"sha256": "<SHA>",
"filename": "model"
}
]
}'
```
</details>
{{% /alert %}}
## Installing a model with a different name
To install a model with a different name, specify a `name` parameter in the request body.
```bash
LOCALAI=http://localhost:8080
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
"url": "<MODEL_CONFIG_FILE>",
"name": "<MODEL_NAME>"
}'
```
For example, to install a model as `gpt-3.5-turbo`:
```bash
LOCALAI=http://localhost:8080
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
"url": "github:go-skynet/model-gallery/gpt4all-j.yaml",
"name": "gpt-3.5-turbo"
}'
```
## Additional Files
<details>
To download additional files with the model, use the `files` parameter:
```bash
LOCALAI=http://localhost:8080
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
"url": "<MODEL_CONFIG_FILE>",
"name": "<MODEL_NAME>",
"files": [
{
"uri": "<additional_file_url>",
"sha256": "<additional_file_hash>",
"filename": "<additional_file_name>"
}
]
}'
```
</details>
## Overriding configuration files
<details>
To override portions of the configuration file, such as the backend or the model file, use the `overrides` parameter:
```bash
LOCALAI=http://localhost:8080
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
"url": "<MODEL_CONFIG_FILE>",
"name": "<MODEL_NAME>",
"overrides": {
"backend": "llama",
"f16": true,
...
}
}'
```
</details>
## Examples
### Embeddings: Bert
<details>
```bash
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
"url": "github:go-skynet/model-gallery/bert-embeddings.yaml",
"name": "text-embedding-ada-002"
}'
```
To test it:
```bash
LOCALAI=http://localhost:8080
curl $LOCALAI/v1/embeddings -H "Content-Type: application/json" -d '{
"input": "Test",
"model": "text-embedding-ada-002"
}'
```
</details>
### Image generation: Stable diffusion
URL: https://github.com/EdVince/Stable-Diffusion-NCNN
{{< tabs >}}
{{% tab name="Prepare the model in runtime" %}}
While the API is running, you can install the model by using the `/models/apply` endpoint and point it to the `stablediffusion` model in the [models-gallery](https://github.com/go-skynet/model-gallery#image-generation-stable-diffusion):
```bash
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
"url": "github:go-skynet/model-gallery/stablediffusion.yaml"
}'
```
{{% /tab %}}
{{% tab name="Automatically prepare the model before start" %}}
You can set the `PRELOAD_MODELS` environment variable:
```bash
PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]
```
or as arg:
```bash
local-ai --preload-models '[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]'
```
or in a YAML file:
```bash
local-ai --preload-models-config "/path/to/yaml"
```
YAML:
```yaml
- url: github:go-skynet/model-gallery/stablediffusion.yaml
```
{{% /tab %}}
{{< /tabs >}}
Test it:
```
curl $LOCALAI/v1/images/generations -H "Content-Type: application/json" -d '{
"prompt": "floating hair, portrait, ((loli)), ((one girl)), cute face, hidden hands, asymmetrical bangs, beautiful detailed eyes, eye shadow, hair ornament, ribbons, bowties, buttons, pleated skirt, (((masterpiece))), ((best quality)), colorful|((part of the head)), ((((mutated hands and fingers)))), deformed, blurry, bad anatomy, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, poorly drawn hands, missing limb, blurry, floating limbs, disconnected limbs, malformed hands, blur, out of focus, long neck, long body, Octane renderer, lowres, bad anatomy, bad hands, text",
"mode": 2, "seed":9000,
"size": "256x256", "n":2
}'
```
### Audio transcription: Whisper
URL: https://github.com/ggerganov/whisper.cpp
{{< tabs >}}
{{% tab name="Prepare the model in runtime" %}}
```bash
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
"url": "github:go-skynet/model-gallery/whisper-base.yaml",
"name": "whisper-1"
}'
```
{{% /tab %}}
{{% tab name="Automatically prepare the model before start" %}}
You can set the `PRELOAD_MODELS` environment variable:
```bash
PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/whisper-base.yaml", "name": "whisper-1"}]
```
or as arg:
```bash
local-ai --preload-models '[{"url": "github:go-skynet/model-gallery/whisper-base.yaml", "name": "whisper-1"}]'
```
or in a YAML file:
```bash
local-ai --preload-models-config "/path/to/yaml"
```
YAML:
```yaml
- url: github:go-skynet/model-gallery/whisper-base.yaml
name: whisper-1
```
{{% /tab %}}
{{< /tabs >}}
### GPTs
<details>
```bash
LOCALAI=http://localhost:8080
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
"url": "github:go-skynet/model-gallery/gpt4all-j.yaml",
"name": "gpt4all-j"
}'
```
To test it:
```
curl $LOCALAI/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "gpt4all-j",
"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.1
}'
```
</details>
### Note
LocalAI will create a batch process that downloads the required files from a model definition and automatically reload itself to include the new model.
Input: `url` or `id` (required), `name` (optional), `files` (optional)
```bash
curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
"url": "<MODEL_DEFINITION_URL>",
"id": "<GALLERY>@<MODEL_NAME>",
"name": "<INSTALLED_MODEL_NAME>",
"files": [
{
"uri": "<additional_file>",
"sha256": "<additional_file_hash>",
"filename": "<additional_file_name>"
},
"overrides": { "backend": "...", "f16": true }
]
}
```
An optional, list of additional files can be specified to be downloaded within `files`. The `name` allows to override the model name. Finally it is possible to override the model config file with `override`.
The `url` is a full URL, or a github url (`github:org/repo/file.yaml`), or a local file (`file:///path/to/file.yaml`).
The `id` is a string in the form `<GALLERY>@<MODEL_NAME>`, where `<GALLERY>` is the name of the gallery, and `<MODEL_NAME>` is the name of the model in the gallery. Galleries can be specified during startup with the `GALLERIES` environment variable.
Returns an `uuid` and an `url` to follow up the state of the process:
```json
{ "uuid":"251475c9-f666-11ed-95e0-9a8a4480ac58", "status":"http://localhost:8080/models/jobs/251475c9-f666-11ed-95e0-9a8a4480ac58"}
```
To see a collection example of curated models definition files, see the [model-gallery](https://github.com/go-skynet/model-gallery).
#### Get model job state `/models/jobs/<uid>`
This endpoint returns the state of the batch job associated to a model installation.
```bash
curl http://localhost:8080/models/jobs/<JOB_ID>
```
Returns a json containing the error, and if the job is being processed:
```json
{"error":null,"processed":true,"message":"completed"}
```

View file

@ -0,0 +1,126 @@
+++
disableToc = false
title = "🔥 OpenAI functions"
weight = 17
+++
LocalAI supports running OpenAI functions with `llama.cpp` compatible models.
![localai-functions-1](https://github.com/ggerganov/llama.cpp/assets/2420543/5bd15da2-78c1-4625-be90-1e938e6823f1)
To learn more about OpenAI functions, see the [OpenAI API blog post](https://openai.com/blog/function-calling-and-other-api-updates).
💡 Check out also [LocalAGI](https://github.com/mudler/LocalAGI) for an example on how to use LocalAI functions.
## Setup
OpenAI functions are available only with `ggml` or `gguf` models compatible with `llama.cpp`.
You don't need to do anything specific - just use `ggml` or `gguf` models.
## Usage example
You can configure a model manually with a YAML config file in the models directory, for example:
```yaml
name: gpt-3.5-turbo
parameters:
# Model file name
model: ggml-openllama.bin
top_p: 80
top_k: 0.9
temperature: 0.1
```
To use the functions with the OpenAI client in python:
```python
import openai
# ...
# Send the conversation and available functions to GPT
messages = [{"role": "user", "content": "What's the weather like in Boston?"}]
functions = [
{
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
"required": ["location"],
},
}
]
response = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=messages,
functions=functions,
function_call="auto",
)
# ...
```
{{% alert note %}}
When running the python script, be sure to:
- Set `OPENAI_API_KEY` environment variable to a random string (the OpenAI api key is NOT required!)
- Set `OPENAI_API_BASE` to point to your LocalAI service, for example `OPENAI_API_BASE=http://localhost:8080`
{{% /alert %}}
## Advanced
It is possible to also specify the full function signature (for debugging, or to use with other clients).
The chat endpoint accepts the `grammar_json_functions` additional parameter which takes a JSON schema object.
For example, with curl:
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.1,
"grammar_json_functions": {
"oneOf": [
{
"type": "object",
"properties": {
"function": {"const": "create_event"},
"arguments": {
"type": "object",
"properties": {
"title": {"type": "string"},
"date": {"type": "string"},
"time": {"type": "string"}
}
}
}
},
{
"type": "object",
"properties": {
"function": {"const": "search"},
"arguments": {
"type": "object",
"properties": {
"query": {"type": "string"}
}
}
}
}
]
}
}'
```
## 💡 Examples
A full e2e example with `docker-compose` is available [here](https://github.com/go-skynet/LocalAI/tree/master/examples/functions).

View file

@ -0,0 +1,263 @@
+++
disableToc = false
title = "📖 Text generation (GPT)"
weight = 10
+++
LocalAI supports generating text with GPT with `llama.cpp` and other backends (such as `rwkv.cpp` as ) see also the [Model compatibility]({{%relref "docs/reference/compatibility-table" %}}) for an up-to-date list of the supported model families.
Note:
- You can also specify the model name as part of the OpenAI token.
- If only one model is available, the API will use it for all the requests.
## API Reference
### Chat completions
https://platform.openai.com/docs/api-reference/chat
For example, to generate a chat completion, you can send a POST request to the `/v1/chat/completions` endpoint with the instruction as the request body:
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "ggml-koala-7b-model-q4_0-r2.bin",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}'
```
Available additional parameters: `top_p`, `top_k`, `max_tokens`
### Edit completions
https://platform.openai.com/docs/api-reference/edits
To generate an edit completion you can send a POST request to the `/v1/edits` endpoint with the instruction as the request body:
```bash
curl http://localhost:8080/v1/edits -H "Content-Type: application/json" -d '{
"model": "ggml-koala-7b-model-q4_0-r2.bin",
"instruction": "rephrase",
"input": "Black cat jumped out of the window",
"temperature": 0.7
}'
```
Available additional parameters: `top_p`, `top_k`, `max_tokens`.
### Completions
https://platform.openai.com/docs/api-reference/completions
To generate a completion, you can send a POST request to the `/v1/completions` endpoint with the instruction as per the request body:
```bash
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
"model": "ggml-koala-7b-model-q4_0-r2.bin",
"prompt": "A long time ago in a galaxy far, far away",
"temperature": 0.7
}'
```
Available additional parameters: `top_p`, `top_k`, `max_tokens`
### List models
You can list all the models available with:
```bash
curl http://localhost:8080/v1/models
```
## Backends
### AutoGPTQ
[AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) is an easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
#### Prerequisites
This is an extra backend - in the container images is already available and there is nothing to do for the setup.
If you are building LocalAI locally, you need to install [AutoGPTQ manually](https://github.com/PanQiWei/AutoGPTQ#quick-installation).
#### Model setup
The models are automatically downloaded from `huggingface` if not present the first time. It is possible to define models via `YAML` config file, or just by querying the endpoint with the `huggingface` repository model name. For example, create a `YAML` config file in `models/`:
```
name: orca
backend: autogptq
model_base_name: "orca_mini_v2_13b-GPTQ-4bit-128g.no-act.order"
parameters:
model: "TheBloke/orca_mini_v2_13b-GPTQ"
# ...
```
Test with:
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "orca",
"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.1
}'
```
### RWKV
A full example on how to run a rwkv model is in the [examples](https://github.com/go-skynet/LocalAI/tree/master/examples/rwkv).
Note: rwkv models needs to specify the backend `rwkv` in the YAML config files and have an associated tokenizer along that needs to be provided with it:
```
36464540 -rw-r--r-- 1 mudler mudler 1.2G May 3 10:51 rwkv_small
36464543 -rw-r--r-- 1 mudler mudler 2.4M May 3 10:51 rwkv_small.tokenizer.json
```
### llama.cpp
[llama.cpp](https://github.com/ggerganov/llama.cpp) is a popular port of Facebook's LLaMA model in C/C++.
{{% alert note %}}
The `ggml` file format has been deprecated. If you are using `ggml` models and you are configuring your model with a YAML file, specify, use the `llama-ggml` backend instead. If you are relying in automatic detection of the model, you should be fine. For `gguf` models, use the `llama` backend. The go backend is deprecated as well but still available as `go-llama`. The go backend supports still features not available in the mainline: speculative sampling and embeddings.
{{% /alert %}}
#### Features
The `llama.cpp` model supports the following features:
- [📖 Text generation (GPT)]({{%relref "docs/features/text-generation" %}})
- [🧠 Embeddings]({{%relref "docs/features/embeddings" %}})
- [🔥 OpenAI functions]({{%relref "docs/features/openai-functions" %}})
- [✍️ Constrained grammars]({{%relref "docs/features/constrained_grammars" %}})
#### Setup
LocalAI supports `llama.cpp` models out of the box. You can use the `llama.cpp` model in the same way as any other model.
##### Manual setup
It is sufficient to copy the `ggml` or `gguf` model files in the `models` folder. You can refer to the model in the `model` parameter in the API calls.
[You can optionally create an associated YAML]({{%relref "docs/advanced" %}}) model config file to tune the model's parameters or apply a template to the prompt.
Prompt templates are useful for models that are fine-tuned towards a specific prompt.
##### Automatic setup
LocalAI supports model galleries which are indexes of models. For instance, the huggingface gallery contains a large curated index of models from the huggingface model hub for `ggml` or `gguf` models.
For instance, if you have the galleries enabled and LocalAI already running, you can just start chatting with models in huggingface by running:
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "TheBloke/WizardLM-13B-V1.2-GGML/wizardlm-13b-v1.2.ggmlv3.q2_K.bin",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.1
}'
```
LocalAI will automatically download and configure the model in the `model` directory.
Models can be also preloaded or downloaded on demand. To learn about model galleries, check out the [model gallery documentation]({{%relref "docs/features/model-gallery" %}}).
#### YAML configuration
To use the `llama.cpp` backend, specify `llama` as the backend in the YAML file:
```yaml
name: llama
backend: llama
parameters:
# Relative to the models path
model: file.gguf.bin
```
In the example above we specify `llama` as the backend to restrict loading `gguf` models only.
For instance, to use the `llama-ggml` backend for `ggml` models:
```yaml
name: llama
backend: llama-ggml
parameters:
# Relative to the models path
model: file.ggml.bin
```
#### Reference
- [llama](https://github.com/ggerganov/llama.cpp)
- [binding](https://github.com/go-skynet/go-llama.cpp)
### exllama/2
[Exllama](https://github.com/turboderp/exllama) is a "A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights". Both `exllama` and `exllama2` are supported.
#### Model setup
Download the model as a folder inside the `model ` directory and create a YAML file specifying the `exllama` backend. For instance with the `TheBloke/WizardLM-7B-uncensored-GPTQ` model:
```
$ git lfs install
$ cd models && git clone https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GPTQ
$ ls models/
.keep WizardLM-7B-uncensored-GPTQ/ exllama.yaml
$ cat models/exllama.yaml
name: exllama
parameters:
model: WizardLM-7B-uncensored-GPTQ
backend: exllama
# Note: you can also specify "exllama2" if it's an exllama2 model here
# ...
```
Test with:
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "exllama",
"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.1
}'
```
### vLLM
[vLLM](https://github.com/vllm-project/vllm) is a fast and easy-to-use library for LLM inference.
LocalAI has a built-in integration with vLLM, and it can be used to run models. You can check out `vllm` performance [here](https://github.com/vllm-project/vllm#performance).
#### Setup
Create a YAML file for the model you want to use with `vllm`.
To setup a model, you need to just specify the model name in the YAML config file:
```yaml
name: vllm
backend: vllm
parameters:
model: "facebook/opt-125m"
# Decomment to specify a quantization method (optional)
# quantization: "awq"
```
The backend will automatically download the required files in order to run the model.
#### Usage
Use the `completions` endpoint by specifying the `vllm` backend:
```
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
"model": "vllm",
"prompt": "Hello, my name is",
"temperature": 0.1, "top_p": 0.1
}'
```

View file

@ -0,0 +1,158 @@
+++
disableToc = false
title = "🗣 Text to audio (TTS)"
weight = 11
+++
The `/tts` endpoint can be used to generate speech from text.
## Usage
Input: `input`, `model`
For example, to generate an audio file, you can send a POST request to the `/tts` endpoint with the instruction as the request body:
```bash
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
"input": "Hello world",
"model": "tts"
}'
```
Returns an `audio/wav` file.
## Backends
### 🐸 Coqui
Required: Don't use `LocalAI` images ending with the `-core` tag,. Python dependencies are required in order to use this backend.
Coqui works without any configuration, to test it, you can run the following curl command:
```
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
"backend": "coqui",
"model": "tts_models/en/ljspeech/glow-tts",
"input":"Hello, this is a test!"
}'
```
### Bark
[Bark](https://github.com/suno-ai/bark) allows to generate audio from text prompts.
This is an extra backend - in the container is already available and there is nothing to do for the setup.
#### Model setup
There is nothing to be done for the model setup. You can already start to use bark. The models will be downloaded the first time you use the backend.
#### Usage
Use the `tts` endpoint by specifying the `bark` backend:
```
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
"backend": "bark",
"input":"Hello!"
}' | aplay
```
To specify a voice from https://github.com/suno-ai/bark#-voice-presets ( https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c ), use the `model` parameter:
```
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
"backend": "bark",
"input":"Hello!",
"model": "v2/en_speaker_4"
}' | aplay
```
### Piper
To install the `piper` audio models manually:
- Download Voices from https://github.com/rhasspy/piper/releases/tag/v0.0.2
- Extract the `.tar.tgz` files (.onnx,.json) inside `models`
- Run the following command to test the model is working
To use the tts endpoint, run the following command. You can specify a backend with the `backend` parameter. For example, to use the `piper` backend:
```bash
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
"model":"it-riccardo_fasol-x-low.onnx",
"backend": "piper",
"input": "Ciao, sono Ettore"
}' | aplay
```
Note:
- `aplay` is a Linux command. You can use other tools to play the audio file.
- The model name is the filename with the extension.
- The model name is case sensitive.
- LocalAI must be compiled with the `GO_TAGS=tts` flag.
### Transformers-musicgen
LocalAI also has experimental support for `transformers-musicgen` for the generation of short musical compositions. Currently, this is implemented via the same requests used for text to speech:
```
curl --request POST \
--url http://localhost:8080/tts \
--header 'Content-Type: application/json' \
--data '{
"backend": "transformers-musicgen",
"model": "facebook/musicgen-medium",
"input": "Cello Rave"
}' | aplay
```
Future versions of LocalAI will expose additional control over audio generation beyond the text prompt.
### Vall-E-X
[VALL-E-X](https://github.com/Plachtaa/VALL-E-X) is an open source implementation of Microsoft's VALL-E X zero-shot TTS model.
#### Setup
The backend will automatically download the required files in order to run the model.
This is an extra backend - in the container is already available and there is nothing to do for the setup. If you are building manually, you need to install Vall-E-X manually first.
#### Usage
Use the tts endpoint by specifying the vall-e-x backend:
```
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
"backend": "vall-e-x",
"input":"Hello!"
}' | aplay
```
#### Voice cloning
In order to use voice cloning capabilities you must create a `YAML` configuration file to setup a model:
```yaml
name: cloned-voice
backend: vall-e-x
parameters:
model: "cloned-voice"
vall-e:
# The path to the audio file to be cloned
# relative to the models directory
audio_path: "path-to-wav-source.wav"
```
Then you can specify the model name in the requests:
```
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
"backend": "vall-e-x",
"model": "cloned-voice",
"input":"Hello!"
}' | aplay
```