docs/examples: enhancements (#1572)

* docs: re-order sections
* fix references
* Add mixtral-instruct, tinyllama-chat, dolphin-2.5-mixtral-8x7b
* Fix link
* Minor corrections
* fix: models is a StringSlice, not a String

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP: switch docs theme
* content
* Fix GH link
* enhancements
* enhancements
* Fixed how to link

Signed-off-by: lunamidori5 <118759930+lunamidori5@users.noreply.github.com>

* fixups
* logo fix
* more fixups
* final touches

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: lunamidori5 <118759930+lunamidori5@users.noreply.github.com>
Co-authored-by: lunamidori5 <118759930+lunamidori5@users.noreply.github.com>
2025-05-21 19:15:00 +00:00 · 2024-01-18 19:41:08 +01:00 · 2024-01-18 19:41:08 +01:00 · 6ca4d38a01
commit 6ca4d38a01
parent b5c93f176a
79 changed files with 1826 additions and 3546 deletions
--- a/docs/content/docs/features/GPU-acceleration.md
+++ b/docs/content/docs/features/GPU-acceleration.md
@ -0,0 +1,109 @@
+++
+disableToc = false
+title = "⚡ GPU acceleration"
+weight = 9
+++
+
+{{% alert context="warning" %}}
+Section under construction
+{{% /alert %}}
+
+This section contains instructions on how to use LocalAI with GPU acceleration.
+
+{{% alert icon="⚡" context="warning" %}}
+For acceleration on AMD or Metal hardware there are no specific container images; see the [build]({{%relref "docs/getting-started/build#Acceleration" %}}) instructions.
+{{% /alert %}}
+
+### CUDA (NVIDIA) acceleration
+
+#### Requirements
+
+You need the NVIDIA Container Toolkit, `nvidia-container-toolkit` (installation instructions: [1](https://www.server-world.info/en/note?os=Ubuntu_22.04&p=nvidia&f=2) [2](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)).
+
+To check which CUDA version you need, run either `nvidia-smi` or `nvcc --version`.
+
+Alternatively, you can run `nvidia-smi` inside a container with Docker:
+
+```
+docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
+```
+
+To use CUDA, use the images with `cublas` in their tag.
+
+The image list is on [quay](https://quay.io/repository/go-skynet/local-ai?tab=tags):
+
+- CUDA `11` tags: `master-cublas-cuda11`, `v1.40.0-cublas-cuda11`, ...
+- CUDA `12` tags: `master-cublas-cuda12`, `v1.40.0-cublas-cuda12`, ...
+- CUDA `11` + FFmpeg tags: `master-cublas-cuda11-ffmpeg`, `v1.40.0-cublas-cuda11-ffmpeg`, ...
+- CUDA `12` + FFmpeg tags: `master-cublas-cuda12-ffmpeg`, `v1.40.0-cublas-cuda12-ffmpeg`, ...
+
+In addition to the usual command for running LocalAI, you need to pass `--gpus all` to Docker, for example:
+
+```bash
+docker run --rm -ti --gpus all -p 8080:8080 -e DEBUG=true -e MODELS_PATH=/models -e THREADS=1 -v $PWD/models:/models quay.io/go-skynet/local-ai:v1.40.0-cublas-cuda12
+```
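If you want to derive the image reference from the CUDA version you found with `nvidia-smi` or `nvcc --version`, a small shell helper can assemble it. This is a hypothetical convenience function, not part of LocalAI; it assumes the `<version>-cublas-cuda<major>[-ffmpeg]` tag pattern from the list above and only knows about CUDA 11 and 12.

```shell
# Hypothetical helper (not part of LocalAI): print the quay.io image reference
# for a LocalAI release and CUDA version, following the tag pattern
# <version>-cublas-cuda<major>[-ffmpeg] listed above.
# Usage: localai_cuda_image <version> <cuda_version> [ffmpeg]
localai_cuda_image() {
  version=$1            # e.g. v1.40.0 or master
  major=${2%%.*}        # "12.2" -> "12"
  case "$major" in
    11|12) ;;
    *) echo "unsupported CUDA major version: $major" >&2; return 1 ;;
  esac
  tag="$version-cublas-cuda$major"
  if [ "${3:-}" = "ffmpeg" ]; then
    tag="$tag-ffmpeg"   # variant that also bundles FFmpeg
  fi
  echo "quay.io/go-skynet/local-ai:$tag"
}
```

For example, `docker run --rm -ti --gpus all -p 8080:8080 -v $PWD/models:/models "$(localai_cuda_image v1.40.0 12.2)"` starts the same CUDA 12 image as the command above, without the extra environment variables.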
+
+If GPU inference is working, you should see output similar to:
+
+```
+5:22PM DBG Loading model in memory from file: /models/open-llama-7b-q4_0.bin
+ggml_init_cublas: found 1 CUDA devices:
+  Device 0: Tesla T4
+llama.cpp: loading model from /models/open-llama-7b-q4_0.bin
+llama_model_load_internal: format     = ggjt v3 (latest)
+llama_model_load_internal: n_vocab    = 32000
+llama_model_load_internal: n_ctx      = 1024
+llama_model_load_internal: n_embd     = 4096
+llama_model_load_internal: n_mult     = 256
+llama_model_load_internal: n_head     = 32
+llama_model_load_internal: n_layer    = 32
+llama_model_load_internal: n_rot      = 128
+llama_model_load_internal: ftype      = 2 (mostly Q4_0)
+llama_model_load_internal: n_ff       = 11008
+llama_model_load_internal: n_parts    = 1
+llama_model_load_internal: model size = 7B
+llama_model_load_internal: ggml ctx size =    0.07 MB
+llama_model_load_internal: using CUDA for GPU acceleration
+llama_model_load_internal: mem required  = 4321.77 MB (+ 1026.00 MB per state)
+llama_model_load_internal: allocating batch_size x 1 MB = 512 MB VRAM for the scratch buffer
+llama_model_load_internal: offloading 10 repeating layers to GPU
+llama_model_load_internal: offloaded 10/35 layers to GPU
+llama_model_load_internal: total VRAM used: 1598 MB
+...................................................................................................
+llama_init_from_file: kv self size  =  512.00 MB
+```
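To check this from a script, you can search the log for the `offloaded N/M layers to GPU` line shown in the sample above. The helper below is a hypothetical sketch; it assumes the log format shown here, which may differ between `llama.cpp` versions.

```shell
# Hypothetical helper: print the last "offloaded N/M layers to GPU" line from a
# LocalAI debug log (file argument, or stdin when no argument is given).
# Returns non-zero when no such line is found.
gpu_offload_summary() {
  line=$(
    if [ $# -gt 0 ]; then cat "$1"; else cat; fi |
      grep -Eo 'offloaded [0-9]+/[0-9]+ layers to GPU' |
      tail -n 1
  )
  if [ -z "$line" ]; then
    echo "no GPU offloading found in the log" >&2
    return 1
  fi
  echo "$line"
}
```

For example, `docker logs <container> 2>&1 | gpu_offload_summary`, where `<container>` is the name or ID of your LocalAI container started with `DEBUG=true`.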
+
+#### Model configuration
+
+Depending on the model architecture and backend used, GPU acceleration may be enabled in different ways. You need to configure the model you intend to use with a YAML config file. For example, for `llama.cpp` workloads a configuration file might look like this (where `gpu_layers` is the number of layers to offload to the GPU):
+
+```yaml
+name: my-model-name
+# Default model parameters
+parameters:
+  # Relative to the models path
+  model: llama.cpp-model.ggmlv3.q5_K_M.bin
+
+context_size: 1024
+threads: 1
+
+f16: true # enable with GPU acceleration
+gpu_layers: 22 # GPU Layers (only used when built with cublas)
+
+```
+
+For diffusers, it might look like this instead:
+
+```yaml
+name: stablediffusion
+parameters:
+  model: toonyou_beta6.safetensors
+backend: diffusers
+step: 30
+f16: true
+diffusers:
+  pipeline_type: StableDiffusionPipeline
+  cuda: true
+  enable_parameters: "negative_prompt,num_inference_steps,clip_skip"
+  scheduler_type: "k_dpmpp_sde"
+```
--- a/docs/content/docs/features/_index.en.md
+++ b/docs/content/docs/features/_index.en.md
@ -0,0 +1,7 @@
+
+++
+disableToc = false
+title = "Features"
+weight = 8
+icon = "feature_search"
+++
--- a/docs/content/docs/features/audio-to-text.md
+++ b/docs/content/docs/features/audio-to-text.md
@ -0,0 +1,43 @@
+++
+disableToc = false
+title = "🔈 Audio to text"
+weight = 16
+++
+
+Audio-to-text models generate text from an audio file.
+
+The transcription endpoint converts audio files to text. It is based on [whisper.cpp](https://github.com/ggerganov/whisper.cpp), a C++ library for audio transcription, and accepts any audio format supported by `ffmpeg`.
+
+## Usage
+
+Once LocalAI is started and whisper models are installed, you can use the `/v1/audio/transcriptions` API endpoint.
+
+For instance, with cURL:
+
+```bash
+curl http://localhost:8080/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F file="@<FILE_PATH>" -F model="<MODEL_NAME>"
+```
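When scripting transcriptions, you may want to wrap this call in a shell function. The sketch below is hypothetical: the `LOCALAI` variable (defaulting to `http://localhost:8080`) and the default model name `whisper-1` (the name used in the example below) are assumptions, not LocalAI settings.

```shell
# Hypothetical wrapper around the cURL call above.
# Usage: transcribe <audio_file> [model_name]
# LOCALAI is the server base URL (assumed default: http://localhost:8080).
transcribe() {
  file=$1
  model=${2:-whisper-1}
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  # -F sends multipart/form-data; the "@" prefix makes curl upload the file.
  curl --fail -sS "${LOCALAI:-http://localhost:8080}/v1/audio/transcriptions" \
    -H "Content-Type: multipart/form-data" \
    -F file="@$file" \
    -F model="$model"
}
```

For example, `transcribe "$PWD/gb1.ogg"` prints the JSON response returned by the endpoint.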
+
+## Example
+
+Download one of the models from [here](https://huggingface.co/ggerganov/whisper.cpp/tree/main) into the `models` folder, and create a YAML file for your model. The `model` field must match the name of the downloaded file (`whisper-en` in this example):
+
+```yaml
+name: whisper-1
+backend: whisper
+parameters:
+  model: whisper-en
+```
+
+You can then test the transcription endpoint like this:
+
+```bash
+## Get an example audio file
+wget --quiet --show-progress -O gb1.ogg https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg
+
+## Send the example audio file to the transcriptions endpoint
+curl http://localhost:8080/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F file="@$PWD/gb1.ogg" -F model="whisper-1"
+
+## Result
+{"text":"My fellow Americans, this day has brought terrible news and great sadness to our country.At nine o'clock this morning, Mission Control in Houston lost contact with our Space ShuttleColumbia.A short time later, debris was seen falling from the skies above Texas.The Columbia's lost.There are no survivors.One board was a crew of seven.Colonel Rick Husband, Lieutenant Colonel Michael Anderson, Commander Laurel Clark, Captain DavidBrown, Commander William McCool, Dr. Kultna Shavla, and Elon Ramon, a colonel in the IsraeliAir Force.These men and women assumed great risk in the service to all humanity.In an age when spaceflight has come to seem almost routine, it is easy to overlook thedangers of travel by rocket and the difficulties of navigating the fierce outer atmosphere ofthe Earth.These astronauts knew the dangers, and they faced them willingly, knowing they had a highand noble purpose in life.Because of their courage and daring and idealism, we will miss them all the more.All Americans today are thinking as well of the families of these men and women who havebeen given this sudden shock and grief.You're not alone.Our entire nation agrees with you, and those you loved will always have the respect andgratitude of this country.The cause in which they died will continue.Mankind has led into the darkness beyond our world by the inspiration of discovery andthe longing to understand.Our journey into space will go on.In the skies today, we saw destruction and tragedy.As farther than we can see, there is comfort and hope.In the words of the prophet Isaiah, \"Lift your eyes and look to the heavens who createdall these, he who brings out the starry hosts one by one and calls them each by name.\"Because of his great power and mighty strength, not one of them is missing.The same creator who names the stars also knows the names of the seven souls we mourntoday.The crew of the shuttle Columbia did not return safely to Earth yet we can pray that all aresafely home.May God bless the grieving families and may God continue to bless America.[BLANK_AUDIO]"}
+```
--- a/docs/content/docs/features/constrained_grammars.md
+++ b/docs/content/docs/features/constrained_grammars.md
@ -0,0 +1,30 @@
+
+++
+disableToc = false
+title = "✍️ Constrained grammars"
+weight = 15
+++
+
+The chat endpoint accepts an additional `grammar` parameter, which takes a grammar defined in [BNF](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form).
+
+This lets you constrain the LLM output to a user-defined schema, so you can generate `JSON`, `YAML`, or anything else that can be described with a BNF grammar.
+
+{{% alert note %}}
+This feature works only with models compatible with the [llama.cpp](https://github.com/ggerganov/llama.cpp) backend (see also [Model compatibility]({{%relref "docs/reference/compatibility-table" %}})). For details on how it works, see the upstream PRs: https://github.com/ggerganov/llama.cpp/pull/1773, https://github.com/ggerganov/llama.cpp/pull/1887
+{{% /alert %}}
+
+## Setup
+
+Follow the setup instructions from the [LocalAI functions]({{%relref "docs/features/openai-functions" %}}) page.
+
+## 💡 Usage example
+
+For example, to constrain the output to either `yes` or `no`:
+
+```bash
+curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
+  "model": "gpt-4",
+  "messages": [{"role": "user", "content": "Do you like apples?"}],
+  "grammar": "root ::= (\"yes\" | \"no\")"
+}'
+```
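Escaping the quotes inside the grammar by hand is error-prone. As a sketch, the hypothetical helpers below escape backslashes and double quotes with `sed` and build the request body, which can then be piped to `curl -d @-`. They do not handle newlines or other control characters, so keep the grammar on a single line (or use a JSON tool such as `jq`).

```shell
# Hypothetical helper: escape backslashes and double quotes so a value can be
# embedded in a JSON string (newlines/control characters are NOT handled).
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: build a chat request body with a "grammar" field.
# Usage: grammar_request <model> <question> <grammar>
grammar_request() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "grammar": "%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}
```

For example, `grammar_request gpt-4 "Do you like apples?" 'root ::= ("yes" | "no")' | curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d @-` sends the same request as the command above.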
--- a/docs/content/docs/features/embeddings.md
+++ b/docs/content/docs/features/embeddings.md
@ -0,0 +1,102 @@
+
+++
+disableToc = false
+title = "🧠 Embeddings"
+weight = 13
+++
+
+LocalAI supports generating embeddings for text or lists of tokens.
+
+For the API documentation you can refer to the OpenAI docs: https://platform.openai.com/docs/api-reference/embeddings
+
+## Model compatibility
+
+The embedding endpoint is compatible with `llama.cpp` models, `bert.cpp` models, and sentence-transformers models available on Hugging Face.
+
+## Manual Setup
+
+Create a `YAML` config file in the `models` directory. Specify the `backend` and the model file.
+
+```yaml
+name: text-embedding-ada-002 # The model name used in the API
+parameters:
+  model: <model_file>
+backend: "<backend>"
+embeddings: true
+# .. other parameters
+```
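If you prefer to create the config file from the shell, a heredoc does the job. The helper below is a hypothetical sketch: its name and argument order are made up for illustration, and it names the file `<name>.yaml` inside the `models` directory.

```shell
# Hypothetical helper: write an embeddings model config into a models directory.
# Usage: write_embeddings_config <models_dir> <name> <backend> <model_file>
write_embeddings_config() {
  dir=$1 name=$2 backend=$3 model=$4
  mkdir -p "$dir"
  # Same keys as the template above: name, parameters.model, backend, embeddings.
  cat > "$dir/$name.yaml" <<EOF
name: $name
parameters:
  model: $model
backend: "$backend"
embeddings: true
EOF
}
```

For example, `write_embeddings_config models text-embedding-ada-002 bert-embeddings bert` creates `models/text-embedding-ada-002.yaml` equivalent to the Bert example below.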
+
+## Bert embeddings
+
+To use `bert.cpp` models, use the `bert-embeddings` backend.
+
+An example model config file:
+
+```yaml
+name: text-embedding-ada-002
+parameters:
+  model: bert
+backend: bert-embeddings
+embeddings: true
+# .. other parameters
+```
+
+The `bert-embeddings` backend uses [bert.cpp](https://github.com/skeskinen/bert.cpp) and `ggml` models.
+
+For instance you can download the `ggml` quantized version of `all-MiniLM-L6-v2` from https://huggingface.co/skeskinen/ggml:
+
+```bash
+wget https://huggingface.co/skeskinen/ggml/resolve/main/all-MiniLM-L6-v2/ggml-model-q4_0.bin -O models/bert
+```
+
+To test locally (LocalAI server running on `localhost`),
+you can use `curl` (and `jq` at the end to prettify):
+
+```bash
+curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
+  "input": "Your text string goes here",
+  "model": "text-embedding-ada-002"
+}' | jq "."
+```
+
+## Huggingface embeddings
+
+To use `sentence-transformers` models from Hugging Face, use the `sentencetransformers` embedding backend.
+
+```yaml
+name: text-embedding-ada-002
+backend: sentencetransformers
+embeddings: true
+parameters:
+  model: all-MiniLM-L6-v2
+```
+
+The `sentencetransformers` backend uses Python [sentence-transformers](https://github.com/UKPLab/sentence-transformers). For a list of all pre-trained models available see here: https://github.com/UKPLab/sentence-transformers#pre-trained-models
+
+{{% alert note %}}
+
+- The `sentencetransformers` backend is an optional backend of LocalAI and uses Python. If you are running `LocalAI` from the container images, it is already configured and ready to use.
+- If you are running `LocalAI` manually, you must install the Python dependencies (`make prepare-extra-conda-environments`). This requires `conda` to be installed.
+- For local execution, you also have to specify the extra backend in the `EXTERNAL_GRPC_BACKENDS` environment variable.
+    - Example: `EXTERNAL_GRPC_BACKENDS="sentencetransformers:/path/to/LocalAI/backend/python/sentencetransformers/sentencetransformers.py"`
+- The `sentencetransformers` backend supports only embeddings of text, not of tokens. If you need to embed tokens, use the `bert-embeddings` backend or `llama.cpp`.
+- No models are required to be downloaded before using the `sentencetransformers` backend. The models will be downloaded automatically the first time the API is used.
+
+{{% /alert %}}
+
+## Llama.cpp embeddings
+
+Embeddings with `llama.cpp` are supported with the `llama` backend.
+
+```yaml
+name: my-awesome-model
+backend: llama
+embeddings: true
+parameters:
+  model: ggml-file.bin
+# ...
+```
+
+## 💡 Examples
+
+- Example that uses LlamaIndex and LocalAI for embeddings: [here](https://github.com/go-skynet/LocalAI/tree/master/examples/query_data/).
--- a/docs/content/docs/features/gpt-vision.md
+++ b/docs/content/docs/features/gpt-vision.md
@ -0,0 +1,30 @@
+
+++
+disableToc = false
+title = "🆕 GPT Vision"
+weight = 14
+++
+
+{{% alert note %}}
+Available only on `master` builds
+{{% /alert %}}
+
+LocalAI supports understanding images by using [LLaVA](https://llava.hliu.cc/), and implements the [GPT Vision API](https://platform.openai.com/docs/guides/vision) from OpenAI.
+
+![llava](https://github.com/mudler/LocalAI/assets/2420543/cb0a0897-3b58-4350-af66-e6f4387b58d3)
+
+## Usage
+
+OpenAI docs: https://platform.openai.com/docs/guides/vision
+
+To let LocalAI understand an image and reply with what it sees, use the `/v1/chat/completions` endpoint, for example with curl:
+
+```bash
+curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
+     "model": "llava",
+     "messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}]}], "temperature": 0.9}'
+```
+
+### Setup
+
+To set up the LLaVA models, follow the full example in the [configuration examples](https://github.com/mudler/LocalAI/blob/master/examples/configurations/README.md#llava).
--- a/docs/content/docs/features/image-generation.md
+++ b/docs/content/docs/features/image-generation.md
@ -0,0 +1,352 @@
+
+++
+disableToc = false
+title = "🎨 Image generation"
+weight = 12
+++
+
+![anime_girl](https://github.com/go-skynet/LocalAI/assets/2420543/8aaca62a-e864-4011-98ae-dcc708103928)
+(Generated with [AnimagineXL](https://huggingface.co/Linaqruf/animagine-xl))
+
+LocalAI supports generating images with Stable Diffusion, using either a C++ implementation that runs on CPU or a Python implementation based on `diffusers`.
+
+## Usage
+
+OpenAI docs: https://platform.openai.com/docs/api-reference/images/create
+
+To generate an image, send a POST request to the `/v1/images/generations` endpoint with the prompt in the JSON request body:
+
+```bash
+# 512x512 is supported too
+curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
+  "prompt": "A cute baby sea otter",
+  "size": "256x256"
+}'
+```
+
+Available additional parameters: `mode`, `step`.
+
+Note: To set a negative prompt, you can split the prompt with `|`, for instance: `a cute baby sea otter|malformed`.
+
+```bash
+curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
+  "prompt": "floating hair, portrait, ((one girl)), cute face, hidden hands, asymmetrical bangs, beautiful detailed eyes, eye shadow, hair ornament, ribbons, bowties, buttons, pleated skirt, (((masterpiece))), ((best quality)), colorful|((part of the head)), ((((mutated hands and fingers)))), deformed, blurry, bad anatomy, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, poorly drawn hands, missing limb, blurry, floating limbs, disconnected limbs, malformed hands, blur, out of focus, long neck, long body, Octane renderer, lowres, bad anatomy, bad hands, text",
+  "size": "256x256"
+}'
+```
+
+## Backends
+
+### stablediffusion-cpp
+
+| mode=0                                                                                                                | mode=1 (winograd/sgemm)                                                                                                                |
+|------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
+| ![test](https://github.com/go-skynet/LocalAI/assets/2420543/7145bdee-4134-45bb-84d4-f11cb08a5638)                      | ![b643343452981](https://github.com/go-skynet/LocalAI/assets/2420543/abf14de1-4f50-4715-aaa4-411d703a942a)          |
+| ![b6441997879](https://github.com/go-skynet/LocalAI/assets/2420543/d50af51c-51b7-4f39-b6c2-bf04c403894c)              | ![winograd2](https://github.com/go-skynet/LocalAI/assets/2420543/1935a69a-ecce-4afc-a099-1ac28cb649b3)                |
+| ![winograd](https://github.com/go-skynet/LocalAI/assets/2420543/1979a8c4-a70d-4602-95ed-642f382f6c6a)                | ![winograd3](https://github.com/go-skynet/LocalAI/assets/2420543/e6d184d4-5002-408f-b564-163986e1bdfb)                |
+
+Note: the `stablediffusion` C++ backend supports images up to 512x512. You can upscale the result with other tools, for instance: https://github.com/upscayl/upscayl.
+
+#### Setup
+
+Note: In order to use the `images/generation` endpoint with the `stablediffusion` C++ backend, you need to build LocalAI with `GO_TAGS=stablediffusion`. If you are using the container images, it is already enabled.
+
+{{< tabs >}}
+{{% tab name="Prepare the model in runtime" %}}
+
+While the API is running, you can install the model by using the `/models/apply` endpoint and point it to the `stablediffusion` model in the [models-gallery](https://github.com/go-skynet/model-gallery#image-generation-stable-diffusion):
+
+```bash
+curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
+  "url": "github:go-skynet/model-gallery/stablediffusion.yaml"
+}'
+```
+
+{{% /tab %}}
+{{% tab name="Automatically prepare the model before start" %}}
+
+You can set the `PRELOAD_MODELS` environment variable:
+
+```bash
+PRELOAD_MODELS='[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]'
+```
+
+or pass it as an argument:
+
+```bash
+local-ai --preload-models '[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]'
+```
+
+or in a YAML file:
+
+```bash
+local-ai --preload-models-config "/path/to/yaml"
+```
+
+YAML:
+
+```yaml
+- url: github:go-skynet/model-gallery/stablediffusion.yaml
+```
+
+{{% /tab %}}
+{{% tab name="Install manually" %}}
+
+1. Create a model file `stablediffusion.yaml` in the models folder:
+
+```yaml
+name: stablediffusion
+backend: stablediffusion
+parameters:
+  model: stablediffusion_assets
+```
+
+2. Create a `stablediffusion_assets` directory inside your `models` directory
+3. Download the ncnn assets from https://github.com/EdVince/Stable-Diffusion-NCNN#out-of-box and place them in `stablediffusion_assets`.
+
+The models directory should look like the following:
+
+```bash
+models
+├── stablediffusion_assets
+│   ├── AutoencoderKL-256-256-fp16-opt.param
+│   ├── AutoencoderKL-512-512-fp16-opt.param
+│   ├── AutoencoderKL-base-fp16.param
+│   ├── AutoencoderKL-encoder-512-512-fp16.bin
+│   ├── AutoencoderKL-fp16.bin
+│   ├── FrozenCLIPEmbedder-fp16.bin
+│   ├── FrozenCLIPEmbedder-fp16.param
+│   ├── log_sigmas.bin
+│   ├── tmp-AutoencoderKL-encoder-256-256-fp16.param
+│   ├── UNetModel-256-256-MHA-fp16-opt.param
+│   ├── UNetModel-512-512-MHA-fp16-opt.param
+│   ├── UNetModel-base-MHA-fp16.param
+│   ├── UNetModel-MHA-fp16.bin
+│   └── vocab.txt
+└── stablediffusion.yaml
+```
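Before starting LocalAI, you can check that the directory matches this layout. The function below is a hypothetical sketch that looks for the files in the listing above; the exact set of files may change with the upstream asset release.

```shell
# Hypothetical sanity check: report any file from the listing above that is
# missing under <models_dir>. Returns non-zero if something is missing.
# Usage: check_sd_assets <models_dir>
check_sd_assets() {
  assets="$1/stablediffusion_assets"
  missing=0
  for f in \
    AutoencoderKL-256-256-fp16-opt.param \
    AutoencoderKL-512-512-fp16-opt.param \
    AutoencoderKL-base-fp16.param \
    AutoencoderKL-encoder-512-512-fp16.bin \
    AutoencoderKL-fp16.bin \
    FrozenCLIPEmbedder-fp16.bin \
    FrozenCLIPEmbedder-fp16.param \
    log_sigmas.bin \
    tmp-AutoencoderKL-encoder-256-256-fp16.param \
    UNetModel-256-256-MHA-fp16-opt.param \
    UNetModel-512-512-MHA-fp16-opt.param \
    UNetModel-base-MHA-fp16.param \
    UNetModel-MHA-fp16.bin \
    vocab.txt
  do
    if [ ! -f "$assets/$f" ]; then
      echo "missing: $assets/$f" >&2
      missing=1
    fi
  done
  if [ ! -f "$1/stablediffusion.yaml" ]; then
    echo "missing: $1/stablediffusion.yaml" >&2
    missing=1
  fi
  return "$missing"
}
```

For example, `check_sd_assets models && echo "stablediffusion assets look complete"`.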
+
+{{% /tab %}}
+
+{{< /tabs >}}
+
+### Diffusers
+
+[Diffusers](https://huggingface.co/docs/diffusers/index) is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. LocalAI has a diffusers backend which allows image generation using the `diffusers` library.
+
+#### Model setup
+
+The models are downloaded automatically from Hugging Face the first time you use the backend.
+
+Create a model configuration file in the `models` directory, for instance to use `Linaqruf/animagine-xl` with CPU:
+
+```yaml
+name: animagine-xl
+parameters:
+  model: Linaqruf/animagine-xl
+backend: diffusers
+
+# Force CPU usage - set to true for GPU
+f16: false
+diffusers:
+  cuda: false # Enable for GPU usage (CUDA)
+  scheduler_type: euler_a
+```
+
+#### Dependencies
+
+This is an extra backend: it is already available in the container images, so there is nothing to set up. Do not use *core* images (ending with `-core`). If you are building manually, see the [build instructions]({{%relref "docs/getting-started/build" %}}).
+
+#### GPU model setup
+
+To run the same model on an NVIDIA GPU, enable `cuda` and `f16` in the model configuration file:
+
+```yaml
+name: animagine-xl
+parameters:
+  model: Linaqruf/animagine-xl
+backend: diffusers
+cuda: true
+f16: true
+diffusers:
+  scheduler_type: euler_a
+```
+
+#### Local models
+
+You can also use local models, or modify some parameters like `clip_skip`, `scheduler_type`, for instance:
+
+```yaml
+name: stablediffusion
+parameters:
+  model: toonyou_beta6.safetensors
+backend: diffusers
+step: 30
+f16: true
+cuda: true
+diffusers:
+  pipeline_type: StableDiffusionPipeline
+  enable_parameters: "negative_prompt,num_inference_steps,clip_skip"
+  scheduler_type: "k_dpmpp_sde"
+  cfg_scale: 8
+  clip_skip: 11
+```
+
+#### Configuration parameters
+
+The following parameters are available in the configuration file:
+
+| Parameter | Description | Default |
+| --- | --- | --- |
+| `f16` | Force the usage of `float16` instead of `float32` | `false` |
+| `step` | Number of steps to run the model for | `30` |
+| `cuda` | Enable CUDA acceleration | `false` |
+| `enable_parameters` | Parameters to enable for the model | `negative_prompt,num_inference_steps,clip_skip` |
+| `scheduler_type` | Scheduler type | `k_dpmpp_sde` |
+| `cfg_scale` | Classifier-free guidance scale | `8` |
+| `clip_skip` | Number of CLIP layers to skip | None |
+| `pipeline_type` | Pipeline type | `AutoPipelineForText2Image` |
+
+Several scheduler types are available:
+
+| Scheduler | Description |
+| --- | --- |
+| `ddim` | DDIM |
+| `pndm` | PNDM |
+| `heun` | Heun |
+| `unipc` | UniPC |
+| `euler` | Euler |
+| `euler_a` | Euler a |
+| `lms` | LMS |
+| `k_lms` | LMS Karras |
+| `dpm_2` | DPM2 |
+| `k_dpm_2` | DPM2 Karras |
+| `dpm_2_a` | DPM2 a |
+| `k_dpm_2_a` | DPM2 a Karras |
+| `dpmpp_2m` | DPM++ 2M |
+| `k_dpmpp_2m` | DPM++ 2M Karras |
+| `dpmpp_sde` | DPM++ SDE |
+| `k_dpmpp_sde` | DPM++ SDE Karras |
+| `dpmpp_2m_sde` | DPM++ 2M SDE |
+| `k_dpmpp_2m_sde` | DPM++ 2M SDE Karras |
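
Since a mistyped `scheduler_type` is easy to miss, a quick check against the names in this table can help. The helpers below are a hypothetical sketch: the list is copied from the table above, and the `sed` expression assumes a single `scheduler_type:` line with an optionally quoted value, as in the examples on this page.

```shell
# Hypothetical check: succeed if $1 is one of the scheduler names listed above.
valid_scheduler() {
  case "$1" in
    ddim|pndm|heun|unipc|euler|euler_a|lms|k_lms|dpm_2|k_dpm_2|dpm_2_a|k_dpm_2_a) return 0 ;;
    dpmpp_2m|k_dpmpp_2m|dpmpp_sde|k_dpmpp_sde|dpmpp_2m_sde|k_dpmpp_2m_sde) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical check: read scheduler_type from a model YAML file and validate it.
check_scheduler_in_config() {
  value=$(sed -n 's/^[[:space:]]*scheduler_type:[[:space:]]*"\{0,1\}\([A-Za-z0-9_]*\)"\{0,1\}.*/\1/p' "$1")
  if [ -z "$value" ]; then
    echo "no scheduler_type in $1" >&2
    return 1
  fi
  if ! valid_scheduler "$value"; then
    echo "unknown scheduler_type: $value" >&2
    return 1
  fi
  echo "$value"
}
```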
+
+Available pipeline types:
+
+| Pipeline type | Description |
+| --- | --- |
+| `StableDiffusionPipeline` | Stable diffusion pipeline |
+| `StableDiffusionImg2ImgPipeline` | Stable diffusion image to image pipeline |
+| `StableDiffusionDepth2ImgPipeline` | Stable diffusion depth to image pipeline |
+| `DiffusionPipeline` | Diffusion pipeline |
+| `StableDiffusionXLPipeline` | Stable diffusion XL pipeline |
+
+#### Usage
+
+#### Text to Image
+
+Use the image generation endpoint with the `model` name from the configuration file:
+
+```bash
+curl http://localhost:8080/v1/images/generations \
+    -H "Content-Type: application/json" \
+    -d '{
+      "prompt": "<positive prompt>|<negative prompt>", 
+      "model": "animagine-xl", 
+      "step": 51,
+      "size": "1024x1024" 
+    }'
+```
+
+#### Image to Image
+
+https://huggingface.co/docs/diffusers/using-diffusers/img2img
+
+An example model (GPU):
+```yaml
+name: stablediffusion-edit
+parameters:
+  model: nitrosocke/Ghibli-Diffusion
+backend: diffusers
+step: 25
+cuda: true
+f16: true
+diffusers:
+  pipeline_type: StableDiffusionImg2ImgPipeline
+  enable_parameters: "negative_prompt,num_inference_steps,image"
+```
+
+```bash
+IMAGE_PATH=/path/to/your/image
+(echo -n '{"file": "'; base64 $IMAGE_PATH; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-edit"}') |
+curl -H "Content-Type: application/json" -d @-  http://localhost:8080/v1/images/generations
+```
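For repeated requests, the same body can be built by a shell function. The helper below is a hypothetical sketch; it assumes the `file`, `prompt`, `size` and `model` fields used in the command above, strips the line breaks that GNU `base64` inserts (curl's `-d @-` also strips newlines, but other tools may not), and escapes quotes in the prompt.

```shell
# Hypothetical helper: print a single-line JSON body for the img2img /
# depth2img requests, with the image base64-encoded in the "file" field.
# Usage: image_edit_request <image_path> <prompt> <model> [size]
image_edit_request() {
  img=$1 prompt=$2 model=$3 size=${4:-512x512}
  [ -f "$img" ] || { echo "no such file: $img" >&2; return 1; }
  # GNU base64 wraps output every 76 characters; remove the line breaks.
  b64=$(base64 < "$img" | tr -d '\n')
  # Escape backslashes and double quotes in the prompt for JSON.
  prompt=$(printf '%s' "$prompt" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"file": "%s", "prompt": "%s", "size": "%s", "model": "%s"}\n' \
    "$b64" "$prompt" "$size" "$model"
}
```

For example, `image_edit_request /path/to/your/image "a sky background" stablediffusion-edit | curl -H "Content-Type: application/json" -d @- http://localhost:8080/v1/images/generations`. The same helper should also work for the `stablediffusion-depth` model below, since it uses the same fields.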
+
+#### Depth to Image
+
+https://huggingface.co/docs/diffusers/using-diffusers/depth2img
+
+```yaml
+name: stablediffusion-depth
+parameters:
+  model: stabilityai/stable-diffusion-2-depth
+backend: diffusers
+step: 50
+# Run on the GPU (CUDA) in half precision; set both to false to force CPU usage
+f16: true
+cuda: true
+diffusers:
+  pipeline_type: StableDiffusionDepth2ImgPipeline
+  enable_parameters: "negative_prompt,num_inference_steps,image"
+  cfg_scale: 6
+```
+
+```bash
+(echo -n '{"file": "'; base64 ~/path/to/image.jpeg; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-depth"}') |
+curl -H "Content-Type: application/json" -d @-  http://localhost:8080/v1/images/generations
+```
+
+#### img2vid
+
+
+```yaml
+name: img2vid
+parameters:
+  model: stabilityai/stable-video-diffusion-img2vid
+backend: diffusers
+step: 25
+# Run on the GPU (CUDA) in half precision; set both to false to force CPU usage
+f16: true
+cuda: true
+diffusers:
+  pipeline_type: StableVideoDiffusionPipeline
+```
+
+```bash
+(echo -n '{"file": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true","size": "512x512","model":"img2vid"}') |
+curl -H "Content-Type: application/json" -X POST -d @- http://localhost:8080/v1/images/generations
+```
+
+#### txt2vid
+
+```yaml
+name: txt2vid
+parameters:
+  model: damo-vilab/text-to-video-ms-1.7b
+backend: diffusers
+step: 25
+# Run on the GPU (CUDA) in half precision; set both to false to force CPU usage
+f16: true
+cuda: true
+diffusers:
+  pipeline_type: VideoDiffusionPipeline
+  cuda: true
+```
+
+```bash
+(echo -n '{"prompt": "spiderman surfing","size": "512x512","model":"txt2vid"}') |
+curl -H "Content-Type: application/json" -X POST -d @- http://localhost:8080/v1/images/generations
+```
--- a/docs/content/docs/features/model-gallery.md
+++ b/docs/content/docs/features/model-gallery.md
@ -0,0 +1,508 @@
+
+++
+disableToc = false
+title = "🖼️ Model gallery"
+
+weight = 18
+url = '/models'
+++
+
+<h1 align="center">
+  <br>
+  <img height="300" src="https://github.com/go-skynet/model-gallery/assets/2420543/7a6a8183-6d0a-4dc4-8e1d-f2672fab354e"> <br>
+<br>
+</h1>
+
+The model gallery is an (experimental!) collection of model configurations for [LocalAI](https://github.com/go-skynet/LocalAI).
+
+To ease model installation, LocalAI can preload models on start, and download and install them at runtime. You can install models manually by copying them into the `models` directory, or use the API to configure, download and verify the model assets for you. As the UI is still a work in progress, this page documents the API endpoints.
+
+{{% alert note %}}
+The models in this gallery are not directly maintained by LocalAI. If you find a model that is not working, please open an issue on the model gallery repository.
+{{% /alert %}}
+
+{{% alert note %}}
+GPT and text generation models might have a license which is not permissive for commercial use or might be questionable or without any license at all. Please check the model license before using it. The official gallery contains only open licensed models.
+{{% /alert %}}
+
+## Useful Links and resources
+
+- [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) - a list of the best-performing models on the Open LLM benchmark. Keep in mind that models must be quantized in the `gguf` format to be compatible with LocalAI.
+
+
+## Model repositories
+
+You can install a model at runtime, while the API is already running, or preload models before starting the API.
+
+To install a model at runtime, use the `/models/apply` LocalAI API endpoint.
+
+To enable the `model-gallery` repository you need to start `local-ai` with the `GALLERIES` environment variable:
+
+```
+GALLERIES='[{"name":"<GALLERY_NAME>", "url":"<GALLERY_URL>"}]'
+```
+
+For example, to enable the `model-gallery` repository, start `local-ai` with:
+
+```
+GALLERIES='[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}]'
+```
+
+where `github:go-skynet/model-gallery/index.yaml` will be expanded automatically to `https://raw.githubusercontent.com/go-skynet/model-gallery/main/index.yaml`.
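
The expansion rule for `github:` URLs can be sketched as follows. This is a hypothetical illustration of the mapping described above (always using the `main` branch), not LocalAI's actual implementation.

```shell
# Hypothetical illustration of the documented expansion:
#   github:<org>/<repo>/<path>  ->  https://raw.githubusercontent.com/<org>/<repo>/main/<path>
# Any other URL is printed unchanged.
expand_gallery_url() {
  case "$1" in
    github:*)
      rest=${1#github:}        # go-skynet/model-gallery/index.yaml
      org=${rest%%/*}          # go-skynet
      rest=${rest#*/}          # model-gallery/index.yaml
      repo=${rest%%/*}         # model-gallery
      file_path=${rest#*/}     # index.yaml
      echo "https://raw.githubusercontent.com/$org/$repo/main/$file_path"
      ;;
    *)
      echo "$1"
      ;;
  esac
}
```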
+
+{{% alert note %}}
+
+As this feature is experimental, you need to run `local-ai` with a list of `GALLERIES`. Currently there are two galleries:
+
+- An official one, containing only definitions and models with a clear license, to avoid any DMCA infringement. Models that are not clearly licensed are not included in this repository, which is officially linked to LocalAI.
+- A "community" one that contains an index of `huggingface` models that are compatible with the `ggml` format and lives in the `localai-huggingface-zoo` repository.
+
+To enable the two repositories, start `LocalAI` with the `GALLERIES` environment variable:
+
+```bash
+GALLERIES='[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]'
+```
+
+If running with `docker-compose`, simply edit the `.env` file and uncomment the `GALLERIES` variable, and add the one you want to use.
+
+{{% /alert %}}
+
+{{% alert note %}}
+You might not find every model in this gallery. The gallery is updated automatically by CI and indexes most of the models on Hugging Face (https://huggingface.co/); a newly uploaded model is generally available after `~24h`.
+
+Under no circumstances are LocalAI or its developers responsible for the models in this gallery: CI only indexes them and provides a convenient way to install them with an automatic configuration and a consistent API. Don't install models from authors you don't trust, and check the appropriate license for your use case. Models are automatically indexed and hosted on Hugging Face (https://huggingface.co/). For any issue with the models, please open an issue on the model gallery repository if it's a LocalAI misconfiguration; otherwise, refer to the Hugging Face repository. If you think a model should not be listed, please reach out to us and we will remove it from the gallery.
+{{% /alert %}}
+
+{{% alert note %}}
+
+There is no documentation yet on how to build a gallery or a repository - but you can find an example in the [model-gallery](https://github.com/go-skynet/model-gallery) repository.
+
+{{% /alert %}}
+
+
+### List Models
+
+To list all the available models, use the `/models/available` endpoint:
+
+```bash
+curl http://localhost:8080/models/available
+```
+
+To search for a model, you can use `jq`:
+
+```bash
+# Get all information about models with a name that contains "replit"
+curl http://localhost:8080/models/available | jq '.[] | select(.name | contains("replit"))'
+
+# Get the binary name of all local models (not hosted on Hugging Face)
+curl http://localhost:8080/models/available | jq '.[] | .name | select(contains("localmodels"))'
+
+# Get all of the model URLs that contain "orca"
+curl http://localhost:8080/models/available | jq '.[] | .urls | select(. != null) | add | select(contains("orca"))'
+```
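+
+Similar filters can be applied from Python if `jq` is not available. This sketch assumes `/models/available` returns a JSON list of objects with a `name` and an optional `urls` list, as the `jq` queries above imply; the helper names are hypothetical:
+
+```python
+import json
+import urllib.request
+
+
+def fetch_available(base_url="http://localhost:8080"):
+    """Download the gallery index exposed by /models/available."""
+    with urllib.request.urlopen(f"{base_url}/models/available") as resp:
+        return json.load(resp)
+
+
+def names_containing(models, needle):
+    """Model names that contain needle."""
+    return [m["name"] for m in models if needle in (m.get("name") or "")]
+
+
+def urls_containing(models, needle):
+    """Individual model URLs that contain needle; entries without urls are skipped."""
+    return [u for m in models for u in (m.get("urls") or []) if needle in u]
+
+
+# Usage with a running LocalAI:
+# models = fetch_available()
+# print(names_containing(models, "replit"))
+# print(urls_containing(models, "orca"))
+```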
+
+### How to install a model from the repositories
+
+Models can be installed either by passing the full URL of the YAML config file or by passing an identifier of the model in the gallery. The gallery is a repository of models that can be installed by name.
+
+To install a model from the gallery repository, you can pass the model name in the `id` field. For instance, to install the `bert-embeddings` model, you can use the following command:
+
+```bash
+LOCALAI=http://localhost:8080
+curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
+     "id": "model-gallery@bert-embeddings"
+   }'  
+```
+
+where:
+- `model-gallery` is the repository. It is optional and can be omitted: in that case, LocalAI searches for the model by name in all the repositories. If the same model name is present in more than one gallery, the first match wins.
+- `bert-embeddings` is the model name in the gallery
+  (read its [config here](https://github.com/go-skynet/model-gallery/blob/main/bert-embeddings.yaml)).
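+
+The lookup rules above can be modeled in a few lines of Python. This is a sketch of the documented behavior (optional gallery prefix, first match wins), not LocalAI's implementation; galleries are represented, by assumption, as an ordered list of `(name, model_names)` pairs, and both helpers are hypothetical:
+
+```python
+def split_model_id(model_id):
+    """Split '<GALLERY>@<MODEL_NAME>'; the gallery part is optional."""
+    gallery, sep, name = model_id.partition("@")
+    return (gallery, name) if sep else (None, model_id)
+
+
+def resolve(model_id, galleries):
+    """Return (gallery, model) for the first gallery that has the model, else None."""
+    gallery, name = split_model_id(model_id)
+    for gallery_name, model_names in galleries:
+        # An explicit prefix restricts the search to that gallery
+        if gallery is not None and gallery_name != gallery:
+            continue
+        if name in model_names:
+            return gallery_name, name  # first match wins
+    return None
+```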
+
+{{% alert note %}}
+If the `huggingface` model gallery is enabled (it's enabled by default),
+and the model has an entry in the model gallery's associated YAML config
+(for `huggingface`, see [`model-gallery/huggingface.yaml`](https://github.com/go-skynet/model-gallery/blob/main/huggingface.yaml)),
+you can install models by specifying the model's `id` directly.
+For example, to install WizardLM SuperHOT:
+
+```bash
+LOCALAI=http://localhost:8080
+curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
+     "id": "huggingface@TheBloke/WizardLM-13B-V1-0-Uncensored-SuperHOT-8K-GGML/wizardlm-13b-v1.0-superhot-8k.ggmlv3.q4_K_M.bin"
+   }'  
+```
+
+Note that the `id` can be used similarly when pre-loading models at start.
+{{% /alert %}}
+
+
+## How to install a model (without a gallery)
+
+If you don't want to set any gallery repository, you can still install models by loading a model configuration file.
+
+In the body of the request you must specify the model configuration file URL (`url`), optionally a name to install the model (`name`), extra files to install (`files`), and configuration overrides (`overrides`). When calling the API endpoint, LocalAI will download the model files and write the configuration to the folder used to store models.
+
+```bash
+LOCALAI=http://localhost:8080
+curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
+     "url": "<MODEL_CONFIG_FILE>"
+   }' 
+# or if from a repository
+curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
+     "id": "<GALLERY>@<MODEL_NAME>"
+   }' 
+```
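+
+When scripting installs, you can build the request body programmatically. A Python sketch using only the standard library; `apply_request` is a hypothetical helper that builds, but does not send, the request and includes only the fields you pass:
+
+```python
+import json
+import urllib.request
+
+
+def apply_request(base_url, url=None, model_id=None, name=None, files=None, overrides=None):
+    """Build (but do not send) a POST /models/apply request."""
+    if url is None and model_id is None:
+        raise ValueError("either url or model_id is required")
+    fields = {"url": url, "id": model_id, "name": name, "files": files, "overrides": overrides}
+    # Drop unset fields so the body only carries what was requested
+    body = {key: value for key, value in fields.items() if value is not None}
+    return urllib.request.Request(
+        f"{base_url}/models/apply",
+        data=json.dumps(body).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+req = apply_request("http://localhost:8080", model_id="model-gallery@bert-embeddings")
+# Sending it returns the job JSON ({"uuid": ..., "status": ...}):
+# with urllib.request.urlopen(req) as resp:
+#     job = json.load(resp)
+```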
+
+For example, to install OpenLLaMA:
+   
+```bash
+LOCALAI=http://localhost:8080
+curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
+     "url": "github:go-skynet/model-gallery/openllama_3b.yaml"
+   }'  
+```
+
+The API will return a job `uuid` that you can use to track the job progress:
+```json
+{"uuid":"1059474d-f4f9-11ed-8d99-c4cbe106d571","status":"http://localhost:8080/models/jobs/1059474d-f4f9-11ed-8d99-c4cbe106d571"}
+```
+
+For instance, here is a small bash script that waits for a job to complete (requires `jq`):
+
+```bash
+model_url="github:go-skynet/model-gallery/openllama_3b.yaml"
+response=$(curl -s http://localhost:8080/models/apply -H "Content-Type: application/json" -d "{\"url\": \"$model_url\"}")
+
+job_id=$(echo "$response" | jq -r '.uuid')
+
+while [ "$(curl -s http://localhost:8080/models/jobs/"$job_id" | jq -r '.processed')" != "true" ]; do 
+  sleep 1
+done
+
+echo "Job completed"
+```
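+
+The bash loop above polls forever if the job never reports `processed: true`. A Python sketch of the same polling with a timeout; `get_job` and `wait_for_job` are hypothetical helpers, and the fetch function is injectable so the loop can be exercised without a server:
+
+```python
+import json
+import time
+import urllib.request
+
+
+def get_job(base_url, job_id):
+    """Fetch the current state of an install job from /models/jobs/<id>."""
+    with urllib.request.urlopen(f"{base_url}/models/jobs/{job_id}") as resp:
+        return json.load(resp)
+
+
+def wait_for_job(fetch, interval=1.0, timeout=600.0, sleep=time.sleep):
+    """Call fetch() until the job reports processed, or raise TimeoutError."""
+    deadline = time.monotonic() + timeout
+    while True:
+        state = fetch()
+        if state.get("processed"):
+            # Check state["error"] and state["message"] to tell success from failure
+            return state
+        if time.monotonic() >= deadline:
+            raise TimeoutError("model installation did not finish in time")
+        sleep(interval)
+
+
+# Usage with a running LocalAI and a job id returned by /models/apply:
+# state = wait_for_job(lambda: get_job("http://localhost:8080", job_id))
+```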
+
+To preload models on start instead you can use the `PRELOAD_MODELS` environment variable.
+
+<details>
+
+To preload models on start, set the `PRELOAD_MODELS` environment variable to a JSON array of model URIs:
+
+```bash
+PRELOAD_MODELS='[{"url": "<MODEL_URL>"}]'
+```
+
+Note: either `url` or `id` must be specified. `url` points to a model gallery configuration file, while `id` refers to a model inside a repository. If both are specified, `id` is used.
+
+For example:
+
+```bash
+PRELOAD_MODELS='[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]'
+```
+
+or as arg:
+
+```bash
+local-ai --preload-models '[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]'
+```
+
+or in a YAML file:
+
+```bash
+local-ai --preload-models-config "/path/to/yaml"
+```
+
+YAML:
+```yaml
+- url: github:go-skynet/model-gallery/stablediffusion.yaml
+```
+
+</details>
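+
+The `url`/`id` rule above can be checked before LocalAI starts. A Python sketch with hypothetical helpers that validate each entry, report which field takes precedence, and shell-quote the JSON array for `PRELOAD_MODELS` or `--preload-models`:
+
+```python
+import json
+import shlex
+
+
+def entry_source(entry):
+    """Return which field identifies the model: id takes precedence over url."""
+    if "id" in entry:
+        return "id", entry["id"]
+    if "url" in entry:
+        return "url", entry["url"]
+    raise ValueError(f"entry needs a url or an id: {entry}")
+
+
+def preload_models_value(entries):
+    """Validate the entries and return a shell-quoted JSON array."""
+    for entry in entries:
+        entry_source(entry)  # raises on invalid entries
+    return shlex.quote(json.dumps(entries))
+
+
+value = preload_models_value([{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}])
+print(f"PRELOAD_MODELS={value}")
+print(f"local-ai --preload-models {value}")
+```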
+
+{{% alert note %}}
+
+You can already find some openly licensed models in the [model gallery](https://github.com/go-skynet/model-gallery).
+
+If you don't find the model in the gallery, you can use the "base" model and provide a URL to LocalAI:
+
+<details>
+
+```bash
+curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
+     "url": "github:go-skynet/model-gallery/base.yaml",
+     "name": "model-name",
+     "files": [
+        {
+            "uri": "<URL>",
+            "sha256": "<SHA>",
+            "filename": "model"
+        }
+     ]
+   }'
+```
+
+</details>
+
+{{% /alert %}}
+
+## Installing a model with a different name
+
+To install a model with a different name, specify a `name` parameter in the request body.
+
+```bash
+LOCALAI=http://localhost:8080
+curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
+     "url": "<MODEL_CONFIG_FILE>",
+     "name": "<MODEL_NAME>"
+   }'  
+```
+
+For example, to install a model as `gpt-3.5-turbo`:
+   
+```bash
+LOCALAI=http://localhost:8080
+curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
+      "url": "github:go-skynet/model-gallery/gpt4all-j.yaml",
+      "name": "gpt-3.5-turbo"
+   }'  
+```
+
+## Additional Files
+
+<details>
+
+To download additional files with the model, use the `files` parameter:
+
+```bash
+LOCALAI=http://localhost:8080
+curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
+     "url": "<MODEL_CONFIG_FILE>",
+     "name": "<MODEL_NAME>",
+     "files": [
+        {
+            "uri": "<additional_file_url>",
+            "sha256": "<additional_file_hash>",
+            "filename": "<additional_file_name>"
+        }
+     ]
+   }'  
+```
+
+</details>
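+
+The `sha256` field expects the file's SHA-256 digest (on Linux, `sha256sum <file>` prints the same value). If you have a local copy of the file, a Python sketch like the following computes it in chunks so large model files don't have to fit in memory; `sha256_hex` and `files_entry` are hypothetical helpers:
+
+```python
+import hashlib
+
+
+def sha256_hex(fileobj, chunk_size=1 << 20):
+    """Hash a binary file object in chunks and return the hex digest."""
+    digest = hashlib.sha256()
+    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
+        digest.update(chunk)
+    return digest.hexdigest()
+
+
+def files_entry(path, uri, filename):
+    """Build one element of the "files" list for /models/apply."""
+    with open(path, "rb") as fh:
+        return {"uri": uri, "sha256": sha256_hex(fh), "filename": filename}
+```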
+
+## Overriding configuration files
+
+<details>
+
+To override portions of the configuration file, such as the backend or the model file, use the `overrides` parameter:
+
+```bash
+LOCALAI=http://localhost:8080
+curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
+     "url": "<MODEL_CONFIG_FILE>",
+     "name": "<MODEL_NAME>",
+     "overrides": {
+        "backend": "llama",
+        "f16": true
+     }
+   }'  
+```
+
+</details>
+
+
+
+## Examples
+
+### Embeddings: Bert
+
+<details>
+
+```bash
+curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
+     "url": "github:go-skynet/model-gallery/bert-embeddings.yaml",
+     "name": "text-embedding-ada-002"
+   }'  
+```
+
+To test it:
+
+```bash
+LOCALAI=http://localhost:8080
+curl $LOCALAI/v1/embeddings -H "Content-Type: application/json" -d '{
+    "input": "Test",
+    "model": "text-embedding-ada-002"
+  }'
+```
+
+</details>
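+
+Embeddings are typically compared with cosine similarity. A Python sketch, assuming LocalAI returns the OpenAI-style response shape with the vector under `data[0].embedding`; the helper names are hypothetical:
+
+```python
+import math
+
+
+def embedding_of(response):
+    """Extract the first vector from a parsed /v1/embeddings response."""
+    return response["data"][0]["embedding"]
+
+
+def cosine_similarity(a, b):
+    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
+    dot = sum(x * y for x, y in zip(a, b))
+    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
+    return dot / norm
+```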
+
+### Image generation: Stable diffusion
+
+URL: https://github.com/EdVince/Stable-Diffusion-NCNN
+
+{{< tabs >}}
+{{% tab name="Prepare the model in runtime" %}}
+
+While the API is running, you can install the model by using the `/models/apply` endpoint and point it to the `stablediffusion` model in the [models-gallery](https://github.com/go-skynet/model-gallery#image-generation-stable-diffusion):
+```bash
+curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{         
+     "url": "github:go-skynet/model-gallery/stablediffusion.yaml"
+   }'
+```
+
+{{% /tab %}}
+{{% tab name="Automatically prepare the model before start" %}}
+
+You can set the `PRELOAD_MODELS` environment variable:
+
+```bash
+PRELOAD_MODELS='[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]'
+```
+
+or as arg:
+
+```bash
+local-ai --preload-models '[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]'
+```
+
+or in a YAML file:
+
+```bash
+local-ai --preload-models-config "/path/to/yaml"
+```
+
+YAML:
+```yaml
+- url: github:go-skynet/model-gallery/stablediffusion.yaml
+```
+
+{{% /tab %}}
+{{< /tabs >}}
+
+Test it:
+
+```bash
+curl $LOCALAI/v1/images/generations -H "Content-Type: application/json" -d '{
+            "prompt": "floating hair, portrait, ((one girl)), cute face, hidden hands, asymmetrical bangs, beautiful detailed eyes, eye shadow, hair ornament, ribbons, bowties, buttons, pleated skirt, (((masterpiece))), ((best quality)), colorful|((part of the head)), ((((mutated hands and fingers)))), deformed, blurry, bad anatomy, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, poorly drawn hands, missing limb, blurry, floating limbs, disconnected limbs, malformed hands, blur, out of focus, long neck, long body, Octane renderer, lowres, bad anatomy, bad hands, text",
+            "mode": 2,  "seed":9000,
+            "size": "256x256", "n":2
+}'
+```
+
+### Audio transcription: Whisper
+
+URL: https://github.com/ggerganov/whisper.cpp
+
+{{< tabs >}}
+{{% tab name="Prepare the model in runtime" %}}
+
+```bash
+curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{         
+     "url": "github:go-skynet/model-gallery/whisper-base.yaml",
+     "name": "whisper-1"
+   }'
+```
+
+{{% /tab %}}
+{{% tab name="Automatically prepare the model before start" %}}
+
+You can set the `PRELOAD_MODELS` environment variable:
+
+```bash
+PRELOAD_MODELS='[{"url": "github:go-skynet/model-gallery/whisper-base.yaml", "name": "whisper-1"}]'
+```
+
+or as arg:
+
+```bash
+local-ai --preload-models '[{"url": "github:go-skynet/model-gallery/whisper-base.yaml", "name": "whisper-1"}]'
+```
+
+or in a YAML file:
+
+```bash
+local-ai --preload-models-config "/path/to/yaml"
+```
+
+YAML:
+```yaml
+- url: github:go-skynet/model-gallery/whisper-base.yaml
+  name: whisper-1
+```
+
+{{% /tab %}}
+{{< /tabs >}}
+
+### GPTs
+
+<details>
+
+```bash
+LOCALAI=http://localhost:8080
+curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
+     "url": "github:go-skynet/model-gallery/gpt4all-j.yaml",
+     "name": "gpt4all-j"
+   }'  
+```
+
+To test it:
+
+```bash
+curl $LOCALAI/v1/chat/completions -H "Content-Type: application/json" -d '{
+     "model": "gpt4all-j", 
+     "messages": [{"role": "user", "content": "How are you?"}],
+     "temperature": 0.1 
+   }'
+```
+
+</details>
+
+### Note
+
+LocalAI will create a batch process that downloads the required files from a model definition and automatically reload itself to include the new model. 
+
+Input: `url` or `id` (required), `name` (optional), `files` (optional), `overrides` (optional)
+
+```bash
+curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
+     "url": "<MODEL_DEFINITION_URL>",
+     "id": "<GALLERY>@<MODEL_NAME>",
+     "name": "<INSTALLED_MODEL_NAME>",
+     "files": [
+        {
+            "uri": "<additional_file>",
+            "sha256": "<additional_file_hash>",
+            "filename": "<additional_file_name>"
+        }
+     ],
+     "overrides": { "backend": "...", "f16": true }
+   }'
+```
+
+An optional list of additional files to download can be specified in `files`. The `name` field overrides the model name. Finally, it is possible to override the model config file with `overrides`.
+
+The `url` is a full URL, or a github url (`github:org/repo/file.yaml`), or a local file (`file:///path/to/file.yaml`).
+The `id` is a string in the form `<GALLERY>@<MODEL_NAME>`, where `<GALLERY>` is the name of the gallery, and `<MODEL_NAME>` is the name of the model in the gallery. Galleries can be specified during startup with the `GALLERIES` environment variable.
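+
+The three accepted `url` forms can be summarized with a small Python sketch. `resolve_config_location` is a hypothetical helper that mirrors the rules stated here (including the `github:` expansion to the `main` branch described earlier on this page); it is not LocalAI's code:
+
+```python
+from urllib.parse import urlparse
+
+
+def resolve_config_location(ref):
+    """Classify a model config reference as ("http", url) or ("file", path)."""
+    if ref.startswith("github:"):
+        org, repo, path = ref[len("github:"):].split("/", 2)
+        # github:org/repo/file.yaml -> raw file on the repository's main branch
+        return "http", f"https://raw.githubusercontent.com/{org}/{repo}/main/{path}"
+    if ref.startswith("file://"):
+        return "file", urlparse(ref).path
+    if ref.startswith(("http://", "https://")):
+        return "http", ref
+    raise ValueError(f"unsupported reference: {ref}")
+```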
+
+Returns a `uuid` and a `status` URL to follow up on the state of the process:
+
+```json
+{ "uuid":"251475c9-f666-11ed-95e0-9a8a4480ac58", "status":"http://localhost:8080/models/jobs/251475c9-f666-11ed-95e0-9a8a4480ac58"}
+```
+
+To see a collection example of curated models definition files, see the [model-gallery](https://github.com/go-skynet/model-gallery).
+
+#### Get model job state `/models/jobs/<uid>`
+
+This endpoint returns the state of the batch job associated with a model installation.
+
+```bash
+curl http://localhost:8080/models/jobs/<JOB_ID>
+```
+
+Returns a JSON object containing the error (if any), whether the job has been processed, and a status message:
+
+```json
+{"error":null,"processed":true,"message":"completed"}
+```
--- a/docs/content/docs/features/openai-functions.md
+++ b/docs/content/docs/features/openai-functions.md
@ -0,0 +1,126 @@
+
+++
+disableToc = false
+title = "🔥 OpenAI functions"
+weight = 17
+++
+
+LocalAI supports running OpenAI functions with `llama.cpp` compatible models.
+
+![localai-functions-1](https://github.com/ggerganov/llama.cpp/assets/2420543/5bd15da2-78c1-4625-be90-1e938e6823f1)
+
+To learn more about OpenAI functions, see the [OpenAI API blog post](https://openai.com/blog/function-calling-and-other-api-updates).
+
+💡 Also check out [LocalAGI](https://github.com/mudler/LocalAGI) for an example of how to use LocalAI functions.
+
+## Setup
+
+OpenAI functions are available only with `ggml` or `gguf` models compatible with `llama.cpp`.
+
+You don't need to do anything specific - just use `ggml` or `gguf` models.
+
+
+## Usage example
+
+You can configure a model manually with a YAML config file in the models directory, for example:
+
+```yaml
+name: gpt-3.5-turbo
+parameters:
+  # Model file name
+  model: ggml-openllama.bin
+  top_p: 0.9
+  top_k: 80
+  temperature: 0.1
+```
+
+To use the functions with the OpenAI client in Python:
+
+```python
+import openai
+# ...
+# Send the conversation and available functions to GPT
+messages = [{"role": "user", "content": "What's the weather like in Boston?"}]
+functions = [
+    {
+        "name": "get_current_weather",
+        "description": "Get the current weather in a given location",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "location": {
+                    "type": "string",
+                    "description": "The city and state, e.g. San Francisco, CA",
+                },
+                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
+            },
+            "required": ["location"],
+        },
+    }
+]
+response = openai.ChatCompletion.create(
+    model="gpt-3.5-turbo",
+    messages=messages,
+    functions=functions,
+    function_call="auto",
+)
+# ...
+```
+
+{{% alert note %}}
+When running the python script, be sure to:
+
+- Set the `OPENAI_API_KEY` environment variable to a random string (an OpenAI API key is NOT required!)
+- Set `OPENAI_API_BASE` to point to your LocalAI service, for example `OPENAI_API_BASE=http://localhost:8080`
+
+{{% /alert %}}
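+
+Once the model answers with a function call, your code is responsible for running it. A Python sketch of the dispatch step, assuming the OpenAI chat format where `message["function_call"]` holds a `name` and a JSON-encoded `arguments` string; `get_current_weather` here is a hypothetical stub returning placeholder data:
+
+```python
+import json
+
+
+def get_current_weather(location, unit="fahrenheit"):
+    # Hypothetical stub standing in for a real weather lookup
+    return {"location": location, "temperature": "72", "unit": unit}
+
+
+AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}
+
+
+def dispatch(message):
+    """Run the function requested in an assistant message, or return None."""
+    call = message.get("function_call")
+    if not call:
+        return None
+    func = AVAILABLE_FUNCTIONS[call["name"]]
+    args = json.loads(call["arguments"])  # arguments arrive as a JSON string
+    return func(**args)
+```
+
+With the client shown above, pass `response["choices"][0]["message"]` to `dispatch`.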
+
+## Advanced
+
+It is possible to also specify the full function signature (for debugging, or to use with other clients).
+
+The chat endpoint accepts the `grammar_json_functions` additional parameter which takes a JSON schema object.
+
+For example, with curl:
+
+```bash
+curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
+     "model": "gpt-4",
+     "messages": [{"role": "user", "content": "How are you?"}],
+     "temperature": 0.1,
+     "grammar_json_functions": {
+        "oneOf": [
+            {
+                "type": "object",
+                "properties": {
+                    "function": {"const": "create_event"},
+                    "arguments": {
+                        "type": "object",
+                        "properties": {
+                            "title": {"type": "string"},
+                            "date": {"type": "string"},
+                            "time": {"type": "string"}
+                        }
+                    }
+                }
+            },
+            {
+                "type": "object",
+                "properties": {
+                    "function": {"const": "search"},
+                    "arguments": {
+                        "type": "object",
+                        "properties": {
+                            "query": {"type": "string"}
+                        }
+                    }
+                }
+            }
+        ]
+    }
+   }'
+```
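+
+Writing the `oneOf` schema by hand gets repetitive. A Python sketch of a hypothetical helper that derives it from OpenAI-style function definitions, producing the same shape as the curl example (one object per function, with `function` as a `const` and `arguments` taken unchanged from the function's `parameters`):
+
+```python
+def grammar_from_functions(functions):
+    """Map OpenAI-style function specs to a grammar_json_functions schema."""
+    return {
+        "oneOf": [
+            {
+                "type": "object",
+                "properties": {
+                    "function": {"const": f["name"]},
+                    # Fall back to an unconstrained object when no parameters are given
+                    "arguments": f.get("parameters", {"type": "object"}),
+                },
+            }
+            for f in functions
+        ]
+    }
+```
+
+The result can be sent as the `grammar_json_functions` field of a `/v1/chat/completions` request body.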
+
+## 💡 Examples
+
+A full e2e example with `docker-compose` is available [here](https://github.com/go-skynet/LocalAI/tree/master/examples/functions).
--- a/docs/content/docs/features/text-generation.md
+++ b/docs/content/docs/features/text-generation.md
@ -0,0 +1,263 @@
+
+++
+disableToc = false
+title = "📖 Text generation (GPT)"
+weight = 10
+++
+
+LocalAI supports generating text with GPT with `llama.cpp` and other backends (such as `rwkv.cpp`). See also the [Model compatibility]({{%relref "docs/reference/compatibility-table" %}}) for an up-to-date list of the supported model families.
+
+Note:
+
+- You can also specify the model name as part of the OpenAI token.
+- If only one model is available, the API will use it for all the requests.
+
+## API Reference
+
+### Chat completions
+
+https://platform.openai.com/docs/api-reference/chat
+
+For example, to generate a chat completion, you can send a POST request to the `/v1/chat/completions` endpoint with the instruction as the request body:
+
+```bash
+curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
+  "model": "ggml-koala-7b-model-q4_0-r2.bin",
+  "messages": [{"role": "user", "content": "Say this is a test!"}],
+  "temperature": 0.7
+}'
+```
+
+Available additional parameters: `top_p`, `top_k`, `max_tokens`
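+
+The same request can be sent from Python with only the standard library. A sketch with hypothetical helper names; extra keyword arguments such as `top_p` or `max_tokens` are passed through into the JSON body, and the reply is read from the OpenAI-style `choices[0].message.content` field:
+
+```python
+import json
+import urllib.request
+
+
+def chat_request(base_url, model, messages, **params):
+    """Build a POST /v1/chat/completions request (not sent yet)."""
+    body = {"model": model, "messages": messages, **params}
+    return urllib.request.Request(
+        f"{base_url}/v1/chat/completions",
+        data=json.dumps(body).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+def reply_text(completion):
+    """Extract the assistant text from a parsed chat completion."""
+    return completion["choices"][0]["message"]["content"]
+
+
+# Usage with a running LocalAI:
+# req = chat_request("http://localhost:8080", "ggml-koala-7b-model-q4_0-r2.bin",
+#                    [{"role": "user", "content": "Say this is a test!"}], temperature=0.7)
+# with urllib.request.urlopen(req) as resp:
+#     print(reply_text(json.load(resp)))
+```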
+
+### Edit completions
+
+https://platform.openai.com/docs/api-reference/edits
+
+To generate an edit completion you can send a POST request to the `/v1/edits` endpoint with the instruction as the request body:
+
+```bash
+curl http://localhost:8080/v1/edits -H "Content-Type: application/json" -d '{
+  "model": "ggml-koala-7b-model-q4_0-r2.bin",
+  "instruction": "rephrase",
+  "input": "Black cat jumped out of the window",
+  "temperature": 0.7
+}'
+```
+
+Available additional parameters: `top_p`, `top_k`, `max_tokens`.
+
+### Completions
+
+https://platform.openai.com/docs/api-reference/completions
+
+To generate a completion, you can send a POST request to the `/v1/completions` endpoint with the prompt in the request body:
+
+```bash
+curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
+  "model": "ggml-koala-7b-model-q4_0-r2.bin",
+  "prompt": "A long time ago in a galaxy far, far away",
+  "temperature": 0.7
+}'
+```
+
+Available additional parameters: `top_p`, `top_k`, `max_tokens`
+
+### List models
+
+You can list all the models available with:
+
+```bash
+curl http://localhost:8080/v1/models
+```
+
+## Backends
+
+### AutoGPTQ
+
+[AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) is an easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
+
+#### Prerequisites
+
+This is an extra backend: it is already available in the container images, so there is nothing to set up.
+
+If you are building LocalAI locally, you need to install [AutoGPTQ manually](https://github.com/PanQiWei/AutoGPTQ#quick-installation).
+
+
+#### Model setup
+
+Models are automatically downloaded from `huggingface` the first time they are used, if not already present. You can define models via a `YAML` config file, or just query the endpoint with the `huggingface` repository model name. For example, create a `YAML` config file in `models/`:
+
+```yaml
+name: orca
+backend: autogptq
+model_base_name: "orca_mini_v2_13b-GPTQ-4bit-128g.no-act.order"
+parameters:
+  model: "TheBloke/orca_mini_v2_13b-GPTQ"
+# ...
+```
+
+Test with:
+
+```bash
+curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
+   "model": "orca",
+   "messages": [{"role": "user", "content": "How are you?"}],
+   "temperature": 0.1
+ }'
+```
+
+### RWKV
+
+A full example on how to run a rwkv model is in the [examples](https://github.com/go-skynet/LocalAI/tree/master/examples/rwkv).
+
+Note: RWKV models need the `rwkv` backend specified in the YAML config file, and they need an associated tokenizer file provided alongside the model:
+
+```
+36464540 -rw-r--r--  1 mudler mudler 1.2G May  3 10:51 rwkv_small
+36464543 -rw-r--r--  1 mudler mudler 2.4M May  3 10:51 rwkv_small.tokenizer.json
+```
+
+### llama.cpp
+
+[llama.cpp](https://github.com/ggerganov/llama.cpp) is a popular port of Facebook's LLaMA model in C/C++.
+
+{{% alert note %}}
+
+The `ggml` file format has been deprecated. If you are using `ggml` models and configuring your model with a YAML file, use the `llama-ggml` backend instead. If you are relying on automatic detection of the model, you should be fine. For `gguf` models, use the `llama` backend. The Go backend is deprecated as well, but it is still available as `go-llama`. The Go backend still supports features not available in the mainline backend: speculative sampling and embeddings.
+
+{{% /alert %}}
+
+#### Features
+
+The `llama.cpp` backend supports the following features:
+- [📖 Text generation (GPT)]({{%relref "docs/features/text-generation" %}})
+- [🧠 Embeddings]({{%relref "docs/features/embeddings" %}})
+- [🔥 OpenAI functions]({{%relref "docs/features/openai-functions" %}})
+- [✍️ Constrained grammars]({{%relref "docs/features/constrained_grammars" %}})
+
+#### Setup
+
+LocalAI supports `llama.cpp` models out of the box. You can use the `llama.cpp` model in the same way as any other model. 
+
+##### Manual setup
+
+It is sufficient to copy the `ggml` or `gguf` model files into the `models` folder. You can then refer to the model in the `model` parameter of the API calls.
+
+[You can optionally create an associated YAML]({{%relref "docs/advanced" %}}) model config file to tune the model's parameters or apply a template to the prompt.
+
+Prompt templates are useful for models that are fine-tuned towards a specific prompt. 
+
+##### Automatic setup
+
+LocalAI supports model galleries which are indexes of models. For instance, the huggingface gallery contains a large curated index of models from the huggingface model hub for `ggml` or `gguf` models.
+
+For instance, if you have the galleries enabled and LocalAI already running, you can just start chatting with models in huggingface by running:
+
+```bash
+curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
+     "model": "TheBloke/WizardLM-13B-V1.2-GGML/wizardlm-13b-v1.2.ggmlv3.q2_K.bin",
+     "messages": [{"role": "user", "content": "Say this is a test!"}],
+     "temperature": 0.1
+   }'
+```
+
+LocalAI will automatically download and configure the model in the `models` directory.
+
+Models can be also preloaded or downloaded on demand. To learn about model galleries, check out the [model gallery documentation]({{%relref "docs/features/model-gallery" %}}).
+
+#### YAML configuration
+
+To use the `llama.cpp` backend, specify `llama` as the backend in the YAML file:
+
+```yaml
+name: llama
+backend: llama
+parameters:
+  # Relative to the models path
+  model: file.gguf.bin
+```
+
+In the example above we specify `llama` as the backend to restrict loading `gguf` models only. 
+
+Similarly, to use the `llama-ggml` backend for `ggml` models:
+
+```yaml
+name: llama
+backend: llama-ggml
+parameters:
+  # Relative to the models path
+  model: file.ggml.bin
+```
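+
+If you are unsure which of the two backends a file needs, its header is a quick hint: GGUF files begin with the four ASCII bytes `GGUF`. A Python sketch of that heuristic; the helper names are hypothetical, and treating every non-GGUF file as legacy `ggml` is an assumption that can be wrong for other formats:
+
+```python
+def backend_for_header(header):
+    """Suggest a backend from the first bytes of a model file (heuristic)."""
+    return "llama" if header[:4] == b"GGUF" else "llama-ggml"
+
+
+def suggest_backend(path):
+    """Read the 4-byte magic of a model file and suggest a backend."""
+    with open(path, "rb") as fh:
+        return backend_for_header(fh.read(4))
+```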
+
+#### Reference
+
+- [llama](https://github.com/ggerganov/llama.cpp)
+- [binding](https://github.com/go-skynet/go-llama.cpp)
+
+
+### exllama/2
+
+[Exllama](https://github.com/turboderp/exllama) is "a more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights". Both `exllama` and `exllama2` are supported.
+
+#### Model setup
+
+Download the model as a folder inside the `models` directory and create a YAML file specifying the `exllama` backend. For instance, with the `TheBloke/WizardLM-7B-uncensored-GPTQ` model:
+
+```
+$ git lfs install
+$ cd models && git clone https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GPTQ && cd ..
+$ ls models/
+.keep                        WizardLM-7B-uncensored-GPTQ/ exllama.yaml
+$ cat models/exllama.yaml
+name: exllama
+parameters:
+  model: WizardLM-7B-uncensored-GPTQ
+backend: exllama
+# Note: you can also specify "exllama2" if it's an exllama2 model here
+# ...
+```
+
+Test with:
+
+```bash
+curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
+   "model": "exllama",
+   "messages": [{"role": "user", "content": "How are you?"}],
+   "temperature": 0.1
+ }'
+```
+
+### vLLM
+
+[vLLM](https://github.com/vllm-project/vllm) is a fast and easy-to-use library for LLM inference.
+
+LocalAI has a built-in integration with vLLM, and it can be used to run models. You can check out `vllm` performance [here](https://github.com/vllm-project/vllm#performance).
+
+#### Setup
+
+Create a YAML file for the model you want to use with `vllm`.
+
+To set up a model, you just need to specify the model name in the YAML config file:
+```yaml
+name: vllm
+backend: vllm
+parameters:
+    model: "facebook/opt-125m"
+
+# Uncomment to specify a quantization method (optional)
+# quantization: "awq"
+```
+
+The backend will automatically download the required files in order to run the model.
+
+
+#### Usage
+
+Use the `completions` endpoint with the `vllm` model defined above:
+```bash
+curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
+   "model": "vllm",
+   "prompt": "Hello, my name is",
+   "temperature": 0.1, "top_p": 0.1
+ }'
+```
--- a/docs/content/docs/features/text-to-audio.md
+++ b/docs/content/docs/features/text-to-audio.md
@ -0,0 +1,158 @@
+
+++
+disableToc = false
+title = "🗣 Text to audio (TTS)"
+weight = 11
+++
+
+The `/tts` endpoint can be used to generate speech from text.
+
+## Usage
+
+Input: `input`, `model`
+
+For example, to generate an audio file, you can send a POST request to the `/tts` endpoint with the instruction as the request body:
+
+```bash
+curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
+  "input": "Hello world",
+  "model": "tts"
+}'
+```
+
+Returns an `audio/wav` file.
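+
+To call the endpoint from Python and confirm that the response really is WAV audio, a standard-library sketch with hypothetical helper names; the check only looks at the RIFF/WAVE header bytes, not at the audio itself:
+
+```python
+import json
+import urllib.request
+
+
+def tts_request(base_url, text, model, backend=None):
+    """Build a POST /tts request; backend is optional."""
+    body = {"input": text, "model": model}
+    if backend is not None:
+        body["backend"] = backend
+    return urllib.request.Request(
+        f"{base_url}/tts",
+        data=json.dumps(body).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+def looks_like_wav(data):
+    """True if data starts with a RIFF header whose form type is WAVE."""
+    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"
+
+
+# Usage with a running LocalAI:
+# with urllib.request.urlopen(tts_request("http://localhost:8080", "Hello world", "tts")) as resp:
+#     audio = resp.read()
+# if looks_like_wav(audio):
+#     with open("hello.wav", "wb") as out:
+#         out.write(audio)
+```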
+
+
+## Backends
+
+### 🐸 Coqui
+
+Required: don't use `LocalAI` images ending with the `-core` tag, because this backend needs Python dependencies.
+
+Coqui works without any configuration. To test it, run the following curl command:
+
+```bash
+curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
+    "backend": "coqui",
+    "model": "tts_models/en/ljspeech/glow-tts",
+    "input": "Hello, this is a test!"
+  }'
+```
+
+### Bark
+
+[Bark](https://github.com/suno-ai/bark) generates audio from text prompts.
+
+This is an extra backend: it is already available in the container images, so there is nothing to set up.
+
+#### Model setup
+
+There is nothing to be done for the model setup. You can already start to use bark. The models will be downloaded the first time you use the backend.
+
+#### Usage
+
+Use the `tts` endpoint by specifying the `bark` backend:
+
+```bash
+curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
+     "backend": "bark",
+     "input":"Hello!"
+   }' | aplay
+```
+
+To specify a voice from the [Bark voice presets](https://github.com/suno-ai/bark#-voice-presets) (full list: https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c), use the `model` parameter:
+
+```bash
+curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
+     "backend": "bark",
+     "input":"Hello!",
+     "model": "v2/en_speaker_4"
+   }' | aplay
+```
+
+### Piper
+
+To install the `piper` audio models manually:
+
+- Download Voices from https://github.com/rhasspy/piper/releases/tag/v0.0.2
+- Extract the `.tar.gz` files (`.onnx`, `.json`) into `models`
+- Run the following command to test that the model is working
+
+To use the tts endpoint, run the following command. You can specify a backend with the `backend` parameter. For example, to use the `piper` backend:
+```bash
+curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
+  "model":"it-riccardo_fasol-x-low.onnx",
+  "backend": "piper",
+  "input": "Ciao, sono Ettore"
+}' | aplay
+```
+
+Note:
+
+- `aplay` is a Linux command. You can use other tools to play the audio file.
+- The model name is the filename with the extension.
+- The model name is case sensitive.
+- LocalAI must be compiled with the `GO_TAGS=tts` flag.
+
+### Transformers-musicgen
+
+LocalAI also has experimental support for `transformers-musicgen` for the generation of short musical compositions. Currently, this is implemented via the same requests used for text to speech:
+
+```bash
+curl --request POST \
+  --url http://localhost:8080/tts \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "backend": "transformers-musicgen",
+    "model": "facebook/musicgen-medium",
+    "input": "Cello Rave"
+}' | aplay
+```
+
+Future versions of LocalAI will expose additional control over audio generation beyond the text prompt.
+
+### Vall-E-X
+
+[VALL-E-X](https://github.com/Plachtaa/VALL-E-X) is an open source implementation of Microsoft's VALL-E X zero-shot TTS model.
+
+#### Setup
+
+The backend will automatically download the required files in order to run the model.
+
+This is an extra backend: it is already available in the container images, so there is nothing to set up. If you are building LocalAI manually, you need to install Vall-E-X first.
+
+#### Usage
+
+Use the `tts` endpoint by specifying the `vall-e-x` backend:
+
+```bash
+curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
+     "backend": "vall-e-x",
+     "input":"Hello!"
+   }' | aplay
+```
+
+#### Voice cloning
+
+To use voice cloning, you must create a `YAML` configuration file to set up a model:
+
+```yaml
+name: cloned-voice
+backend: vall-e-x
+parameters:
+  model: "cloned-voice"
+vall-e:
+  # The path to the audio file to be cloned
+  # relative to the models directory 
+  audio_path: "path-to-wav-source.wav"
+```
+
+Then you can specify the model name in the requests:
+
+```bash
+curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
+     "backend": "vall-e-x",
+     "model": "cloned-voice",
+     "input":"Hello!"
+   }' | aplay
+```