mirror of
https://github.com/mudler/LocalAI.git
synced 2025-05-21 11:04:59 +00:00
docs/examples: enhancements (#1572)
* docs: re-order sections * fix references * Add mixtral-instruct, tinyllama-chat, dolphin-2.5-mixtral-8x7b * Fix link * Minor corrections * fix: models is a StringSlice, not a String Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * WIP: switch docs theme * content * Fix GH link * enhancements * enhancements * Fixed how to link Signed-off-by: lunamidori5 <118759930+lunamidori5@users.noreply.github.com> * fixups * logo fix * more fixups * final touches --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Signed-off-by: lunamidori5 <118759930+lunamidori5@users.noreply.github.com> Co-authored-by: lunamidori5 <118759930+lunamidori5@users.noreply.github.com>
This commit is contained in:
parent
b5c93f176a
commit
6ca4d38a01
79 changed files with 1826 additions and 3546 deletions
7
docs/content/docs/getting-started/_index.en.md
Normal file
7
docs/content/docs/getting-started/_index.en.md
Normal file
|
@ -0,0 +1,7 @@
|
|||
|
||||
+++
|
||||
disableToc = false
|
||||
title = "Getting started"
|
||||
weight = 2
|
||||
icon = "rocket_launch"
|
||||
+++
|
261
docs/content/docs/getting-started/build.md
Normal file
261
docs/content/docs/getting-started/build.md
Normal file
|
@ -0,0 +1,261 @@
|
|||
|
||||
+++
|
||||
disableToc = false
|
||||
title = "Build LocalAI from source"
|
||||
weight = 6
|
||||
url = '/basics/build/'
|
||||
ico = "rocket_launch"
|
||||
+++
|
||||
|
||||
### Build
|
||||
|
||||
LocalAI can be built as a container image or as a single, portable binary. Note that the some model architectures might require Python libraries, which are not included in the binary. The binary contains only the core backends written in Go and C++.
|
||||
|
||||
LocalAI's extensible architecture allows you to add your own backends, which can be written in any language, and as such the container images contains also the Python dependencies to run all the available backends (for example, in order to run backends like __Diffusers__ that allows to generate images and videos from text).
|
||||
|
||||
In some cases you might want to re-build LocalAI from source (for instance to leverage Apple Silicon acceleration), or to build a custom container image with your own backends. This section contains instructions on how to build LocalAI from source.
|
||||
|
||||
#### Container image
|
||||
|
||||
Requirements:
|
||||
|
||||
- Docker or podman, or a container engine
|
||||
|
||||
In order to build the `LocalAI` container image locally you can use `docker`, for example:
|
||||
|
||||
```
|
||||
# build the image
|
||||
docker build -t localai .
|
||||
docker run localai
|
||||
```
|
||||
|
||||
#### Build LocalAI locally
|
||||
|
||||
##### Requirements
|
||||
|
||||
In order to build LocalAI locally, you need the following requirements:
|
||||
|
||||
- Golang >= 1.21
|
||||
- Cmake/make
|
||||
- GCC
|
||||
- GRPC
|
||||
|
||||
To install the dependencies follow the instructions below:
|
||||
|
||||
{{< tabs tabTotal="3" >}}
|
||||
{{% tab tabName="Apple" %}}
|
||||
|
||||
```bash
|
||||
brew install abseil cmake go grpc protobuf wget
|
||||
```
|
||||
|
||||
{{% /tab %}}
|
||||
{{% tab tabName="Debian" %}}
|
||||
|
||||
```bash
|
||||
apt install golang protobuf-compiler-grpc libgrpc-dev make cmake
|
||||
```
|
||||
|
||||
{{% /tab %}}
|
||||
{{% tab tabName="From source" %}}
|
||||
|
||||
Specify `BUILD_GRPC_FOR_BACKEND_LLAMA=true` to build automatically the gRPC dependencies
|
||||
|
||||
```bash
|
||||
make ... BUILD_GRPC_FOR_BACKEND_LLAMA=true build
|
||||
```
|
||||
|
||||
{{% /tab %}}
|
||||
{{< /tabs >}}
|
||||
|
||||
##### Build
|
||||
To build LocalAI with `make`:
|
||||
|
||||
```
|
||||
git clone https://github.com/go-skynet/LocalAI
|
||||
cd LocalAI
|
||||
make build
|
||||
```
|
||||
|
||||
This should produce the binary `local-ai`
|
||||
|
||||
{{% alert note %}}
|
||||
|
||||
#### CPU flagset compatibility
|
||||
|
||||
|
||||
LocalAI uses different backends based on ggml and llama.cpp to run models. If your CPU doesn't support common instruction sets, you can disable them during build:
|
||||
|
||||
```
|
||||
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_AVX=OFF -DLLAMA_FMA=OFF" make build
|
||||
```
|
||||
|
||||
To have effect on the container image, you need to set `REBUILD=true`:
|
||||
|
||||
```
|
||||
docker run quay.io/go-skynet/localai
|
||||
docker run --rm -ti -p 8080:8080 -e DEBUG=true -e MODELS_PATH=/models -e THREADS=1 -e REBUILD=true -e CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_AVX=OFF -DLLAMA_FMA=OFF" -v $PWD/models:/models quay.io/go-skynet/local-ai:latest
|
||||
```
|
||||
|
||||
{{% /alert %}}
|
||||
|
||||
### Example: Build on mac
|
||||
|
||||
Building on Mac (M1 or M2) works, but you may need to install some prerequisites using `brew`.
|
||||
|
||||
The below has been tested by one mac user and found to work. Note that this doesn't use Docker to run the server:
|
||||
|
||||
```
|
||||
# install build dependencies
|
||||
brew install abseil cmake go grpc protobuf wget
|
||||
|
||||
# clone the repo
|
||||
git clone https://github.com/go-skynet/LocalAI.git
|
||||
|
||||
cd LocalAI
|
||||
|
||||
# build the binary
|
||||
make build
|
||||
|
||||
# Download gpt4all-j to models/
|
||||
wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j
|
||||
|
||||
# Use a template from the examples
|
||||
cp -rf prompt-templates/ggml-gpt4all-j.tmpl models/
|
||||
|
||||
# Run LocalAI
|
||||
./local-ai --models-path=./models/ --debug=true
|
||||
|
||||
# Now API is accessible at localhost:8080
|
||||
curl http://localhost:8080/v1/models
|
||||
|
||||
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
|
||||
"model": "ggml-gpt4all-j",
|
||||
"messages": [{"role": "user", "content": "How are you?"}],
|
||||
"temperature": 0.9
|
||||
}'
|
||||
```
|
||||
|
||||
### Build with Image generation support
|
||||
|
||||
|
||||
**Requirements**: OpenCV, Gomp
|
||||
|
||||
Image generation requires `GO_TAGS=stablediffusion` or `GO_TAGS=tinydream` to be set during build:
|
||||
|
||||
```
|
||||
make GO_TAGS=stablediffusion build
|
||||
```
|
||||
|
||||
### Build with Text to audio support
|
||||
|
||||
**Requirements**: piper-phonemize
|
||||
|
||||
Text to audio support is experimental and requires `GO_TAGS=tts` to be set during build:
|
||||
|
||||
```
|
||||
make GO_TAGS=tts build
|
||||
```
|
||||
|
||||
### Acceleration
|
||||
|
||||
List of the variables available to customize the build:
|
||||
|
||||
| Variable | Default | Description |
|
||||
| ---------------------| ------- | ----------- |
|
||||
| `BUILD_TYPE` | None | Build type. Available: `cublas`, `openblas`, `clblas`, `metal`,`hipblas` |
|
||||
| `GO_TAGS` | `tts stablediffusion` | Go tags. Available: `stablediffusion`, `tts`, `tinydream` |
|
||||
| `CLBLAST_DIR` | | Specify a CLBlast directory |
|
||||
| `CUDA_LIBPATH` | | Specify a CUDA library path |
|
||||
|
||||
#### OpenBLAS
|
||||
|
||||
Software acceleration.
|
||||
|
||||
Requirements: OpenBLAS
|
||||
|
||||
```
|
||||
make BUILD_TYPE=openblas build
|
||||
```
|
||||
|
||||
#### CuBLAS
|
||||
|
||||
Nvidia Acceleration.
|
||||
|
||||
Requirement: Nvidia CUDA toolkit
|
||||
|
||||
Note: CuBLAS support is experimental, and has not been tested on real HW. please report any issues you find!
|
||||
|
||||
```
|
||||
make BUILD_TYPE=cublas build
|
||||
```
|
||||
|
||||
More informations available in the upstream PR: https://github.com/ggerganov/llama.cpp/pull/1412
|
||||
|
||||
|
||||
#### Hipblas (AMD GPU with ROCm on Arch Linux)
|
||||
|
||||
Packages:
|
||||
```
|
||||
pacman -S base-devel git rocm-hip-sdk rocm-opencl-sdk opencv clblast grpc
|
||||
```
|
||||
|
||||
Library links:
|
||||
```
|
||||
export CGO_CFLAGS="-I/usr/include/opencv4"
|
||||
export CGO_CXXFLAGS="-I/usr/include/opencv4"
|
||||
export CGO_LDFLAGS="-L/opt/rocm/hip/lib -lamdhip64 -L/opt/rocm/lib -lOpenCL -L/usr/lib -lclblast -lrocblas -lhipblas -lrocrand -lomp -O3 --rtlib=compiler-rt -unwindlib=libgcc -lhipblas -lrocblas --hip-link"
|
||||
```
|
||||
|
||||
Build:
|
||||
```
|
||||
make BUILD_TYPE=hipblas GPU_TARGETS=gfx1030
|
||||
```
|
||||
|
||||
#### ClBLAS
|
||||
|
||||
AMD/Intel GPU acceleration.
|
||||
|
||||
Requirement: OpenCL, CLBlast
|
||||
|
||||
```
|
||||
make BUILD_TYPE=clblas build
|
||||
```
|
||||
|
||||
To specify a clblast dir set: `CLBLAST_DIR`
|
||||
|
||||
#### Metal (Apple Silicon)
|
||||
|
||||
```
|
||||
make BUILD_TYPE=metal build
|
||||
|
||||
# Set `gpu_layers: 1` to your YAML model config file and `f16: true`
|
||||
# Note: only models quantized with q4_0 are supported!
|
||||
```
|
||||
|
||||
|
||||
### Windows compatibility
|
||||
|
||||
Make sure to give enough resources to the running container. See https://github.com/go-skynet/LocalAI/issues/2
|
||||
|
||||
### Examples
|
||||
|
||||
More advanced build options are available, for instance to build only a single backend.
|
||||
|
||||
#### Build only a single backend
|
||||
|
||||
You can control the backends that are built by setting the `GRPC_BACKENDS` environment variable. For instance, to build only the `llama-cpp` backend only:
|
||||
|
||||
```bash
|
||||
make GRPC_BACKENDS=backend-assets/grpc/llama-cpp build
|
||||
```
|
||||
|
||||
By default, all the backends are built.
|
||||
|
||||
#### Specific llama.cpp version
|
||||
|
||||
To build with a specific version of llama.cpp, set `CPPLLAMA_VERSION` to the tag or wanted sha:
|
||||
|
||||
```
|
||||
CPPLLAMA_VERSION=<sha> make build
|
||||
```
|
71
docs/content/docs/getting-started/customize-model.md
Normal file
71
docs/content/docs/getting-started/customize-model.md
Normal file
|
@ -0,0 +1,71 @@
|
|||
+++
|
||||
disableToc = false
|
||||
title = "Customizing the Model"
|
||||
weight = 4
|
||||
icon = "rocket_launch"
|
||||
|
||||
+++
|
||||
|
||||
To customize the prompt template or the default settings of the model, a configuration file is utilized. This file must adhere to the LocalAI YAML configuration standards. For comprehensive syntax details, refer to the [advanced documentation]({{%relref "docs/advanced" %}}). The configuration file can be located either remotely (such as in a Github Gist) or within the local filesystem or a remote URL.
|
||||
|
||||
LocalAI can be initiated using either its container image or binary, with a command that includes URLs of model config files or utilizes a shorthand format (like `huggingface://` or `github://`), which is then expanded into complete URLs.
|
||||
|
||||
The configuration can also be set via an environment variable. For instance:
|
||||
|
||||
```
|
||||
# Command-Line Arguments
|
||||
local-ai github://owner/repo/file.yaml@branch
|
||||
|
||||
# Environment Variable
|
||||
MODELS="github://owner/repo/file.yaml@branch,github://owner/repo/file.yaml@branch" local-ai
|
||||
```
|
||||
|
||||
Here's an example to initiate the **phi-2** model:
|
||||
|
||||
```bash
|
||||
docker run -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core https://gist.githubusercontent.com/mudler/ad601a0488b497b69ec549150d9edd18/raw/a8a8869ef1bb7e3830bf5c0bae29a0cce991ff8d/phi-2.yaml
|
||||
```
|
||||
|
||||
{{% alert icon="" %}}
|
||||
The model configurations used in the quickstart are accessible here: [https://github.com/mudler/LocalAI/tree/master/embedded/models](https://github.com/mudler/LocalAI/tree/master/embedded/models). Contributions are welcome; please feel free to submit a Pull Request.
|
||||
|
||||
The `phi-2` model configuration from the quickstart is expanded from [https://github.com/mudler/LocalAI/blob/master/examples/configurations/phi-2.yaml](https://github.com/mudler/LocalAI/blob/master/examples/configurations/phi-2.yaml).
|
||||
{{% /alert %}}
|
||||
|
||||
## Example: Customizing the Prompt Template
|
||||
|
||||
To modify the prompt template, create a Github gist or a Pastebin file, and copy the content from [https://github.com/mudler/LocalAI/blob/master/examples/configurations/phi-2.yaml](https://github.com/mudler/LocalAI/blob/master/examples/configurations/phi-2.yaml). Alter the fields as needed:
|
||||
|
||||
```yaml
|
||||
name: phi-2
|
||||
context_size: 2048
|
||||
f16: true
|
||||
threads: 11
|
||||
gpu_layers: 90
|
||||
mmap: true
|
||||
parameters:
|
||||
# Reference any HF model or a local file here
|
||||
model: huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
|
||||
temperature: 0.2
|
||||
top_k: 40
|
||||
top_p: 0.95
|
||||
template:
|
||||
|
||||
chat: &template |
|
||||
Instruct: {{.Input}}
|
||||
Output:
|
||||
# Modify the prompt template here ^^^ as per your requirements
|
||||
completion: *template
|
||||
```
|
||||
|
||||
Then, launch LocalAI using your gist's URL:
|
||||
|
||||
```bash
|
||||
## Important! Substitute with your gist's URL!
|
||||
docker run -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core https://gist.githubusercontent.com/xxxx/phi-2.yaml
|
||||
```
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Visit the [advanced section]({{%relref "docs/advanced" %}}) for more insights on prompt templates and configuration files.
|
||||
- To learn about fine-tuning an LLM model, check out the [fine-tuning section]({{%relref "docs/advanced/fine-tuning" %}}).
|
150
docs/content/docs/getting-started/manual.md
Normal file
150
docs/content/docs/getting-started/manual.md
Normal file
|
@ -0,0 +1,150 @@
|
|||
|
||||
+++
|
||||
disableToc = false
|
||||
title = "Run models manually"
|
||||
weight = 5
|
||||
icon = "rocket_launch"
|
||||
|
||||
+++
|
||||
|
||||
|
||||
1. Ensure you have a model file, a configuration YAML file, or both. Customize model defaults and specific settings with a configuration file. For advanced configurations, refer to the [Advanced Documentation](docs/advanced).
|
||||
|
||||
2. For GPU Acceleration instructions, visit [GPU acceleration](docs/features/gpu-acceleration).
|
||||
|
||||
{{< tabs tabTotal="5" >}}
|
||||
{{% tab tabName="Docker" %}}
|
||||
|
||||
```bash
|
||||
# Prepare the models into the `model` directory
|
||||
mkdir models
|
||||
|
||||
# copy your models to it
|
||||
cp your-model.gguf models/
|
||||
|
||||
# run the LocalAI container
|
||||
docker run -p 8080:8080 -v $PWD/models:/models -ti --rm quay.io/go-skynet/local-ai:latest --models-path /models --context-size 700 --threads 4
|
||||
# You should see:
|
||||
#
|
||||
# ┌───────────────────────────────────────────────────┐
|
||||
# │ Fiber v2.42.0 │
|
||||
# │ http://127.0.0.1:8080 │
|
||||
# │ (bound on host 0.0.0.0 and port 8080) │
|
||||
# │ │
|
||||
# │ Handlers ............. 1 Processes ........... 1 │
|
||||
# │ Prefork ....... Disabled PID ................. 1 │
|
||||
# └───────────────────────────────────────────────────┘
|
||||
|
||||
# Try the endpoint with curl
|
||||
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
|
||||
"model": "your-model.gguf",
|
||||
"prompt": "A long time ago in a galaxy far, far away",
|
||||
"temperature": 0.7
|
||||
}'
|
||||
```
|
||||
|
||||
{{% alert note %}}
|
||||
- If running on Apple Silicon (ARM) it is **not** suggested to run on Docker due to emulation. Follow the [build instructions]({{%relref "docs/getting-started/build" %}}) to use Metal acceleration for full GPU support.
|
||||
- If you are running Apple x86_64 you can use `docker`, there is no additional gain into building it from source.
|
||||
{{% /alert %}}
|
||||
|
||||
{{% /tab %}}
|
||||
{{% tab tabName="Docker compose" %}}
|
||||
|
||||
```bash
|
||||
# Clone LocalAI
|
||||
git clone https://github.com/go-skynet/LocalAI
|
||||
|
||||
cd LocalAI
|
||||
|
||||
# (optional) Checkout a specific LocalAI tag
|
||||
# git checkout -b build <TAG>
|
||||
|
||||
# copy your models to models/
|
||||
cp your-model.gguf models/
|
||||
|
||||
# (optional) Edit the .env file to set things like context size and threads
|
||||
# vim .env
|
||||
|
||||
# start with docker compose
|
||||
docker compose up -d --pull always
|
||||
# or you can build the images with:
|
||||
# docker compose up -d --build
|
||||
|
||||
# Now API is accessible at localhost:8080
|
||||
curl http://localhost:8080/v1/models
|
||||
# {"object":"list","data":[{"id":"your-model.gguf","object":"model"}]}
|
||||
|
||||
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
|
||||
"model": "your-model.gguf",
|
||||
"prompt": "A long time ago in a galaxy far, far away",
|
||||
"temperature": 0.7
|
||||
}'
|
||||
```
|
||||
|
||||
Note: If you are on Windows, please make sure the project is on the Linux Filesystem, otherwise loading models might be slow. For more Info: [Microsoft Docs](https://learn.microsoft.com/en-us/windows/wsl/filesystems)
|
||||
|
||||
{{% /tab %}}
|
||||
|
||||
{{% tab tabName="Kubernetes" %}}
|
||||
|
||||
For installing LocalAI in Kubernetes, you can use the following helm chart:
|
||||
|
||||
```bash
|
||||
# Install the helm repository
|
||||
helm repo add go-skynet https://go-skynet.github.io/helm-charts/
|
||||
# Update the repositories
|
||||
helm repo update
|
||||
# Get the values
|
||||
helm show values go-skynet/local-ai > values.yaml
|
||||
|
||||
# Edit the values value if needed
|
||||
# vim values.yaml ...
|
||||
|
||||
# Install the helm chart
|
||||
helm install local-ai go-skynet/local-ai -f values.yaml
|
||||
```
|
||||
|
||||
{{% /tab %}}
|
||||
{{% tab tabName="From binary" %}}
|
||||
|
||||
LocalAI binary releases are available in [Github](https://github.com/go-skynet/LocalAI/releases).
|
||||
|
||||
{{% /tab %}}
|
||||
|
||||
{{% tab tabName="From source" %}}
|
||||
|
||||
See the [build section]({{%relref "docs/getting-started/build" %}}).
|
||||
|
||||
{{% /tab %}}
|
||||
|
||||
{{< /tabs >}}
|
||||
|
||||
|
||||
### Example (Docker)
|
||||
|
||||
```bash
|
||||
mkdir models
|
||||
|
||||
# Download luna-ai-llama2 to models/
|
||||
wget https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGUF/resolve/main/luna-ai-llama2-uncensored.Q4_0.gguf -O models/luna-ai-llama2
|
||||
|
||||
# Use a template from the examples
|
||||
cp -rf prompt-templates/getting_started.tmpl models/luna-ai-llama2.tmpl
|
||||
|
||||
docker run -p 8080:8080 -v $PWD/models:/models -ti --rm quay.io/go-skynet/local-ai:latest --models-path /models --context-size 700 --threads 4
|
||||
|
||||
# Now API is accessible at localhost:8080
|
||||
curl http://localhost:8080/v1/models
|
||||
# {"object":"list","data":[{"id":"luna-ai-llama2","object":"model"}]}
|
||||
|
||||
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
|
||||
"model": "luna-ai-llama2",
|
||||
"messages": [{"role": "user", "content": "How are you?"}],
|
||||
"temperature": 0.9
|
||||
}'
|
||||
|
||||
# {"model":"luna-ai-llama2","choices":[{"message":{"role":"assistant","content":"I'm doing well, thanks. How about you?"}}]}
|
||||
```
|
||||
|
||||
For more model configurations, visit the [Examples Section](https://github.com/mudler/LocalAI/tree/master/examples/configurations).
|
187
docs/content/docs/getting-started/quickstart.md
Normal file
187
docs/content/docs/getting-started/quickstart.md
Normal file
|
@ -0,0 +1,187 @@
|
|||
|
||||
+++
|
||||
disableToc = false
|
||||
title = "Quickstart"
|
||||
weight = 3
|
||||
url = '/basics/getting_started/'
|
||||
icon = "rocket_launch"
|
||||
|
||||
+++
|
||||
|
||||
**LocalAI** is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API that's compatible with OpenAI API specifications for local inferencing. It allows you to run [LLMs]({{%relref "docs/features/text-generation" %}}), generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families and architectures.
|
||||
|
||||
## Installation Methods
|
||||
|
||||
LocalAI is available as a container image and binary, compatible with various container engines like Docker, Podman, and Kubernetes. Container images are published on [quay.io](https://quay.io/repository/go-skynet/local-ai?tab=tags&tag=latest) and [Dockerhub](https://hub.docker.com/r/localai/localai). Binaries can be downloaded from [GitHub](https://github.com/mudler/LocalAI/releases).
|
||||
|
||||
|
||||
{{% alert icon="💡" %}}
|
||||
|
||||
**Hardware Requirements:** The hardware requirements for LocalAI vary based on the [model size] and [quantization] method used. For performance benchmarks with different backends, such as `llama.cpp`, visit [this link](https://github.com/ggerganov/llama.cpp#memorydisk-requirements). The `rwkv` backend is noted for its lower resource consumption.
|
||||
|
||||
{{% /alert %}}
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before you begin, ensure you have a container engine installed if you are not using the binaries. Suitable options include Docker or Podman. For installation instructions, refer to the following guides:
|
||||
|
||||
- [Install Docker Desktop (Mac, Windows, Linux)](https://docs.docker.com/get-docker/)
|
||||
- [Install Podman (Linux)](https://podman.io/getting-started/installation)
|
||||
- [Install Docker engine (Servers)](https://docs.docker.com/engine/install/#get-started)
|
||||
|
||||
|
||||
## Running Models
|
||||
|
||||
> _Do you have already a model file? Skip to [Run models manually]({{%relref "docs/getting-started/manual" %}})_.
|
||||
|
||||
LocalAI allows one-click runs with popular models. It downloads the model and starts the API with the model loaded.
|
||||
|
||||
There are different categories of models: [LLMs]({{%relref "docs/features/text-generation" %}}), [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) , [Embeddings]({{%relref "docs/features/embeddings" %}}), [Audio to Text]({{%relref "docs/features/audio-to-text" %}}), and [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) depending on the backend being used and the model architecture.
|
||||
|
||||
{{% alert icon="💡" %}}
|
||||
|
||||
To customize the models, see [Model customization]({{%relref "docs/getting-started/customize-model" %}}). For more model configurations, visit the [Examples Section](https://github.com/mudler/LocalAI/tree/master/examples/configurations).
|
||||
{{% /alert %}}
|
||||
|
||||
{{< tabs tabTotal="3" >}}
|
||||
{{% tab tabName="CPU-only" %}}
|
||||
|
||||
> 💡Don't need GPU acceleration? use the CPU images which are lighter and do not have Nvidia dependencies
|
||||
|
||||
| Model | Category | Docker command |
|
||||
| --- | --- | --- |
|
||||
| [phi-2](https://huggingface.co/microsoft/phi-2) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core phi-2``` |
|
||||
| [llava](https://github.com/SkunkworksAI/BakLLaVA) | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core llava``` |
|
||||
| [mistral-openorca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core mistral-openorca``` |
|
||||
| [bert-cpp](https://github.com/skeskinen/bert.cpp) | [Embeddings]({{%relref "docs/features/embeddings" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core bert-cpp``` |
|
||||
| [all-minilm-l6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | [Embeddings]({{%relref "docs/features/embeddings" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg all-minilm-l6-v2``` |
|
||||
| whisper-base | [Audio to Text]({{%relref "docs/features/audio-to-text" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core whisper-base``` |
|
||||
| rhasspy-voice-en-us-amy | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core rhasspy-voice-en-us-amy``` |
|
||||
| coqui | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg coqui``` |
|
||||
| bark | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg bark``` |
|
||||
| vall-e-x | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg vall-e-x``` |
|
||||
| mixtral-instruct Mixtral-8x7B-Instruct-v0.1 | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core mixtral-instruct``` |
|
||||
| [tinyllama-chat](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF) [original model](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.3) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core tinyllama-chat``` |
|
||||
| [dolphin-2.5-mixtral-8x7b](https://huggingface.co/TheBloke/dolphin-2.5-mixtral-8x7b-GGUF) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core dolphin-2.5-mixtral-8x7b``` |
|
||||
{{% /tab %}}
|
||||
{{% tab tabName="GPU (CUDA 11)" %}}
|
||||
|
||||
|
||||
> To know which version of CUDA do you have available, you can check with `nvidia-smi` or `nvcc --version` see also [GPU acceleration]({{%relref "docs/features/gpu-acceleration" %}}).
|
||||
|
||||
| Model | Category | Docker command |
|
||||
| --- | --- | --- |
|
||||
| [phi-2](https://huggingface.co/microsoft/phi-2) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core phi-2``` |
|
||||
| [llava](https://github.com/SkunkworksAI/BakLLaVA) | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core llava``` |
|
||||
| [mistral-openorca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core mistral-openorca``` |
|
||||
| [bert-cpp](https://github.com/skeskinen/bert.cpp) | [Embeddings]({{%relref "docs/features/embeddings" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core bert-cpp``` |
|
||||
| [all-minilm-l6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | [Embeddings]({{%relref "docs/features/embeddings" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 all-minilm-l6-v2``` |
|
||||
| whisper-base | [Audio to Text]({{%relref "docs/features/audio-to-text" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core whisper-base``` |
|
||||
| rhasspy-voice-en-us-amy | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core rhasspy-voice-en-us-amy``` |
|
||||
| coqui | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 coqui``` |
|
||||
| bark | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 bark``` |
|
||||
| vall-e-x | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 vall-e-x``` |
|
||||
| mixtral-instruct Mixtral-8x7B-Instruct-v0.1 | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core mixtral-instruct``` |
|
||||
| [tinyllama-chat](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF) [original model](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.3) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core tinyllama-chat``` |
|
||||
| [dolphin-2.5-mixtral-8x7b](https://huggingface.co/TheBloke/dolphin-2.5-mixtral-8x7b-GGUF) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core dolphin-2.5-mixtral-8x7b``` |
|
||||
{{% /tab %}}
|
||||
|
||||
|
||||
{{% tab tabName="GPU (CUDA 12)" %}}
|
||||
|
||||
> To know which version of CUDA do you have available, you can check with `nvidia-smi` or `nvcc --version` see also [GPU acceleration]({{%relref "docs/features/gpu-acceleration" %}}).
|
||||
|
||||
| Model | Category | Docker command |
|
||||
| --- | --- | --- |
|
||||
| [phi-2](https://huggingface.co/microsoft/phi-2) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core phi-2``` |
|
||||
| [llava](https://github.com/SkunkworksAI/BakLLaVA) | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core llava``` |
|
||||
| [mistral-openorca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core mistral-openorca``` |
|
||||
| [bert-cpp](https://github.com/skeskinen/bert.cpp) | [Embeddings]({{%relref "docs/features/embeddings" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core bert-cpp``` |
|
||||
| [all-minilm-l6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | [Embeddings]({{%relref "docs/features/embeddings" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 all-minilm-l6-v2``` |
|
||||
| whisper-base | [Audio to Text]({{%relref "docs/features/audio-to-text" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core whisper-base``` |
|
||||
| rhasspy-voice-en-us-amy | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core rhasspy-voice-en-us-amy``` |
|
||||
| coqui | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 coqui``` |
|
||||
| bark | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 bark``` |
|
||||
| vall-e-x | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 vall-e-x``` |
|
||||
| mixtral-instruct Mixtral-8x7B-Instruct-v0.1 | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core mixtral-instruct``` |
|
||||
| [tinyllama-chat](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF) [original model](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.3) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core tinyllama-chat``` |
|
||||
| [dolphin-2.5-mixtral-8x7b](https://huggingface.co/TheBloke/dolphin-2.5-mixtral-8x7b-GGUF) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core dolphin-2.5-mixtral-8x7b``` |
|
||||
{{% /tab %}}
|
||||
|
||||
{{< /tabs >}}
|
||||
|
||||
{{% alert icon="💡" %}}
|
||||
**Tip** You can actually specify multiple models to start an instance with the models loaded, for example to have both llava and phi-2 configured:
|
||||
|
||||
```bash
|
||||
docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core llava phi-2
|
||||
```
|
||||
|
||||
{{% /alert %}}
|
||||
|
||||
## Container images
|
||||
|
||||
LocalAI provides a variety of images to support different environments. These images are available on [quay.io](https://quay.io/repository/go-skynet/local-ai?tab=tags) and [Dockerhub](https://hub.docker.com/r/localai/localai).
|
||||
|
||||
For GPU Acceleration support for Nvidia video graphic cards, use the Nvidia/CUDA images, if you don't have a GPU, use the CPU images. If you have AMD or Mac Silicon, see the [build section]({{%relref "docs/getting-started/build" %}}).
|
||||
|
||||
{{% alert icon="💡" %}}
|
||||
|
||||
**Available Images Types**:
|
||||
|
||||
- Images ending with `-core` are smaller images without predownload python dependencies. Use these images if you plan to use `llama.cpp`, `stablediffusion-ncn`, `tinydream` or `rwkv` backends - if you are not sure which one to use, do **not** use these images.
|
||||
- FFMpeg is **not** included in the default images due to [its licensing](https://www.ffmpeg.org/legal.html). If you need FFMpeg, use the images ending with `-ffmpeg`. Note that `ffmpeg` is needed in case of using `audio-to-text` LocalAI's features.
|
||||
- If using old and outdated CPUs and no GPUs you might need to set `REBUILD` to `true` as environment variable along with options to disable the flags which your CPU does not support, however note that inference will perform poorly and slow. See also [flagset compatibility]({{%relref "docs/getting-started/build#cpu-flagset-compatibility" %}}).
|
||||
|
||||
{{% /alert %}}
|
||||
|
||||
{{< tabs tabTotal="3" >}}
|
||||
{{% tab tabName="Vanilla / CPU Images" %}}
|
||||
|
||||
| Description | Quay | Dockerhub |
|
||||
| --- | --- | --- |
|
||||
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master` | `localai/localai:master` |
|
||||
| Latest tag | `quay.io/go-skynet/local-ai:latest` | `localai/localai:latest` |
|
||||
| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}` | `localai/localai:{{< version >}}` |
|
||||
| Versioned image including FFMpeg| `quay.io/go-skynet/local-ai:{{< version >}}-ffmpeg` | `localai/localai:{{< version >}}-ffmpeg` |
|
||||
| Versioned image including FFMpeg, no python | `quay.io/go-skynet/local-ai:{{< version >}}-ffmpeg-core` | `localai/localai:{{< version >}}-ffmpeg-core` |
|
||||
|
||||
{{% /tab %}}
|
||||
|
||||
{{% tab tabName="GPU Images CUDA 11" %}}
|
||||
|
||||
|
||||
| Description | Quay | Dockerhub |
|
||||
| --- | --- | --- |
|
||||
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-cublas-cuda11` | `localai/localai:master-cublas-cuda11` |
|
||||
| Latest tag | `quay.io/go-skynet/local-ai:latest-cublas-cuda11` | `localai/localai:latest-cublas-cuda11` |
|
||||
| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda11` | `localai/localai:{{< version >}}-cublas-cuda11` |
|
||||
| Versioned image including FFMpeg| `quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda11-ffmpeg` | `localai/localai:{{< version >}}-cublas-cuda11-ffmpeg` |
|
||||
| Versioned image including FFMpeg, no python | `quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda11-ffmpeg-core` | `localai/localai:{{< version >}}-cublas-cuda11-ffmpeg-core` |
|
||||
|
||||
{{% /tab %}}
|
||||
|
||||
{{% tab tabName="GPU Images CUDA 12" %}}
|
||||
|
||||
|
||||
| Description | Quay | Dockerhub |
|
||||
| --- | --- | --- |
|
||||
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-cublas-cuda12` | `localai/localai:master-cublas-cuda12` |
|
||||
| Latest tag | `quay.io/go-skynet/local-ai:latest-cublas-cuda12` | `localai/localai:latest-cublas-cuda12` |
|
||||
| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda12` | `localai/localai:{{< version >}}-cublas-cuda12` |
|
||||
| Versioned image including FFMpeg| `quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda12-ffmpeg` | `localai/localai:{{< version >}}-cublas-cuda12-ffmpeg` |
|
||||
| Versioned image including FFMpeg, no python | `quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda12-ffmpeg-core` | `localai/localai:{{< version >}}-cublas-cuda12-ffmpeg-core` |
|
||||
|
||||
|
||||
{{% /tab %}}
|
||||
|
||||
{{< /tabs >}}
|
||||
|
||||
## What's next?
|
||||
|
||||
Explore further resources and community contributions:
|
||||
|
||||
- [Community How to's](https://io.midori-ai.xyz/howtos/)
|
||||
- [Examples](https://github.com/mudler/LocalAI/tree/master/examples#examples)
|
||||
|
||||
[](https://github.com/mudler/LocalAI/tree/master/examples#examples)
|
Loading…
Add table
Add a link
Reference in a new issue