feat: embedded model configurations, add popular model examples, refactoring (#1532)

* move downloader out

* separate startup functions for preloading configuration files

* docs: add popular model examples

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* shorteners

* Add llava

* Add mistral-openorca

* Better link to build section

* docs: update

* fixup

* Drop code dups

* Minor fixups

* Apply suggestions from code review

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* ci: try to cache gRPC build during tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: do not build all images for tests, just necessary

* ci: cache gRPC also in release pipeline

* fixes

* Update model_preload_test.go

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Author: Ettore Di Giacinto
Date: 2024-01-05 17:16:33 -05:00 (committed by GitHub)
Commit: 09e5d9007b (parent: db926896bd)
26 changed files with 586 additions and 150 deletions


@@ -9,7 +9,7 @@ weight = 6
In order to define default prompts, model parameters (such as custom default `top_p` or `top_k`), LocalAI can be configured to serve user-defined models with a set of default parameters and templates.
You can create multiple `yaml` files in the models path or specify a single YAML configuration file.
In order to configure a model, you can create multiple `yaml` files in the models path or specify a single YAML configuration file.
Consider the following `models` folder in `example/chatbot-ui`:
```
@@ -96,6 +96,12 @@ Specifying a `config-file` via CLI allows to declare models in a single file as
See also [chatbot-ui](https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui) as an example of how to use config files.
It is possible to specify a full URL or a short-hand URL to a YAML model configuration file and use it at startup with local-ai, for example to use phi-2:
```
local-ai github://mudler/LocalAI/examples/configurations/phi-2.yaml@master
```
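Once started this way, you can sanity-check the loaded model against the OpenAI-compatible API. The snippet below is a minimal sketch, assuming the default port `8080` and that the configuration registers the model under the name `phi-2`:
```bash
# Ask the freshly loaded phi-2 model for a chat completion
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "phi-2",
  "messages": [{"role": "user", "content": "How are you doing?"}],
  "temperature": 0.1
}'
```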
### Full config model file reference
```yaml


@@ -235,6 +235,14 @@ make GRPC_BACKENDS=backend-assets/grpc/llama-cpp build
By default, all the backends are built.
### Specific llama.cpp version
To build with a specific version of llama.cpp, set `CPPLLAMA_VERSION` to the desired tag or sha:
```
CPPLLAMA_VERSION=<sha> make build
```
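If you only need the llama.cpp backend, the version pin can be combined with the backend selection shown above. This is a sketch, assuming the two variables compose as regular make overrides:
```bash
# Pin llama.cpp to a specific sha/tag and build only the llama.cpp gRPC backend
CPPLLAMA_VERSION=<sha> make GRPC_BACKENDS=backend-assets/grpc/llama-cpp build
```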
### Windows compatibility
Make sure to give enough resources to the running container. See https://github.com/go-skynet/LocalAI/issues/2


@@ -15,11 +15,19 @@ This section contains instructions on how to use LocalAI with GPU acceleration.
For acceleration on AMD or Metal hardware there are no specific container images; see the [build]({{%relref "build/#acceleration" %}}) section.
{{% /notice %}}
### CUDA
### CUDA (NVIDIA) acceleration
Requirement: nvidia-container-toolkit (installation instructions [1](https://www.server-world.info/en/note?os=Ubuntu_22.04&p=nvidia&f=2) [2](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html))
To use CUDA, use the images with the `cublas` tag.
To check which CUDA version you need, you can run either `nvidia-smi` or `nvcc --version`.
Alternatively, you can run `nvidia-smi` through Docker:
```
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
```
To use CUDA, use the images with the `cublas` tag (see the example below).
The image list is on [quay](https://quay.io/repository/go-skynet/local-ai?tab=tags):
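For instance, a run sketch with one of the CUDA 12 core images (flags mirror the one-click examples in the getting started section; the trailing `phi-2` argument preloads that model configuration):
```bash
# Expose the API on port 8080, pass the GPUs to the container,
# and let the image preload the phi-2 configuration
docker run -p 8080:8080 --gpus all -ti --rm \
    quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda12-core phi-2
```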


@ -14,6 +14,8 @@ See also our [How to]({{%relref "howtos" %}}) section for end-to-end guided exam
The easiest way to run LocalAI is by using [`docker compose`](https://docs.docker.com/compose/install/) or with [Docker](https://docs.docker.com/engine/install/) (to build locally, see the [build section]({{%relref "build" %}})).
LocalAI needs at least a model file or a configuration YAML file to work (or both). You can further customize model defaults and specific settings with a configuration file (see [advanced]({{%relref "advanced" %}})).
{{% notice note %}}
To run with GPU acceleration, see [GPU acceleration]({{%relref "features/gpu-acceleration" %}}).
{{% /notice %}}
@@ -113,8 +115,79 @@ helm install local-ai go-skynet/local-ai -f values.yaml
{{% /tab %}}
{{% tab name="From source" %}}
See the [build section]({{%relref "build" %}}).
{{% /tab %}}
{{< /tabs >}}
### Running popular models (one-click!)
{{% notice note %}}
Note: this feature is currently available only on master builds.
{{% /notice %}}
You can run `local-ai` directly with a model name, and it will download the model and start the API with the model loaded.
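For example, a minimal sketch with the binary (assuming a master build, per the note above, and that `phi-2` is one of the embedded configurations):
```bash
# Downloads the phi-2 configuration and model if missing, then serves the API on :8080
local-ai phi-2
```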
#### CPU-only
> You can use these images, which are lighter and do not have Nvidia dependencies.
| Model | Docker command |
| --- | --- |
| phi-2 | ```docker run -p 8080:8080 -ti --rm quay.io/go-skynet/local-ai:{{< version >}}-ffmpeg-core phi-2``` |
| llava | ```docker run -p 8080:8080 -ti --rm quay.io/go-skynet/local-ai:{{< version >}}-ffmpeg-core llava``` |
| mistral-openorca | ```docker run -p 8080:8080 -ti --rm quay.io/go-skynet/local-ai:{{< version >}}-ffmpeg-core mistral-openorca``` |
#### GPU (CUDA 11)
For accelerated images with Nvidia and CUDA 11, use the following images.
> If you do not know which version of CUDA you have available, you can check with `nvidia-smi` or `nvcc --version`.
| Model | Docker command |
| --- | --- |
| phi-2 | ```docker run -p 8080:8080 --gpus all -ti --rm quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda11-core phi-2``` |
| llava | ```docker run -p 8080:8080 --gpus all -ti --rm quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda11-core llava``` |
| mistral-openorca | ```docker run -p 8080:8080 --gpus all -ti --rm quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda11-core mistral-openorca``` |
#### GPU (CUDA 12)
> If you do not know which version of CUDA you have available, you can check with `nvidia-smi` or `nvcc --version`.
| Model | Docker command |
| --- | --- |
| phi-2 | ```docker run -p 8080:8080 -ti --gpus all --rm quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda12-core phi-2``` |
| llava | ```docker run -p 8080:8080 -ti --gpus all --rm quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda12-core llava``` |
| mistral-openorca | ```docker run -p 8080:8080 --gpus all -ti --rm quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda12-core mistral-openorca``` |
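Whichever image you pick, you can verify that the model configuration was preloaded by listing the models; a quick sketch, assuming the default port:
```bash
curl http://localhost:8080/v1/models
```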
{{% notice note %}}
LocalAI (either the container image or the binary) can be started with a list of model config file URLs, or with our short-hand format (e.g. `huggingface://`, `github://`). It works by passing the URLs as arguments or as an environment variable, for example:
```
local-ai github://owner/repo/file.yaml@branch
# Env
MODELS="github://owner/repo/file.yaml@branch,github://owner/repo/file.yaml@branch" local-ai
# Args
local-ai --models github://owner/repo/file.yaml@branch --models github://owner/repo/file.yaml@branch
```
For example, to start LocalAI with phi-2, you can also use a full config file from a gist:
```bash
./local-ai https://gist.githubusercontent.com/mudler/ad601a0488b497b69ec549150d9edd18/raw/a8a8869ef1bb7e3830bf5c0bae29a0cce991ff8d/phi-2.yaml
```
The file should be a valid YAML configuration file; for the full syntax, see [advanced]({{%relref "advanced" %}}).
{{% /notice %}}
### Container images
LocalAI has a set of images to support CUDA, ffmpeg and 'vanilla' (CPU-only). The image list is on [quay](https://quay.io/repository/go-skynet/local-ai?tab=tags):
@@ -131,6 +204,11 @@ Core Images - Smaller images without predownload python dependencies
{{% /tab %}}
{{% tab name="GPU Images CUDA 11" %}}
Images with Nvidia acceleration support
> If you do not know which version of CUDA you have available, you can check with `nvidia-smi` or `nvcc --version`.
- `master-cublas-cuda11`
- `master-cublas-cuda11-core`
- `{{< version >}}-cublas-cuda11`
@@ -142,6 +220,11 @@ Core Images - Smaller images without predownload python dependencies
{{% /tab %}}
{{% tab name="GPU Images CUDA 12" %}}
Images with Nvidia acceleration support
> If you do not know which version of CUDA you have available, you can check with `nvidia-smi` or `nvcc --version`.
- `master-cublas-cuda12`
- `master-cublas-cuda12-core`
- `{{< version >}}-cublas-cuda12`
@@ -357,10 +440,6 @@ affinity: {}
</details>
### Build from source
See the [build section]({{%relref "build" %}}).
### Other examples
![Screenshot from 2023-04-26 23-59-55](https://user-images.githubusercontent.com/2420543/234715439-98d12e03-d3ce-4f94-ab54-2b256808e05e.png)


@@ -167,11 +167,6 @@ curl -H "Content-Type: application/json" -d @- http://localhost:8080/v1/images/
## img2vid
{{% notice note %}}
Experimental and available only on master builds. See: https://github.com/mudler/LocalAI/pull/1442
{{% /notice %}}
```yaml
name: img2vid
@@ -193,12 +188,6 @@ curl -H "Content-Type: application/json" -X POST -d @- http://localhost:8080/v1/
## txt2vid
{{% notice note %}}
Experimental and available only on master builds. See: https://github.com/mudler/LocalAI/pull/1442
{{% /notice %}}
```yaml
name: txt2vid
parameters: