LocalAI/docs/content/docs/getting-started/quickstart.md at 74492a81c70603547a717e45d00d532cd2017244

mirror of https://github.com/mudler/LocalAI.git synced 2025-05-21 19:15:00 +00:00

Ettore Di Giacinto 74492a81c7

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

2024-04-07 11:06:35 +02:00

10 KiB

Raw Blame History

+++ disableToc = false title = "Quickstart" weight = 3 url = '/basics/getting_started/' icon = "rocket_launch"

+++

LocalAI is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API that's compatible with OpenAI API specifications for local inferencing. It allows you to run [LLMs]({{%relref "docs/features/text-generation" %}}), generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families and architectures.

LocalAI is available as a container image and binary, compatible with various container engines like Docker, Podman, and Kubernetes. Container images are published on quay.io and Docker Hub. Binaries can be downloaded from GitHub.

Prerequisites

Before you begin, ensure you have a container engine installed if you are not using the binaries. Suitable options include Docker or Podman. For installation instructions, refer to the following guides:

Hardware Requirements: The hardware requirements for LocalAI vary based on the model size and quantization method used. For performance benchmarks with different backends, such as llama.cpp, visit this link. The rwkv backend is noted for its lower resource consumption.

Running LocalAI with All-in-One (AIO) Images

Do you have already a model file? Skip to [Run models manually]({{%relref "docs/getting-started/manual" %}}) or [Run other models]({{%relref "docs/getting-started/run-other-models" %}}) to use an already-configured model.

LocalAI's All-in-One (AIO) images are pre-configured with a set of models and backends to fully leverage almost all the LocalAI featureset.

These images are available for both CPU and GPU environments. The AIO images are designed to be easy to use and requires no configuration.

It suggested to use the AIO images if you don't want to configure the models to run on LocalAI. If you want to run specific models, you can use the [manual method]({{%relref "docs/getting-started/manual" %}}).

The AIO Images comes pre-configured with the following features:

Text to Speech (TTS)
Speech to Text
Function calling
Large Language Models (LLM) for text generation
Image generation
Embedding server

Start the image with Docker:

docker run -p 8080:8080 --name local-ai -ti localai/localai:latest-aio-cpu
# For Nvidia GPUs:
# docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:latest-aio-gpu-nvidia-cuda-11
# docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:latest-aio-gpu-nvidia-cuda-12

Or with a docker-compose file:

version: "3.9"
services:
  api:
    image: localai/localai:latest-aio-cpu
    # For a specific version:
    # image: localai/localai:{{< version >}}-aio-cpu
    # For Nvidia GPUs decomment one of the following (cuda11 or cuda12):
    # image: localai/localai:{{< version >}}-aio-gpu-nvidia-cuda-11
    # image: localai/localai:{{< version >}}-aio-gpu-nvidia-cuda-12
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-11
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
      # ...
    volumes:
      - ./models:/build/models:cached
    # decomment the following piece if running with Nvidia GPUs
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]

For a list of all the container-images available, see [Container images]({{%relref "docs/reference/container-images" %}}). To learn more about All-in-one images instead, see [All-in-one Images]({{%relref "docs/reference/aio-images" %}}).

Models caching: The AIO image will download the needed models on the first run if not already present and store those in /build/models inside the container. The AIO models will be automatically updated with new versions of AIO images.

You can change the directory inside the container by specifying a MODELS_PATH environment variable (or --models-path).

If you want to use a named model or a local directory, you can mount it as a volume to /build/models:

docker run -p 8080:8080 --name local-ai -ti -v $PWD/models:/build/models localai/localai:latest-aio-cpu

or associate a volume:

docker volume create localai-models
docker run -p 8080:8080 --name local-ai -ti -v localai-models:/build/models localai/localai:latest-aio-cpu

Try it out

LocalAI does not ship a webui by default, however you can use 3rd party projects to interact with it (see also [Integrations]({{%relref "docs/integrations" %}}) ). However, you can test out the API endpoints using curl, you can find few examples below.

Text Generation

Creates a model response for the given chat conversation. OpenAI documentation.

curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{ "model": "gpt-4", "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}] }'

GPT Vision

Understand images.

curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{ 
        "model": "gpt-4-vision-preview", 
        "messages": [
          {
            "role": "user", "content": [
              {"type":"text", "text": "What is in the image?"},
              {
                "type": "image_url", 
                "image_url": {
                  "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" 
                }
              }
            ], 
          "temperature": 0.9
          }
        ]
      }'

Function calling

Call functions

curl https://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Boston?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'

Image Generation

Creates an image given a prompt. OpenAI documentation.

curl http://localhost:8080/v1/images/generations \
      -H "Content-Type: application/json" -d '{
          "prompt": "A cute baby sea otter",
          "size": "256x256"
        }'

Text to speech

Generates audio from the input text. OpenAI documentation.

curl http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy"
  }' \
  --output speech.mp3

Audio Transcription

Transcribes audio into the input language. OpenAI Documentation.

Download first a sample to transcribe:

wget --quiet --show-progress -O gb1.ogg https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg

Send the example audio file to the transcriptions endpoint :

curl http://localhost:8080/v1/audio/transcriptions \
    -H "Content-Type: multipart/form-data" \
    -F file="@$PWD/gb1.ogg" -F model="whisper-1"

Embeddings Generation

Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms. OpenAI Embeddings.

curl http://localhost:8080/embeddings \
    -X POST -H "Content-Type: application/json" \
    -d '{ 
        "input": "Your text string goes here", 
        "model": "text-embedding-ada-002"
      }'

Don't use the model file as model in the request unless you want to handle the prompt template for yourself.

Use the model names like you would do with OpenAI like in the examples below. For instance gpt-4-vision-preview, or gpt-4.

What's next?

There is much more to explore! run any model from huggingface, video generation, and voice cloning with LocalAI, check out the [features]({{%relref "docs/features" %}}) section for a full overview.

Explore further resources and community contributions:

[Build LocalAI and the container image]({{%relref "docs/getting-started/build" %}})
[Run models manually]({{%relref "docs/getting-started/manual" %}})
[Run other models]({{%relref "docs/getting-started/run-other-models" %}})
[Container images]({{%relref "docs/reference/container-images" %}})
[All-in-one Images]({{%relref "docs/reference/aio-images" %}})
Examples

10 KiB Raw Blame History