docs: Initial import from localai-website (#1312)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
docs/content/model-compatibility/llama-cpp.md (new file, 81 lines)

+++
disableToc = false
title = "🦙 llama.cpp"
weight = 1
+++

[llama.cpp](https://github.com/ggerganov/llama.cpp) is a popular port of Facebook's LLaMA model in C/C++.

{{% notice note %}}

The `ggml` file format has been deprecated. If you are using `ggml` models and you are configuring your model with a YAML file, use the `llama-stable` backend instead. If you are relying on automatic detection of the model, you should be fine. For `gguf` models, use the `llama` backend.

{{% /notice %}}

## Features

The `llama.cpp` backend supports the following features:

- [📖 Text generation (GPT)]({{%relref "features/text-generation" %}})
- [🧠 Embeddings]({{%relref "features/embeddings" %}})
- [🔥 OpenAI functions]({{%relref "features/openai-functions" %}})
- [✍️ Constrained grammars]({{%relref "features/constrained_grammars" %}})

## Setup

LocalAI supports `llama.cpp` models out of the box. You can use them in the same way as any other model.

### Manual setup

It is sufficient to copy the `ggml` or `gguf` model files into the `models` folder. You can then refer to the model with the `model` parameter in API calls.

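For example, a minimal sketch, assuming a hypothetical model file named `my-model.gguf` and the default `models` folder and port:

```bash
# Copy the model file into the models folder (file name and source path are illustrative)
cp ~/Downloads/my-model.gguf models/

# Refer to the file name with the `model` parameter in API calls
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "my-model.gguf",
   "messages": [{"role": "user", "content": "Say this is a test!"}],
   "temperature": 0.1
}'
```
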
You can optionally create an associated [YAML model config file]({{%relref "advanced" %}}) to tune the model's parameters or apply a template to the prompt.

Prompt templates are useful for models that are fine-tuned towards a specific prompt format.

### Automatic setup

LocalAI supports model galleries, which are indexes of models. For instance, the Hugging Face gallery contains a large curated index of models from the Hugging Face model hub for `ggml` and `gguf` models.

For example, if you have the galleries enabled, you can start chatting with models from Hugging Face by running:

```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "TheBloke/WizardLM-13B-V1.2-GGML/wizardlm-13b-v1.2.ggmlv3.q2_K.bin",
   "messages": [{"role": "user", "content": "Say this is a test!"}],
   "temperature": 0.1
}'
```

LocalAI will automatically download and configure the model in the `models` directory.

Models can also be preloaded or downloaded on demand. To learn about model galleries, check out the [model gallery documentation]({{%relref "models" %}}).

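As a sketch of downloading a gallery model on demand, assuming the `/models/apply` endpoint and the gallery config URL format described in the model gallery documentation (the model chosen here is purely illustrative):

```bash
# Ask LocalAI to download and configure a gallery model on demand
# (endpoint and URL format are assumptions; see the model gallery documentation)
curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
   "url": "github:go-skynet/model-gallery/gpt4all-j.yaml",
   "name": "gpt4all-j"
}'
```
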
### YAML configuration

To use the `llama.cpp` backend, specify `llama` as the backend in the YAML file:

```yaml
name: llama
backend: llama
parameters:
  # Relative to the models path
  model: file.gguf.bin
```

In the example above we specify `llama` as the backend to restrict loading to `gguf` models only.

For instance, to use the `llama-stable` backend for `ggml` models:

```yaml
name: llama
backend: llama-stable
parameters:
  # Relative to the models path
  model: file.ggml.bin
```

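With a configuration file like the ones above in place, the model is addressed in API calls by its `name` field rather than by file name. A minimal usage sketch:

```bash
# Call the model by the `name` defined in its YAML config (here: "llama")
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "llama",
   "messages": [{"role": "user", "content": "How are you?"}],
   "temperature": 0.1
}'
```
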
### Reference

- [llama.cpp](https://github.com/ggerganov/llama.cpp)
- [go-llama.cpp binding](https://github.com/go-skynet/go-llama.cpp)