feat(backends): Drop bert.cpp (#4272)
* feat(backends): Drop bert.cpp

  Use llama.cpp 3.2 as a drop-in replacement for bert.cpp

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(tests): make test more robust

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
parent 1688ba7f2a
commit 3c3050f68e
13 changed files with 40 additions and 184 deletions
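Since llama.cpp now serves as the drop-in replacement, a model that previously used the `bert-embeddings` backend would be pointed at `llama-cpp` instead. A hypothetical migrated config sketch, assuming the embedding model is available in a format llama.cpp can load (e.g. a GGUF file):

```yaml
name: text-embedding-ada-002
backend: llama-cpp
embeddings: true
parameters:
  model: my-embedding-model.gguf  # hypothetical file name
```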
@@ -27,39 +27,6 @@ embeddings: true
 # .. other parameters
 ```
 
-## Bert embeddings
-
-To use `bert.cpp` models you can use the `bert` embedding backend.
-
-An example model config file:
-
-```yaml
-name: text-embedding-ada-002
-parameters:
-  model: bert
-backend: bert-embeddings
-embeddings: true
-# .. other parameters
-```
-
-The `bert` backend is based on [bert.cpp](https://github.com/skeskinen/bert.cpp) and works with `ggml` models.
-
-For instance, you can download the `ggml` quantized version of `all-MiniLM-L6-v2` from https://huggingface.co/skeskinen/ggml:
-
-```bash
-wget https://huggingface.co/skeskinen/ggml/resolve/main/all-MiniLM-L6-v2/ggml-model-q4_0.bin -O models/bert
-```
-
-To test locally (LocalAI server running on `localhost`),
-you can use `curl` (and `jq` at the end to prettify the output):
-
-```bash
-curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
-  "input": "Your text string goes here",
-  "model": "text-embedding-ada-002"
-}' | jq "."
-```
-
 ## Huggingface embeddings
 
 To use `sentence-transformers` and models in `huggingface` you can use the `sentencetransformers` embedding backend.
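A minimal config file for this backend follows the same pattern as the other backends; a sketch, with `all-MiniLM-L6-v2` used only as a placeholder model name:

```yaml
name: my-embeddings
backend: sentencetransformers
embeddings: true
parameters:
  model: all-MiniLM-L6-v2
```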
@@ -87,17 +54,26 @@ The `sentencetransformers` backend uses Python [sentence-transformers](https://g
 
 ## Llama.cpp embeddings
 
-Embeddings with `llama.cpp` are supported with the `llama` backend.
+Embeddings with `llama.cpp` are supported with the `llama-cpp` backend; it needs to be enabled by setting `embeddings` to `true`.
 
 ```yaml
 name: my-awesome-model
-backend: llama
+backend: llama-cpp
 embeddings: true
 parameters:
   model: ggml-file.bin
 # ...
 ```
+
+Then you can use the API to generate embeddings:
+
+```bash
+curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
+  "input": "My text",
+  "model": "my-awesome-model"
+}' | jq "."
+```
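Assuming the response follows the usual OpenAI-compatible embeddings shape (`{"data": [{"embedding": [...]}]}`), the raw vector can be pulled out with a narrower `jq` filter instead of pretty-printing everything:

```bash
curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
  "input": "My text",
  "model": "my-awesome-model"
}' | jq ".data[0].embedding"
```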
 
 ## 💡 Examples
 
 - Example that uses LlamaIndex and LocalAI as embedding: [here](https://github.com/go-skynet/LocalAI/tree/master/examples/query_data/).

@@ -300,7 +300,7 @@ curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
 
 ```bash
 curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
      "url": "github:mudler/LocalAI/gallery/bert-embeddings.yaml",
      "id": "bert-embeddings",
      "name": "text-embedding-ada-002"
 }'
 ```
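Once the apply call completes, the installed model should be addressable by the `name` given above; assuming the server listens on `localhost:8080` as in the earlier examples:

```bash
curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
  "input": "Your text string goes here",
  "model": "text-embedding-ada-002"
}' | jq "."
```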