feat(backends): Drop bert.cpp (#4272)

* feat(backends): Drop bert.cpp

use llama.cpp 3.2 as a drop-in replacement for bert.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(tests): make test more robust

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-20 10:35:01 +00:00 · 2024-11-27 16:34:28 +01:00 · 2024-11-27 16:34:28 +01:00 · 3c3050f68e
commit 3c3050f68e
parent 1688ba7f2a
13 changed files with 40 additions and 184 deletions
--- a/docs/content/docs/features/embeddings.md
+++ b/docs/content/docs/features/embeddings.md
@@ -27,39 +27,6 @@ embeddings: true
 # .. other parameters
 ```

-## Bert embeddings
-
-To use `bert.cpp` models you can use the `bert` embedding backend.
-
-An example model config file:
-
-```yaml
-name: text-embedding-ada-002
-parameters:
-  model: bert
-backend: bert-embeddings
-embeddings: true
-# .. other parameters
-```
-
-The `bert` backend uses [bert.cpp](https://github.com/skeskinen/bert.cpp) and uses `ggml` models.
-
-For instance you can download the `ggml` quantized version of `all-MiniLM-L6-v2` from https://huggingface.co/skeskinen/ggml:
-
-```bash
-wget https://huggingface.co/skeskinen/ggml/resolve/main/all-MiniLM-L6-v2/ggml-model-q4_0.bin -O models/bert
-```
-
-To test locally (LocalAI server running on `localhost`),
-you can use `curl` (and `jq` at the end to prettify):
-
-```bash
-curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
-  "input": "Your text string goes here",
-  "model": "text-embedding-ada-002"
-}' | jq "."
-```
-
 ## Huggingface embeddings

 To use `sentence-transformers` and models in `huggingface` you can use the `sentencetransformers` embedding backend.
@@ -87,17 +54,26 @@ The `sentencetransformers` backend uses Python [sentence-transformers](https://g

 ## Llama.cpp embeddings

-Embeddings with `llama.cpp` are supported with the `llama` backend.
+Embeddings with `llama.cpp` are supported by the `llama-cpp` backend; embedding support must be enabled by setting `embeddings` to `true`.

 ```yaml
 name: my-awesome-model
-backend: llama
+backend: llama-cpp
 embeddings: true
 parameters:
  model: ggml-file.bin
 # ...
 ```

+Then you can use the API to generate embeddings:
+
+```bash
+curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
+  "input": "My text",
+  "model": "my-awesome-model"
+}' | jq "."
+```
+
 ## 💡 Examples

 - Example that uses LLamaIndex and LocalAI as embedding: [here](https://github.com/go-skynet/LocalAI/tree/master/examples/query_data/).
--- a/docs/content/docs/features/model-gallery.md
+++ b/docs/content/docs/features/model-gallery.md
@@ -300,7 +300,7 @@ curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{

 ```bash
 curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
-     "url": "github:mudler/LocalAI/gallery/bert-embeddings.yaml",
+     "id": "bert-embeddings",
     "name": "text-embedding-ada-002"
   }'  
 ```
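
For readers with an existing config that used the removed `bert-embeddings` backend, the change in `embeddings.md` above suggests a migration along the following lines. This is only a sketch: the model file name is a hypothetical placeholder, and it assumes the model is available in a format that `llama.cpp` can load, which the old `ggml` file from the removed section may not be.

```yaml
# Hypothetical migration of the removed bert-embeddings example config.
name: text-embedding-ada-002
# Replaces the removed `bert-embeddings` backend.
backend: llama-cpp
# Required for the llama-cpp backend to serve embeddings.
embeddings: true
parameters:
  # Assumption: placeholder file name, not taken from the commit.
  model: my-embedding-model.gguf
```

With a config like this, the `curl` request shown in the new "Llama.cpp embeddings" section should work unchanged apart from the `model` value, which has to match the `name` field.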