deps(llama.cpp): bump to latest, update build variables (#2669)

* arrow_up: Update ggerganov/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * deps(llama.cpp): update build variables to follow upstream Update build recipes with https://github.com/ggerganov/llama.cpp/pull/8006 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Disable shared libs by default in llama.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Disable shared libs in llama.cpp Makefile Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Disable metal embedding for now, until it is tested Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(mac): explicitly enable metal Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * debug Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix typo Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-20 10:35:01 +00:00 · 2024-06-27 23:10:04 +02:00 · 2024-06-27 23:10:04 +02:00 · 7b1e792732
commit 7b1e792732
parent 30b883affe
9 changed files with 39 additions and 32 deletions
--- a/docs/content/docs/advanced/fine-tuning.md
+++ b/docs/content/docs/advanced/fine-tuning.md
@ -118,7 +118,7 @@ And we convert it to the gguf format that LocalAI can consume:

 # Convert to gguf
 git clone https://github.com/ggerganov/llama.cpp.git
-pushd llama.cpp && make LLAMA_CUBLAS=1 && popd
+pushd llama.cpp && make GGML_CUDA=1 && popd

 # We need to convert the pytorch model into ggml for quantization
 # It crates 'ggml-model-f16.bin' in the 'merged' directory.
--- a/docs/content/docs/faq.md
+++ b/docs/content/docs/faq.md
@ -55,4 +55,4 @@ This typically happens when your prompt exceeds the context size. Try to reduce

 ### I'm getting a 'SIGILL' error, what's wrong?

-Your CPU probably does not have support for certain instructions that are compiled by default in the pre-built binaries. If you are running in a container, try setting `REBUILD=true` and disable the CPU instructions that are not compatible with your CPU. For instance: `CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" make build`
+Your CPU probably does not have support for certain instructions that are compiled by default in the pre-built binaries. If you are running in a container, try setting `REBUILD=true` and disable the CPU instructions that are not compatible with your CPU. For instance: `CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF" make build`
--- a/docs/content/docs/getting-started/build.md
+++ b/docs/content/docs/getting-started/build.md
@ -101,14 +101,14 @@ Here is the list of the variables available that can be used to customize the bu
 LocalAI uses different backends based on ggml and llama.cpp to run models. If your CPU doesn't support common instruction sets, you can disable them during build:

 ```
-CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_AVX=OFF -DLLAMA_FMA=OFF" make build
+CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_AVX=OFF -DGGML_FMA=OFF" make build
 ```

 To have effect on the container image, you need to set `REBUILD=true`:

 ```
 docker run  quay.io/go-skynet/localai
-docker run --rm -ti -p 8080:8080 -e DEBUG=true -e MODELS_PATH=/models -e THREADS=1 -e REBUILD=true -e CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_AVX=OFF -DLLAMA_FMA=OFF" -v $PWD/models:/models quay.io/go-skynet/local-ai:latest
+docker run --rm -ti -p 8080:8080 -e DEBUG=true -e MODELS_PATH=/models -e THREADS=1 -e REBUILD=true -e CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_AVX=OFF -DGGML_FMA=OFF" -v $PWD/models:/models quay.io/go-skynet/local-ai:latest
 ```

 {{% /alert %}}