Mirror of https://github.com/mudler/LocalAI.git, synced 2025-05-20 10:35:01 +00:00
deps(llama.cpp): bump to latest, update build variables (#2669)
* :arrow_up: Update ggerganov/llama.cpp
* deps(llama.cpp): update build variables to follow upstream
  Update build recipes with https://github.com/ggerganov/llama.cpp/pull/8006
* Disable shared libs by default in llama.cpp
* Disable shared libs in llama.cpp Makefile
* Disable metal embedding for now, until it is tested
* fix(mac): explicitly enable metal
* debug
* fix typo

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Parent: 30b883affe
Commit: 7b1e792732
9 changed files with 39 additions and 32 deletions
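In short, llama.cpp's build switches were renamed upstream from the `LLAMA_` prefix to `GGML_` (https://github.com/ggerganov/llama.cpp/pull/8006), and the documentation hunks below follow that rename. A minimal sketch of the mapping, limited to the switches that appear in this commit:

```
# The LLAMA_* build switches used in the docs map to GGML_* equivalents:
#   make LLAMA_CUBLAS=1  ->  make GGML_CUDA=1
#   -DLLAMA_F16C=OFF     ->  -DGGML_F16C=OFF   (same pattern for AVX, AVX2, AVX512, FMA)
# So a CUDA-enabled build of llama.cpp now reads:
pushd llama.cpp && make GGML_CUDA=1 && popd
```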
@@ -118,7 +118,7 @@ And we convert it to the gguf format that LocalAI can consume:
 
 # Convert to gguf
 git clone https://github.com/ggerganov/llama.cpp.git
-pushd llama.cpp && make LLAMA_CUBLAS=1 && popd
+pushd llama.cpp && make GGML_CUDA=1 && popd
 
 # We need to convert the pytorch model into ggml for quantization
 # It crates 'ggml-model-f16.bin' in the 'merged' directory.
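The comment above mentions producing `ggml-model-f16.bin` for quantization. As a rough sketch of that follow-up step, hedged because the tool name differs between llama.cpp revisions (`quantize` in older trees, `llama-quantize` in newer ones) and the paths here are only illustrative:

```
# Quantize the converted f16 model with the tool built above; the binary may
# be called ./quantize or ./llama-quantize depending on the llama.cpp revision.
pushd llama.cpp
./llama-quantize ../merged/ggml-model-f16.bin ../merged/ggml-model-q4_k_m.gguf Q4_K_M
popd
```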
@@ -55,4 +55,4 @@ This typically happens when your prompt exceeds the context size. Try to reduce
 
 ### I'm getting a 'SIGILL' error, what's wrong?
 
-Your CPU probably does not have support for certain instructions that are compiled by default in the pre-built binaries. If you are running in a container, try setting `REBUILD=true` and disable the CPU instructions that are not compatible with your CPU. For instance: `CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" make build`
+Your CPU probably does not have support for certain instructions that are compiled by default in the pre-built binaries. If you are running in a container, try setting `REBUILD=true` and disable the CPU instructions that are not compatible with your CPU. For instance: `CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF" make build`
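Related to the SIGILL answer above: a quick, Linux-only sketch for checking which of these instruction sets the host CPU actually reports, so you only switch off the ones that are genuinely missing (on other platforms, `sysctl -a` or similar is the equivalent; the flag names come from the kernel, not from LocalAI):

```
# Print the relevant CPU feature flags; anything absent from the output is a
# candidate for disabling, e.g. no avx2 -> add -DGGML_AVX2=OFF to CMAKE_ARGS.
grep -o -w -E 'avx|avx2|avx512f|fma|f16c' /proc/cpuinfo | sort -u
```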
@@ -101,14 +101,14 @@ Here is the list of the variables available that can be used to customize the build:
 LocalAI uses different backends based on ggml and llama.cpp to run models. If your CPU doesn't support common instruction sets, you can disable them during build:
 
 ```
-CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_AVX=OFF -DLLAMA_FMA=OFF" make build
+CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_AVX=OFF -DGGML_FMA=OFF" make build
 ```
 
 To have effect on the container image, you need to set `REBUILD=true`:
 
 ```
 docker run quay.io/go-skynet/localai
-docker run --rm -ti -p 8080:8080 -e DEBUG=true -e MODELS_PATH=/models -e THREADS=1 -e REBUILD=true -e CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_AVX=OFF -DLLAMA_FMA=OFF" -v $PWD/models:/models quay.io/go-skynet/local-ai:latest
+docker run --rm -ti -p 8080:8080 -e DEBUG=true -e MODELS_PATH=/models -e THREADS=1 -e REBUILD=true -e CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_AVX=OFF -DGGML_FMA=OFF" -v $PWD/models:/models quay.io/go-skynet/local-ai:latest
 ```
 
 {{% /alert %}}
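As a follow-up to the rebuild command above, a small smoke test once the container is up, assuming the default port mapping from the example (the endpoint is LocalAI's standard OpenAI-compatible model listing):

```
# If the rebuild picked up the right CMAKE_ARGS, the server starts cleanly and
# answers on the mapped port instead of dying with SIGILL.
curl http://localhost:8080/v1/models
```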