Compare commits

...

189 commits

Author SHA1 Message Date
Ettore Di Giacinto
04a3d8e5ac
feat(ui): add error page to display errors (#5418)
Some checks are pending
Explorer deployment / build-linux (push) Waiting to run
GPU tests / ubuntu-latest (1.21.x) (push) Waiting to run
generate and publish intel docker caches / generate_caches (intel/oneapi-basekit:2025.1.0-0-devel-ubuntu22.04, linux/amd64, ubuntu-latest) (push) Waiting to run
build container images / hipblas-jobs (-aio-gpu-hipblas, rocm/dev-ubuntu-22.04:6.1, hipblas, true, ubuntu:22.04, extras, latest-gpu-hipblas-extras, latest-aio-gpu-hipblas, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -hipblas-extras) (push) Waiting to run
build container images / hipblas-jobs (rocm/dev-ubuntu-22.04:6.1, hipblas, true, ubuntu:22.04, core, latest-gpu-hipblas, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -hipblas) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-intel-f16, quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, true, ubuntu:22.04, extras, latest-gpu-intel-f16-extras, latest-aio-gpu-intel-f16, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f16-… (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-intel-f32, quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, true, ubuntu:22.04, extras, latest-gpu-intel-f32-extras, latest-aio-gpu-intel-f32, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f32-… (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-nvidia-cuda-11, ubuntu:22.04, cublas, 11, 7, true, extras, latest-gpu-nvidia-cuda-11-extras, latest-aio-gpu-nvidia-cuda-11, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -cublas-cuda11-extras) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-nvidia-cuda-12, ubuntu:22.04, cublas, 12, 0, true, extras, latest-gpu-nvidia-cuda-12-extras, latest-aio-gpu-nvidia-cuda-12, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -cublas-cuda12-extras) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, true, ubuntu:22.04, core, latest-gpu-intel-f16, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f16) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, true, ubuntu:22.04, core, latest-gpu-intel-f32, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f32) (push) Waiting to run
build container images / core-image-build (-aio-cpu, ubuntu:22.04, , true, core, latest-cpu, latest-aio-cpu, --jobs=4 --output-sync=target, linux/amd64,linux/arm64, arc-runner-set, false, auto, ) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 11, 7, true, core, latest-gpu-nvidia-cuda-12, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda11) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 12, 0, true, core, latest-gpu-nvidia-cuda-12, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda12) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, vulkan, true, core, latest-gpu-vulkan, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -vulkan) (push) Waiting to run
build container images / gh-runner (nvcr.io/nvidia/l4t-jetpack:r36.4.0, cublas, 12, 0, true, core, latest-nvidia-l4t-arm64, --jobs=4 --output-sync=target, linux/arm64, ubuntu-24.04-arm, true, false, -nvidia-l4t-arm64) (push) Waiting to run
Security Scan / tests (push) Waiting to run
Tests extras backends / tests-transformers (push) Waiting to run
Tests extras backends / tests-rerankers (push) Waiting to run
Tests extras backends / tests-diffusers (push) Waiting to run
Tests extras backends / tests-coqui (push) Waiting to run
tests / tests-linux (1.21.x) (push) Waiting to run
tests / tests-aio-container (push) Waiting to run
tests / tests-apple (1.21.x) (push) Waiting to run
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-20 12:17:27 +02:00
Ettore Di Giacinto
9af09b3f8c
chore(model gallery): fixup
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-20 12:17:21 +02:00
Ettore Di Giacinto
0d590a4044
chore(model gallery): add smolvlm2-256m-video-instruct (#5417)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-20 12:03:02 +02:00
Ettore Di Giacinto
e0a54de4f5
chore(model gallery): add smolvlm2-500m-video-instruct (#5416)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-20 11:42:30 +02:00
Ettore Di Giacinto
6bc2ae5467
chore(model gallery): add smolvlm2-2.2b-instruct (#5415)
chore(model gallery): add smolvlm-instruct

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-20 11:36:22 +02:00
Ettore Di Giacinto
8caaf49f5d
chore(model gallery): add smolvlm-instruct (#5414)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-20 11:35:01 +02:00
Ettore Di Giacinto
1db51044bb
chore(model gallery): add smolvlm-500m-instruct (#5413)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-20 11:25:32 +02:00
Ettore Di Giacinto
ec21b58008
chore(model gallery): add smolvlm-256m-instruct (#5412)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-20 11:15:09 +02:00
Ettore Di Giacinto
996259b529
chore(model gallery): add facebook_kernelllm (#5411)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-20 10:31:09 +02:00
Ettore Di Giacinto
f2942cc0e1
chore(model gallery): add thedrummer_valkyrie-49b-v1 (#5410)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-20 10:28:27 +02:00
Ettore Di Giacinto
f8fbfd4fa3
chore(model gallery): add a-m-team_am-thinking-v1 (#5395)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-19 17:31:38 +02:00
Ettore Di Giacinto
41e239c67e
chore(model gallery): add soob3123_grayline-qwen3-8b (#5394)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-19 17:02:43 +02:00
Ettore Di Giacinto
587827e779
chore(model gallery): add soob3123_grayline-qwen3-14b (#5393)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-19 15:59:07 +02:00
LocalAI [bot]
456b4982ef
chore: ⬆️ Update ggml-org/llama.cpp to 6a2bc8bfb7cd502e5ebc72e36c97a6f848c21c2c (#5390)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-19 01:25:22 +00:00
Ettore Di Giacinto
159388cce8
chore: memoize detected GPUs (#5385)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-18 08:55:44 +02:00
LocalAI [bot]
cfc73c7773
chore: ⬆️ Update ggml-org/llama.cpp to e3a7cf6c5bf6a0a24217f88607b06e4405a2b5d9 (#5384)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-18 01:21:13 +00:00
Ettore Di Giacinto
6d5bde860b
feat(llama.cpp): upgrade and use libmtmd (#5379)
* WIP

* wip

* wip

* Make it compile

* Update json.hpp

* this shouldn't be private for now

* Add logs

* Reset auto detected template

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Re-enable grammars

* This seems to be broken - 360a9c98e1 (diff-a18a8e64e12a01167d8e98fc)[…]cccf0d4eed09d76d879L2998-L3207

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Placeholder

* Simplify image loading

* use completion type

* disable streaming

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* correctly return timings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Remove some debug logging

* Adapt tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Keep header

* embedding: do not use oai type

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Sync from server.cpp

* Use utils and json directly from llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Sync with upstream

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: copy json.hpp from the correct location

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: add httplib

* sync llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Embeddings: set OAICOMPAT_TYPE_EMBEDDING

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: sync with server.cpp by including it

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* make it darwin-compatible

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-17 16:02:53 +02:00
LocalAI [bot]
6ef383033b
chore: ⬆️ Update ggml-org/whisper.cpp to d1f114da61b1ae1e70b03104fad42c9dd666feeb (#5381)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-17 00:35:17 +00:00
Richard Palethorpe
cd494089d9
fix(flux): Set CFG=1 so that prompts are followed (#5378)
The recommendation with Flux is to set CFG to 1 as shown in the
stablediffusion-cpp README.

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-05-16 17:53:54 +02:00
LocalAI [bot]
3033845f94
chore: ⬆️ Update ggml-org/whisper.cpp to 20a20decd94badfd519a07ea91f0bba8b8fc4dea (#5374)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-16 12:46:16 +02:00
omahs
0f365ac204
fix: typos (#5376)
Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>
2025-05-16 12:45:48 +02:00
Ettore Di Giacinto
525cf198be
chore(model gallery): add primeintellect_intellect-2 (#5373)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-15 10:53:52 +02:00
Ettore Di Giacinto
658c2a4f55
chore(model gallery): add thedrummer_rivermind-lux-12b-v1 (#5372)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-15 10:51:55 +02:00
Ettore Di Giacinto
c987de090d
chore(model gallery): add thedrummer_snowpiercer-15b-v1 (#5371)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-15 10:04:44 +02:00
Ettore Di Giacinto
04365843e6
chore(model gallery): add skywork_skywork-or1-7b (#5370)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-15 10:02:07 +02:00
Ettore Di Giacinto
1dc5781679
chore(model gallery): add skywork_skywork-or1-32b (#5369)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-15 09:58:51 +02:00
LocalAI [bot]
30704292de
chore: ⬆️ Update ggml-org/whisper.cpp to f389d7e3e56bbbfec49fd333551927a0fcbb7213 (#5367)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-15 00:34:16 +00:00
Ettore Di Giacinto
e52c66c76e
chore(docs/install.sh): image changes (#5354)
chore(docs): image changes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-14 19:28:30 +02:00
LocalAI [bot]
cb28aef93b
chore: ⬆️ Update ggml-org/whisper.cpp to f89056057511a1657af90bb28ef3f21e5b1f33cd (#5364)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-14 09:24:16 +02:00
LocalAI [bot]
029f97c2a2
docs: ⬆️ update docs version mudler/LocalAI (#5363)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-14 01:54:34 +00:00
Ettore Di Giacinto
3be71be696
fix(ci): tag latest against cpu-only image (#5362)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-13 22:00:41 +02:00
LocalAI [bot]
6adb019f8f
chore: ⬆️ Update ggml-org/llama.cpp to de4c07f93783a1a96456a44dc16b9db538ee1618 (#5358)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-13 22:00:19 +02:00
LocalAI [bot]
fcaa0a2f01
chore: ⬆️ Update ggml-org/whisper.cpp to e41bc5c61ae66af6be2bd7011769bb821a83e8ae (#5357)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-13 21:59:50 +02:00
dependabot[bot]
fd17a3312c
chore(deps): bump securego/gosec from 2.22.3 to 2.22.4 (#5356)
Bumps [securego/gosec](https://github.com/securego/gosec) from 2.22.3 to 2.22.4.
- [Release notes](https://github.com/securego/gosec/releases)
- [Changelog](https://github.com/securego/gosec/blob/master/.goreleaser.yml)
- [Commits](https://github.com/securego/gosec/compare/v2.22.3...v2.22.4)

---
updated-dependencies:
- dependency-name: securego/gosec
  dependency-version: 2.22.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-12 22:01:43 +02:00
dependabot[bot]
12d0fe610b
chore(deps): bump dependabot/fetch-metadata from 2.3.0 to 2.4.0 (#5355)
Bumps [dependabot/fetch-metadata](https://github.com/dependabot/fetch-metadata) from 2.3.0 to 2.4.0.
- [Release notes](https://github.com/dependabot/fetch-metadata/releases)
- [Commits](https://github.com/dependabot/fetch-metadata/compare/v2.3.0...v2.4.0)

---
updated-dependencies:
- dependency-name: dependabot/fetch-metadata
  dependency-version: 2.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-12 22:01:19 +02:00
Ettore Di Giacinto
11c67d16b8
chore(ci): strip 'core' in the image suffix, identify python-based images with 'extras' (#5353)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-12 09:36:59 +02:00
LocalAI [bot]
63f7c86c4d
chore: ⬆️ Update ggml-org/llama.cpp to 9a390c4829cd3058d26a2e2c09d16e3fd12bf1b1 (#5351)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-12 09:24:54 +02:00
LocalAI [bot]
ac89bf77bf
chore: ⬆️ Update ggml-org/whisper.cpp to 2e310b841e0b4e7cf00890b53411dd9f8578f243 (#4785)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-12 01:30:35 +00:00
Ettore Di Giacinto
0395cc02fb
chore(model gallery): add qwen_qwen2.5-vl-72b-instruct (#5349)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-11 09:46:32 +02:00
Ettore Di Giacinto
616972fca0
chore(model gallery): add qwen_qwen2.5-vl-7b-instruct (#5348)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-11 09:44:58 +02:00
Ettore Di Giacinto
942fbff62d
chore(model gallery): add gryphe_pantheon-proto-rp-1.8-30b-a3b (#5347)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-11 09:39:28 +02:00
LocalAI [bot]
2612a0c910
chore: ⬆️ Update ggml-org/llama.cpp to 15e6125a397f6086c1dfdf7584acdb7c730313dc (#5345)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-11 09:21:46 +02:00
LocalAI [bot]
2dcb6d7247
chore(model-gallery): ⬆️ update checksum (#5346)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-10 22:24:04 +02:00
Ettore Di Giacinto
6978eec69f
feat(whisper.cpp): gpu support (#5344)
* fix(whisper.cpp): gpu support

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to fix apple tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-10 22:02:40 +02:00
LocalAI [bot]
2fcfe54466
chore: ⬆️ Update ggml-org/llama.cpp to 33eff4024084d1f0c8441b79f7208a52fad79858 (#5343)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-10 10:07:39 +02:00
Ettore Di Giacinto
4e7506a3be
fix(whisper): add vulkan flag
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-05-10 08:46:21 +02:00
Ettore Di Giacinto
2a46217f90
Update Makefile
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-05-09 23:17:18 +02:00
Ettore Di Giacinto
31ff9dbd52
chore(Makefile): small cleanups, disable openmp on whisper
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-09 22:37:18 +02:00
Ettore Di Giacinto
9483abef03
fix(whisper/sycl): disable
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-09 22:36:09 +02:00
Ettore Di Giacinto
ce3e8b3e31
fix(whisper/sycl): use icx when running go build
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-09 21:48:09 +02:00
Ettore Di Giacinto
f3bb84c9a7
feat(whisper): link vulkan, hipblas and sycl
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-09 19:25:26 +02:00
Ettore Di Giacinto
ecb1297582
fix: specify icx and icpx only on whisper.cpp
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-09 10:58:30 +02:00
Ettore Di Giacinto
73fc702b3c
fix: this is not needed
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-09 10:28:53 +02:00
Ettore Di Giacinto
e3af62ae1a
feat: Add sycl support for whisper.cpp (#5341)
2025-05-09 09:31:02 +02:00
Ettore Di Giacinto
dc21604741
chore(deps): bump whisper.cpp (#5338)
* chore(deps): bump whisper.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add libggml-metal

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups macOS arm64

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* adjust cublas for whisper.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-09 08:17:45 +02:00
LocalAI [bot]
5433f1a70e
chore: ⬆️ Update ggml-org/llama.cpp to f05a6d71a0f3dbf0730b56a1abbad41c0f42e63d (#5340)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-08 23:13:28 +00:00
Ettore Di Giacinto
d5e032bdcd
chore(model gallery): add gemma-3-12b-fornaxv.2-qat-cot (#5337)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-08 12:07:25 +02:00
Ettore Di Giacinto
de786f6586
chore(model gallery): add symiotic-14b-i1 (#5336)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-08 12:03:35 +02:00
Ettore Di Giacinto
8b9bc4aa6e
chore(model gallery): add qwen3-14b-uncensored (#5335)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-08 11:59:26 +02:00
Ettore Di Giacinto
e6cea7d28e
chore(model gallery): add cognition-ai_kevin-32b (#5334)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-08 11:57:12 +02:00
Ettore Di Giacinto
7d7d56f2ce
chore(model gallery): add servicenow-ai_apriel-nemotron-15b-thinker (#5333)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-08 11:55:35 +02:00
Ettore Di Giacinto
1caae91ab6
chore(model gallery): add qwen3-4b-esper3-i1 (#5332)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-08 11:52:02 +02:00
LocalAI [bot]
e90f2cb0ca
chore: ⬆️ Update ggml-org/llama.cpp to 814f795e063c257f33b921eab4073484238a151a (#5331)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-08 09:25:13 +02:00
Ettore Di Giacinto
5a4291fadd
docs: update README badges
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-05-07 22:20:06 +02:00
Ettore Di Giacinto
91ef58ee5a
chore(model gallery): add qwen3-14b-griffon-i1 (#5330)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-07 11:07:38 +02:00
LocalAI [bot]
a86e8c78f1
chore: ⬆️ Update ggml-org/llama.cpp to 91a86a6f354aa73a7aab7bc3d283be410fdc93a5 (#5329)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-06 23:39:10 +00:00
Ettore Di Giacinto
adb24214c6
chore(deps): bump llama.cpp to b34c859146630dff136943abc9852ca173a7c9d6 (#5323)
chore(deps): bump llama.cpp to 'b34c859146630dff136943abc9852ca173a7c9d6'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-06 11:21:25 +02:00
Ettore Di Giacinto
f03a0430aa
chore(model gallery): add claria-14b (#5326)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-06 10:48:03 +02:00
Ettore Di Giacinto
73bc12abc0
chore(model gallery): add goekdeniz-guelmez_josiefied-qwen3-8b-abliterated-v1 (#5325)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-06 10:38:20 +02:00
Ettore Di Giacinto
7fa437bbcc
chore(model gallery): add huihui-ai_qwen3-14b-abliterated (#5324)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-06 10:35:55 +02:00
LocalAI [bot]
4a27c99928
chore(model-gallery): ⬆️ update checksum (#5321)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-06 10:01:28 +02:00
Ettore Di Giacinto
6ce94834b6
fix(hipblas): do not build all cpu-specific flags (#5322)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-06 10:00:50 +02:00
dependabot[bot]
84a26458dc
chore(deps): bump mxschmitt/action-tmate from 3.21 to 3.22 (#5319)
Bumps [mxschmitt/action-tmate](https://github.com/mxschmitt/action-tmate) from 3.21 to 3.22.
- [Release notes](https://github.com/mxschmitt/action-tmate/releases)
- [Changelog](https://github.com/mxschmitt/action-tmate/blob/master/RELEASE.md)
- [Commits](https://github.com/mxschmitt/action-tmate/compare/v3.21...v3.22)

---
updated-dependencies:
- dependency-name: mxschmitt/action-tmate
  dependency-version: '3.22'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-05 22:17:59 +00:00
Ettore Di Giacinto
7aa377b6a9
fix(arm64): do not build instructions which are not available (#5318)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-05 17:30:00 +02:00
Ettore Di Giacinto
64e66dda4a
chore(model gallery): add allura-org_remnant-qwen3-8b (#5317)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-05 11:09:07 +02:00
LocalAI [bot]
a085f61fdc
chore: ⬆️ Update ggml-org/llama.cpp to 9fdfcdaeddd1ef57c6d041b89cd8fb7048a0f028 (#5316)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-04 23:00:25 +00:00
Ettore Di Giacinto
21bdfe5fa4
fix: use rice when embedding large binaries (#5309)
* fix(embed): use go-rice for large backend assets

Golang embed FS has a hard limit that we might exceed when providing
many binary alternatives.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* simplify golang deps

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(tests): switch to testcontainers and print logs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(tests): do not build a test binary

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* small fixup

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-04 16:42:42 +02:00
Ettore Di Giacinto
7ebd7b2454
chore(model gallery): add rei-v3-kto-12b (#5313)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-04 09:41:35 +02:00
Ettore Di Giacinto
6984749ea1
chore(model gallery): add kalomaze_qwen3-16b-a3b (#5312)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-04 09:39:38 +02:00
Ettore Di Giacinto
c0a206bc7a
chore(model gallery): add qwen3-30b-a1.5b-high-speed (#5311)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-04 09:38:01 +02:00
LocalAI [bot]
01bbb31fb3
chore: ⬆️ Update ggml-org/llama.cpp to 36667c8edcded08063ed51c7d57e9e086bbfc903 (#5300)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-04 09:23:01 +02:00
Ettore Di Giacinto
72111c597d
fix(gpu): do not assume gpu being returned has node and mem (#5310)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-03 19:00:24 +02:00
Ettore Di Giacinto
b2f9fc870b
chore(defaults): enlarge defaults, drop gpu layers which is inferred (#5308)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-03 18:44:51 +02:00
Ettore Di Giacinto
1fc6d469ac
chore(deps): bump llama.cpp to 1d36b3670b285e69e58b9d687c770a2a0a192194 (#5307)
chore(deps): bump llama.cpp to '1d36b3670b285e69e58b9d687c770a2a0a192194'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-03 18:44:40 +02:00
Ettore Di Giacinto
05848b2027
chore(model gallery): add smoothie-qwen3-8b (#5306)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-03 10:35:20 +02:00
Ettore Di Giacinto
1da0644aa3
chore(model gallery): add qwen-3-32b-medical-reasoning-i1 (#5305)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-03 10:24:07 +02:00
Ettore Di Giacinto
c087cd1377
chore(model gallery): add amoral-qwen3-14b (#5304)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-03 10:21:48 +02:00
Ettore Di Giacinto
c621412f6a
chore(model gallery): add comet_12b_v.5-i1 (#5303)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-03 10:20:03 +02:00
Ettore Di Giacinto
5a8b1892cd
chore(model gallery): add genericrpv3-4b (#5302)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-03 10:18:31 +02:00
Ettore Di Giacinto
5b20426863
chore(model gallery): add planetoid_27b_v.2 (#5301)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-03 10:14:33 +02:00
Ettore Di Giacinto
5c6cd50ed6
feat(llama.cpp): estimate vram usage (#5299)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-02 17:40:26 +02:00
Ettore Di Giacinto
bace6516f1
chore(model gallery): add webthinker-qwq-32b-i1 (#5298)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-02 09:57:49 +02:00
Ettore Di Giacinto
3baadf6f27
chore(model gallery): add shuttleai_shuttle-3.5 (#5297)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-02 09:48:11 +02:00
Ettore Di Giacinto
8804c701b8
chore(model gallery): add microsoft_phi-4-reasoning (#5296)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-02 09:46:20 +02:00
Ettore Di Giacinto
7b3ceb19bb
chore(model gallery): add microsoft_phi-4-reasoning-plus (#5295)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-02 09:43:38 +02:00
Ettore Di Giacinto
e7f3effea1
chore(model gallery): add furina-8b (#5294)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-02 09:39:22 +02:00
Ettore Di Giacinto
61694a2ffb
chore(model gallery): add josiefied-qwen3-8b-abliterated-v1 (#5293)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-02 09:36:35 +02:00
LocalAI [bot]
573a3f104c
chore: ⬆️ Update ggml-org/llama.cpp to d7a14c42a1883a34a6553cbfe30da1e1b84dfd6a (#5292)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-02 09:21:38 +02:00
Ettore Di Giacinto
0e8af53a5b
chore: update quickstart
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-01 22:36:33 +02:00
Ettore Di Giacinto
960ffa808c
chore(model gallery): add microsoft_phi-4-mini-reasoning (#5288)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-01 10:17:58 +02:00
Ettore Di Giacinto
92719568e5
chore(model gallery): add fast-math-qwen3-14b (#5287)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-01 10:14:51 +02:00
Ettore Di Giacinto
163939af71
chore(model gallery): add qwen3-8b-jailbroken (#5286)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-01 10:13:01 +02:00
Ettore Di Giacinto
399f1241dc
chore(model gallery): add qwen3-30b-a3b-abliterated (#5285)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-01 10:07:42 +02:00
LocalAI [bot]
58c9ade2e8
chore: ⬆️ Update ggml-org/llama.cpp to 3e168bede4d27b35656ab8026015b87659ecbec2 (#5284)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-05-01 10:01:39 +02:00
Ettore Di Giacinto
6e1c93d84f
fix(ci): comment out vllm tests
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-05-01 10:01:22 +02:00
Wyatt Neal
4076ea0494
fix: vllm missing logprobs (#5279)
* working to address missing items

referencing #3436, #2930 - if i could test it, this might show that the
output from the vllm backend is processed and returned to the user

Signed-off-by: Wyatt Neal <wyatt.neal+git@gmail.com>

* adding in vllm tests to test-extras

Signed-off-by: Wyatt Neal <wyatt.neal+git@gmail.com>

* adding in tests to pipeline for execution

Signed-off-by: Wyatt Neal <wyatt.neal+git@gmail.com>

* removing todo block, test via pipeline

Signed-off-by: Wyatt Neal <wyatt.neal+git@gmail.com>

---------

Signed-off-by: Wyatt Neal <wyatt.neal+git@gmail.com>
2025-04-30 12:55:07 +00:00
Ettore Di Giacinto
26cbf77c0d
chore(model gallery): add mlabonne_qwen3-4b-abliterated (#5283)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-30 11:09:58 +02:00
Ettore Di Giacinto
640790d628
chore(model gallery): add mlabonne_qwen3-8b-abliterated (#5282)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-30 11:08:26 +02:00
Ettore Di Giacinto
4132adea2f
chore(model gallery): add mlabonne_qwen3-14b-abliterated (#5281)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-30 11:04:49 +02:00
LocalAI [bot]
2b2d907a3a
chore: ⬆️ Update ggml-org/llama.cpp to e2e1ddb93a01ce282e304431b37e60b3cddb6114 (#5278)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-29 21:46:08 +00:00
Ettore Di Giacinto
6e8f4f584b
fix(diffusers): consider options only in form of key/value (#5277)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-29 17:08:55 +02:00
Richard Palethorpe
662cfc2b48
fix(aio): Fix copypasta in download files for gpt-4 model (#5276)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-04-29 17:08:16 +02:00
Ettore Di Giacinto
a25d355d66
chore(model gallery): add qwen3-0.6b (#5275)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-29 10:10:16 +02:00
Ettore Di Giacinto
6d1cfdbefc
chore(model gallery): add qwen3-1.7b (#5274)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-29 10:06:03 +02:00
Ettore Di Giacinto
5ecc478968
chore(model gallery): add qwen3-4b (#5273)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-29 10:01:22 +02:00
Ettore Di Giacinto
aef5c4291b
chore(model gallery): add qwen3-8b (#5272)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-29 09:59:17 +02:00
Ettore Di Giacinto
c059f912b9
chore(model gallery): add qwen3-14b (#5271)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-29 09:56:50 +02:00
LocalAI [bot]
bc1e059259
chore: ⬆️ Update ggml-org/llama.cpp to 5f5e39e1ba5dbea814e41f2a15e035d749a520bc (#5267)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-29 09:49:42 +02:00
LocalAI [bot]
38dc07793a
chore(model-gallery): ⬆️ update checksum (#5268)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-29 09:49:23 +02:00
Ettore Di Giacinto
da6ef0967d
chore(model gallery): add qwen3-32b (#5270)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-29 09:48:28 +02:00
Ettore Di Giacinto
7a011e60bd
chore(model gallery): add qwen3-30b-a3b (#5269)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-29 09:44:44 +02:00
dependabot[bot]
e13dd5b09f
chore(deps): bump appleboy/scp-action from 0.1.7 to 1.0.0 (#5265)
Bumps [appleboy/scp-action](https://github.com/appleboy/scp-action) from 0.1.7 to 1.0.0.
- [Release notes](https://github.com/appleboy/scp-action/releases)
- [Changelog](https://github.com/appleboy/scp-action/blob/master/.goreleaser.yaml)
- [Commits](https://github.com/appleboy/scp-action/compare/v0.1.7...v1.0.0)

---
updated-dependencies:
- dependency-name: appleboy/scp-action
  dependency-version: 1.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-28 22:36:30 +00:00
Ettore Di Giacinto
86ee303bd6
chore(model gallery): add nvidia_openmath-nemotron-14b-kaggle (#5264)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-28 19:52:36 +02:00
Ettore Di Giacinto
978ee96fd3
chore(model gallery): add nvidia_openmath-nemotron-14b (#5263)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-28 19:43:49 +02:00
Ettore Di Giacinto
3ad5691db6
chore(model gallery): add nvidia_openmath-nemotron-7b (#5262)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-28 19:41:59 +02:00
Ettore Di Giacinto
0027681090
chore(model gallery): add nvidia_openmath-nemotron-1.5b (#5261)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-28 19:40:09 +02:00
Ettore Di Giacinto
8cba990edc
chore(model gallery): add nvidia_openmath-nemotron-32b (#5260)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-28 19:36:57 +02:00
Simon Redman
88857696d4
fix(CUDA): Add note for how to run CUDA with SELinux (#5259)
* Add note to help run nvidia containers with SELinux

* Use correct CUDA container references as noted in the dockerhub overview

* Clean trailing whitespaces
2025-04-28 09:00:52 +02:00
LocalAI [bot]
23f347e687
chore: ⬆️ Update ggml-org/llama.cpp to ced44be34290fab450f8344efa047d8a08e723b4 (#5258)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-27 21:59:35 +00:00
Mohit Gaur
b6e3dc5f02
docs: update docs for DisableWebUI flag (#5256)
Signed-off-by: Mohit Gaur <56885276+Mohit-Gaur@users.noreply.github.com>
2025-04-27 16:02:02 +02:00
Alessandro Pirastru
69667521e2
fix(install/gpu):Fix docker not being able to leverage the GPU on systems that have SELinux Enforced (#5252)
* Update installation script for improved compatibility and clarity

- Renamed VERSION to LOCALAI_VERSION to avoid conflicts with system variables.
- Enhanced NVIDIA and CUDA repository installation for DNF5 compatibility.
- Adjusted default Fedora version handling for CUDA installation.
- Updated Docker image tag handling to use LOCALAI_VERSION consistently.
- Improved logging messages for repository and LocalAI binary downloads.
- Added a temporary bypass for nvidia-smi installation on Fedora Cloud Edition.

* feat: Add SELinux configuration for NVIDIA GPU support in containers

- Introduced `enable_selinux_container_booleans` function to handle SELinux configuration changes for GPU access.
- Included user confirmation prompt to enable SELinux `container_use_devices` boolean due to security implications.
- Added NVIDIA Container Runtime to Docker runtimes and restarted Docker to ensure proper GPU support.
- Applied SELinux adjustments conditionally for Fedora, RHEL, CentOS, Rocky, and openSUSE distributions.

Signed-off-by: Alessandro Pirastru <alessandro.pirastru.94@gmail.com>

* fix: Correct SELinux boolean parsing and add loop break

- Fixed incorrect parsing of `container_use_devices` boolean by changing the awk field from `$2` to `$3` to retrieve the correct value.
- Added a `break` statement after enabling the SELinux boolean to prevent unnecessary loop iterations after user prompt.

Signed-off-by: Alessandro Pirastru <alessandro.pirastru.94@gmail.com>

* fix: typo in install.sh

Signed-off-by: Alessandro Pirastru <57262788+Bloodis94@users.noreply.github.com>

---------

Signed-off-by: Alessandro Pirastru <alessandro.pirastru.94@gmail.com>
Signed-off-by: Alessandro Pirastru <57262788+Bloodis94@users.noreply.github.com>
2025-04-27 16:01:29 +02:00
LocalAI [bot]
2a92effc5d
chore: ⬆️ Update ggml-org/llama.cpp to 77d5e9a76a7b4a8a7c5bf9cf6ebef91860123cba (#5254)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-27 09:21:02 +02:00
Simon Redman
a65e012aa2
docs(Vulkan): Add GPU docker documentation for Vulkan (#5255)
Add GPU docker documentation for Vulkan
2025-04-27 09:20:26 +02:00
Ettore Di Giacinto
8e9b41d05f
chore(ci): build only images with ffmpeg included, simplify tags (#5251)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-27 08:23:25 +02:00
LocalAI [bot]
078da5c2f0
feat(swagger): update swagger (#5253)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-26 22:40:35 +00:00
Ettore Di Giacinto
c5af5d139c
Update index.yaml
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-04-26 18:42:22 +02:00
Ettore Di Giacinto
2c9279a542
feat(video-gen): add endpoint for video generation (#5247)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-26 18:05:01 +02:00
Ettore Di Giacinto
a67d22f5f2
chore(model gallery): add mmproj to gemma3 models (now working)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-26 17:31:40 +02:00
Ettore Di Giacinto
dc7c51dcc7
chore(model gallery): fix correct filename for gemma-3-27b-it-qat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-26 17:27:50 +02:00
Ettore Di Giacinto
98df65c7aa
chore(model gallery): add l3.3-genetic-lemonade-sunset-70b (#5250)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-26 17:19:20 +02:00
Ettore Di Giacinto
1559b6b522
chore(model gallery): add l3.3-geneticlemonade-unleashed-v2-70b (#5249)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-26 17:17:18 +02:00
Alessandro Pirastru
a0244e3fb4
feat(install): added complete process for installing nvidia drivers on fedora without pulling X11 (#5246)
* Update installation script for improved compatibility and clarity

- Renamed VERSION to LOCALAI_VERSION to avoid conflicts with system variables.
- Enhanced NVIDIA and CUDA repository installation for DNF5 compatibility.
- Adjusted default Fedora version handling for CUDA installation.
- Updated Docker image tag handling to use LOCALAI_VERSION consistently.
- Improved logging messages for repository and LocalAI binary downloads.
- Added a temporary bypass for nvidia-smi installation on Fedora Cloud Edition.

* Enhance log functions with ANSI color formatting

- Added ANSI escape codes for improved log styling: light blue for info, orange for warnings, and red for errors.
- Updated all log functions (`info`, `warn`, `fatal`) to include bold and colored output.

Signed-off-by: Alessandro Pirastru <alessandro.pirastru.94@gmail.com>

* feat: Enhance log functions with ANSI color formatting

- Added ANSI escape codes for improved log styling: light blue for info, orange for warnings, and red for errors.
- Updated all log functions (`info`, `warn`, `fatal`) to include bold and colored output.

Signed-off-by: Alessandro Pirastru <alessandro.pirastru.94@gmail.com>

* chore: ⬆️ Update ggml-org/llama.cpp to `ecda2ec4b347031a9b8a89ee2efc664ce63f599c` (#5238)

⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* fix(stablediffusion-ggml): Build with DSD CUDA, HIP and Metal flags (#5236)

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(install): enhance script with choice functions and logs

- Added custom `choice_info`, `choice_warn`, and `choice_fatal` functions for interactive input logging.
- Adjusted Docker volume creation message for better clarity.
- Included NVIDIA driver check log for improved feedback to users.
- Added consistent logging before starting LocalAI Docker containers across configurations.

Signed-off-by: Alessandro Pirastru <alessandro.pirastru.94@gmail.com>

* feat(install): add Fedora NVIDIA driver installation option

- Introduced a new function to install NVIDIA kernel drivers on Fedora using akmod packages.
- Added user prompt to choose between installing drivers automatically or exiting for manual setup.
- Integrated the new function into the existing Fedora-specific CUDA toolkit installation workflow.

Signed-off-by: Alessandro Pirastru <alessandro.pirastru.94@gmail.com>

* fix(install): correct repository ID for DNF5 configuration

- Update repository ID from 'nome-repo' to 'nvidia-cuda' for DNF5.
- Ensures the correct repository name matches expected configuration.
- Fix prevents potential misconfiguration during installation process.

Signed-off-by: Alessandro Pirastru <alessandro.pirastru.94@gmail.com>

* feat(install): enhance NVIDIA driver handling on Fedora

- fixed `install_cuda_driver_yum` function call in `install_fedora_nvidia_kernel_drivers`
- Added `cuda-toolkit` for Fedora installations, as recommended by RPM Fusion.
- Adjusted driver repository commands for compatibility with DNF5.
- Improved URL and version handling for package manager installations.

Signed-off-by: Alessandro Pirastru <alessandro.pirastru.94@gmail.com>

* Refactor NVIDIA driver installation process in install.sh

- Removed redundant empty lines for cleaner formatting.
- Standardized URL formatting by removing unnecessary quotes around URLs.
- Reverted logic by removing Fedora-specific exclusions for cuda-toolkit and using `cuda-drivers` universally.
- Refined repository addition for `dnf` by explicitly setting `id` and `name` parameters for clarity and accuracy.
- Fixed minor formatting inconsistencies in parameter passing.

Signed-off-by: Alessandro Pirastru <alessandro.pirastru.94@gmail.com>

* feat: Update NVIDIA module installation warning in install script

- Clarified that Akmod installation may inhibit the reboot command.
- Added a cautionary note to the warning to inform users of potential risks.

Signed-off-by: Alessandro Pirastru <alessandro.pirastru.94@gmail.com>

* Update NVIDIA driver installation warning message

- Clarify prerequisites by noting the need for rpmfusion free/nonfree repos.
- Improve formatting of the warning box for better readability.
- Inform users that the script will install missing repos if necessary.

Signed-off-by: Alessandro Pirastru <alessandro.pirastru.94@gmail.com>

---------

Signed-off-by: Alessandro Pirastru <alessandro.pirastru.94@gmail.com>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: LocalAI [bot] <139863280+localai-bot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Richard Palethorpe <io@richiejp.com>
2025-04-26 09:44:40 +02:00
LocalAI [bot]
d66396201a
chore: ⬆️ Update ggml-org/llama.cpp to 295354ea6848a77bdee204ee1c971d9b92ffcca9 (#5245)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-26 00:05:16 +02:00
Ettore Di Giacinto
9628860c0e
feat(llama.cpp/clip): inject gpu options if we detect GPUs (#5243)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-26 00:04:47 +02:00
Ettore Di Giacinto
cae9bf1308
chore(deps): bump grpcio to 1.72.0 (#5244)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-25 21:32:37 +02:00
Ettore Di Giacinto
5bb5da0760
fix(ci): add clang (#5242)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-25 16:20:05 +02:00
Ettore Di Giacinto
867973a850
chore(model gallery): add soob3123_veritas-12b (#5241)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-25 09:20:01 +02:00
LocalAI [bot]
701cd6b6d5
chore: ⬆️ Update ggml-org/llama.cpp to 226251ed56b85190e18a1cca963c45b888f4953c (#5240)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-25 08:42:22 +02:00
Richard Palethorpe
7f61d397d5
fix(stablediffusion-ggml): Build with DSD CUDA, HIP and Metal flags (#5236)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-04-24 10:27:17 +02:00
Alessandro Pirastru
1ae0b896fa
fix: installation script compatibility with fedora 41 and later, fedora headless unclear errors (#5239)
Update installation script for improved compatibility and clarity

- Renamed VERSION to LOCALAI_VERSION to avoid conflicts with system variables.
- Enhanced NVIDIA and CUDA repository installation for DNF5 compatibility.
- Adjusted default Fedora version handling for CUDA installation.
- Updated Docker image tag handling to use LOCALAI_VERSION consistently.
- Improved logging messages for repository and LocalAI binary downloads.
- Added a temporary bypass for nvidia-smi installation on Fedora Cloud Edition.
2025-04-24 09:34:25 +02:00
LocalAI [bot]
3937407cb3
chore: ⬆️ Update ggml-org/llama.cpp to ecda2ec4b347031a9b8a89ee2efc664ce63f599c (#5238)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-24 09:32:08 +02:00
LocalAI [bot]
0e34ae4f3f
chore: ⬆️ Update ggml-org/llama.cpp to 658987cfc9d752dca7758987390d5fb1a7a0a54a (#5234)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-23 09:13:49 +02:00
dependabot[bot]
a38b99ecb6
chore(deps): bump mxschmitt/action-tmate from 3.19 to 3.21 (#5231)
Bumps [mxschmitt/action-tmate](https://github.com/mxschmitt/action-tmate) from 3.19 to 3.21.
- [Release notes](https://github.com/mxschmitt/action-tmate/releases)
- [Changelog](https://github.com/mxschmitt/action-tmate/blob/master/RELEASE.md)
- [Commits](https://github.com/mxschmitt/action-tmate/compare/v3.19...v3.21)

---
updated-dependencies:
- dependency-name: mxschmitt/action-tmate
  dependency-version: '3.21'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-22 10:27:10 +02:00
LocalAI [bot]
a4a4358182
chore: ⬆️ Update ggml-org/llama.cpp to 1d735c0b4fa0551c51c2f4ac888dd9a01f447985 (#5233)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-22 10:25:54 +02:00
Ettore Di Giacinto
4bc39c2db3
fix: typo on README link
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-04-21 22:13:14 +02:00
Ettore Di Giacinto
cc3df759f8
chore(docs): improve installer.sh docs (#5232)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-21 22:11:43 +02:00
LocalAI [bot]
378161060c
chore: ⬆️ Update ggml-org/llama.cpp to 6602304814e679cc8c162bb760a034aceb4f8965 (#5228)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-20 21:44:33 +00:00
Ettore Di Giacinto
f2f788fe60
chore(model gallery): add starrysky-12b-i1 (#5224)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-20 10:26:30 +02:00
Ettore Di Giacinto
9fa8ed6b1e
chore(model gallery) add amoral-gemma3-1b-v2 (#5223)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-20 10:23:24 +02:00
Ettore Di Giacinto
7fc37c5e29
chore(model gallery) add llama_3.3_70b_darkhorse-i1 (#5222)
chore(model gallery): add llama_3.3_70b_darkhorse-i1

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-20 10:20:58 +02:00
Ettore Di Giacinto
4bc4b1e8bc
chore(model gallery) update gemma3 qat models
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-20 10:11:12 +02:00
LocalAI [bot]
e495b89f18
chore: ⬆️ Update ggml-org/llama.cpp to 00137157fca3d17b90380762b4d7cc158d385bd3 (#5218)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-19 23:50:35 +00:00
LocalAI [bot]
ba09eaea1b
feat(swagger): update swagger (#5217)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-19 22:06:30 +02:00
Ettore Di Giacinto
61cc76c455
chore(autogptq): drop archived backend (#5214)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-19 15:52:29 +02:00
Ettore Di Giacinto
8abecb4a18
chore: bump grpc limits to 50MB (#5212)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-19 08:53:24 +02:00
LocalAI [bot]
8b3f76d8e6
chore: ⬆️ Update ggml-org/llama.cpp to 6408210082cc0a61b992b487be7e2ff2efbb9e36 (#5211)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-18 21:45:48 +00:00
Ettore Di Giacinto
4e0497f1a6
chore(model gallery): add pictor-1338-qwenp-1.5b (#5208)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-18 10:47:23 +02:00
Ettore Di Giacinto
ba88c9f451
chore(ci): use gemma-3-12b-it for models notifications (twitter)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-04-18 10:38:36 +02:00
Ettore Di Giacinto
a598285825
chore(model gallery): add google-gemma-3-27b-it-qat-q4_0-small (#5207)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-18 10:35:48 +02:00
Ettore Di Giacinto
cb7a172897
chore(ci): use gemma-3-12b-it for models notifications
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-04-18 10:20:33 +02:00
Ettore Di Giacinto
771be28dfb
ci: use gemma3 for notifications of releases
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-04-18 10:19:52 +02:00
Ettore Di Giacinto
7d6b3eb42d
chore(model gallery): add readyart_amoral-fallen-omega-gemma3-12b (#5206)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-18 10:17:39 +02:00
Ettore Di Giacinto
0bb33fab55
chore(model gallery): add ibm-granite_granite-3.3-2b-instruct (#5205)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-18 10:15:05 +02:00
Ettore Di Giacinto
e3bf7f77f7
chore(model gallery): add ibm-granite_granite-3.3-8b-instruct (#5204)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-18 09:59:17 +02:00
LocalAI [bot]
bd1707d339
chore: ⬆️ Update ggml-org/llama.cpp to 2f74c354c0f752ed9aabf7d3a350e6edebd7e744 (#5203)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-17 21:52:12 +00:00
Ettore Di Giacinto
0474804541
fix(ci): remove duplicate entry
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-17 19:51:21 +02:00
Ettore Di Giacinto
72693b3917
feat(install.sh): allow to uninstall with --uninstall (#5202)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-17 16:32:23 +02:00
Florian Bachmann
a03b70010f
fix(talk): Talk interface sends content-type headers to chatgpt (#5200)
Talk interface sends content-type headers to chatgpt

Signed-off-by: baflo <834350+baflo@users.noreply.github.com>
2025-04-17 15:02:11 +02:00
Ettore Di Giacinto
e3717e5c1a
chore(model gallery): add qwen2.5-14b-instruct-1m (#5201)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-17 10:42:22 +02:00
Ettore Di Giacinto
c8f6858218
chore(ci): add latest images for core (#5198)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-17 10:00:18 +02:00
Ettore Di Giacinto
06d7cc43ae
chore(model gallery): add dreamgen_lucid-v1-nemo (#5196)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-17 09:10:09 +02:00
Ettore Di Giacinto
f2147cb850
chore(model gallery): add thedrummer_rivermind-12b-v1 (#5195)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-17 09:02:54 +02:00
Ettore Di Giacinto
75bb9f4c28
chore(model gallery): add menlo_rezero-v0.1-llama-3.2-3b-it-grpo-250404 (#5194)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-17 09:00:11 +02:00
LocalAI [bot]
a2ef4b1e07
chore: ⬆️ Update ggml-org/llama.cpp to 015022bb53387baa8b23817ac03743705c7d472b (#5192)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-04-17 08:04:37 +02:00
LocalAI [bot]
161c9fe2db
docs: ⬆️ update docs version mudler/LocalAI (#5191)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-16 22:13:49 +02:00
Ettore Di Giacinto
7547463f81
Update quickstart.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-04-16 08:48:55 +02:00
Gianluca Boiano
32e4dfd47b
chore(model gallery): add suno-ai bark-cpp model (#5187)
Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2025-04-16 08:22:46 +02:00
Gianluca Boiano
f67e5dec68
fix: bark-cpp: assign FLAG_TTS to bark-cpp backend (#5186)
Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2025-04-16 08:21:30 +02:00
LocalAI [bot]
297d54acea
chore: ⬆️ Update ggml-org/llama.cpp to 80f19b41869728eeb6a26569957b92a773a2b2c6 (#5183)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-04-15 22:50:32 +00:00
129 changed files with 4343 additions and 28558 deletions

.env

@ -76,7 +76,7 @@
  ### Define a list of GRPC Servers for llama-cpp workers to distribute the load
  # https://github.com/ggerganov/llama.cpp/pull/6829
- # https://github.com/ggerganov/llama.cpp/blob/master/examples/rpc/README.md
+ # https://github.com/ggerganov/llama.cpp/blob/master/tools/rpc/README.md
  # LLAMACPP_GRPC_SERVERS=""
  ### Enable to run parallel requests


@ -29,10 +29,6 @@ updates:
  schedule:
  # Check for updates to GitHub Actions every weekday
  interval: "weekly"
- - package-ecosystem: "pip"
- directory: "/backend/python/autogptq"
- schedule:
- interval: "weekly"
  - package-ecosystem: "pip"
  directory: "/backend/python/bark"
  schedule:


@ -12,7 +12,7 @@ jobs:
  - repository: "ggml-org/llama.cpp"
  variable: "CPPLLAMA_VERSION"
  branch: "master"
- - repository: "ggerganov/whisper.cpp"
+ - repository: "ggml-org/whisper.cpp"
  variable: "WHISPER_CPP_VERSION"
  branch: "master"
  - repository: "PABannier/bark.cpp"


@ -14,7 +14,7 @@ jobs:
  steps:
  - name: Dependabot metadata
  id: metadata
- uses: dependabot/fetch-metadata@v2.3.0
+ uses: dependabot/fetch-metadata@v2.4.0
  with:
  github-token: "${{ secrets.GITHUB_TOKEN }}"
  skip-commit-verification: true


@ -42,7 +42,7 @@ jobs:
  script: |
  sudo rm -rf local-ai/ || true
  - name: copy file via ssh
- uses: appleboy/scp-action@v0.1.7
+ uses: appleboy/scp-action@v1.0.0
  with:
  host: ${{ secrets.EXPLORER_SSH_HOST }}
  username: ${{ secrets.EXPLORER_SSH_USERNAME }}


@ -33,6 +33,7 @@ jobs:
  # Pushing with all jobs in parallel
  # eats the bandwidth of all the nodes
  max-parallel: ${{ github.event_name != 'pull_request' && 4 || 8 }}
+ fail-fast: false
  matrix:
  include:
  # This is basically covered by the AIO test
@ -56,26 +57,35 @@ jobs:
  runs-on: 'arc-runner-set'
  base-image: "ubuntu:22.04"
  makeflags: "--jobs=3 --output-sync=target"
- # - build-type: 'hipblas'
- # platforms: 'linux/amd64'
- # tag-latest: 'false'
- # tag-suffix: '-hipblas'
- # ffmpeg: 'false'
- # image-type: 'extras'
- # base-image: "rocm/dev-ubuntu-22.04:6.1"
- # grpc-base-image: "ubuntu:22.04"
- # runs-on: 'arc-runner-set'
- # makeflags: "--jobs=3 --output-sync=target"
- # - build-type: 'sycl_f16'
- # platforms: 'linux/amd64'
- # tag-latest: 'false'
- # base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
- # grpc-base-image: "ubuntu:22.04"
- # tag-suffix: 'sycl-f16-ffmpeg'
- # ffmpeg: 'true'
- # image-type: 'extras'
- # runs-on: 'arc-runner-set'
- # makeflags: "--jobs=3 --output-sync=target"
+ - build-type: 'hipblas'
+ platforms: 'linux/amd64'
+ tag-latest: 'false'
+ tag-suffix: '-hipblas'
+ ffmpeg: 'false'
+ image-type: 'extras'
+ base-image: "rocm/dev-ubuntu-22.04:6.1"
+ grpc-base-image: "ubuntu:22.04"
+ runs-on: 'arc-runner-set'
+ makeflags: "--jobs=3 --output-sync=target"
+ - build-type: 'sycl_f16'
+ platforms: 'linux/amd64'
+ tag-latest: 'false'
+ base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
+ grpc-base-image: "ubuntu:22.04"
+ tag-suffix: 'sycl-f16-ffmpeg'
+ ffmpeg: 'true'
+ image-type: 'extras'
+ runs-on: 'arc-runner-set'
+ makeflags: "--jobs=3 --output-sync=target"
+ - build-type: 'vulkan'
+ platforms: 'linux/amd64'
+ tag-latest: 'false'
+ tag-suffix: '-vulkan-ffmpeg-core'
+ ffmpeg: 'true'
+ image-type: 'core'
+ runs-on: 'ubuntu-latest'
+ base-image: "ubuntu:22.04"
+ makeflags: "--jobs=4 --output-sync=target"
  # core-image-build:
  # uses: ./.github/workflows/image_build.yml
  # with:


@ -45,13 +45,13 @@ jobs:
  - build-type: 'hipblas'
  platforms: 'linux/amd64'
  tag-latest: 'auto'
- tag-suffix: '-hipblas-ffmpeg'
+ tag-suffix: '-hipblas-extras'
  ffmpeg: 'true'
  image-type: 'extras'
  aio: "-aio-gpu-hipblas"
  base-image: "rocm/dev-ubuntu-22.04:6.1"
  grpc-base-image: "ubuntu:22.04"
- latest-image: 'latest-gpu-hipblas'
+ latest-image: 'latest-gpu-hipblas-extras'
  latest-image-aio: 'latest-aio-gpu-hipblas'
  runs-on: 'arc-runner-set'
  makeflags: "--jobs=3 --output-sync=target"
@ -59,32 +59,13 @@ jobs:
  platforms: 'linux/amd64'
  tag-latest: 'false'
  tag-suffix: '-hipblas'
- ffmpeg: 'false'
- image-type: 'extras'
- base-image: "rocm/dev-ubuntu-22.04:6.1"
- grpc-base-image: "ubuntu:22.04"
- runs-on: 'arc-runner-set'
- makeflags: "--jobs=3 --output-sync=target"
- - build-type: 'hipblas'
- platforms: 'linux/amd64'
- tag-latest: 'false'
- tag-suffix: '-hipblas-ffmpeg-core'
  ffmpeg: 'true'
  image-type: 'core'
  base-image: "rocm/dev-ubuntu-22.04:6.1"
  grpc-base-image: "ubuntu:22.04"
  runs-on: 'arc-runner-set'
  makeflags: "--jobs=3 --output-sync=target"
- - build-type: 'hipblas'
- platforms: 'linux/amd64'
- tag-latest: 'false'
- tag-suffix: '-hipblas-core'
- ffmpeg: 'false'
- image-type: 'core'
- base-image: "rocm/dev-ubuntu-22.04:6.1"
- grpc-base-image: "ubuntu:22.04"
- runs-on: 'arc-runner-set'
- makeflags: "--jobs=3 --output-sync=target"
+ latest-image: 'latest-gpu-hipblas'
  self-hosted-jobs:
  uses: ./.github/workflows/image_build.yml
  with:
@ -114,110 +95,58 @@ jobs:
  max-parallel: ${{ github.event_name != 'pull_request' && 5 || 8 }}
  matrix:
  include:
- # Extra images
- - build-type: ''
- #platforms: 'linux/amd64,linux/arm64'
- platforms: 'linux/amd64'
- tag-latest: 'auto'
- tag-suffix: ''
- ffmpeg: ''
- image-type: 'extras'
- runs-on: 'arc-runner-set'
- base-image: "ubuntu:22.04"
- makeflags: "--jobs=3 --output-sync=target"
- - build-type: ''
- platforms: 'linux/amd64'
- tag-latest: 'auto'
- tag-suffix: '-ffmpeg'
- ffmpeg: 'true'
- image-type: 'extras'
- runs-on: 'arc-runner-set'
- base-image: "ubuntu:22.04"
- makeflags: "--jobs=3 --output-sync=target"
  - build-type: 'cublas'
  cuda-major-version: "11"
  cuda-minor-version: "7"
  platforms: 'linux/amd64'
  tag-latest: 'false'
- tag-suffix: '-cublas-cuda11'
+ tag-suffix: '-cublas-cuda11-extras'
- ffmpeg: ''
- image-type: 'extras'
- runs-on: 'arc-runner-set'
- base-image: "ubuntu:22.04"
- makeflags: "--jobs=3 --output-sync=target"
- - build-type: 'cublas'
- cuda-major-version: "12"
- cuda-minor-version: "0"
- platforms: 'linux/amd64'
- tag-latest: 'false'
- tag-suffix: '-cublas-cuda12'
- ffmpeg: ''
- image-type: 'extras'
- runs-on: 'arc-runner-set'
- base-image: "ubuntu:22.04"
- makeflags: "--jobs=3 --output-sync=target"
- - build-type: 'cublas'
- cuda-major-version: "11"
- cuda-minor-version: "7"
- platforms: 'linux/amd64'
- tag-latest: 'auto'
- tag-suffix: '-cublas-cuda11-ffmpeg'
  ffmpeg: 'true'
  image-type: 'extras'
  runs-on: 'arc-runner-set'
  base-image: "ubuntu:22.04"
  aio: "-aio-gpu-nvidia-cuda-11"
- latest-image: 'latest-gpu-nvidia-cuda-11'
+ latest-image: 'latest-gpu-nvidia-cuda-11-extras'
  latest-image-aio: 'latest-aio-gpu-nvidia-cuda-11'
  makeflags: "--jobs=3 --output-sync=target"
  - build-type: 'cublas'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
- tag-latest: 'auto'
+ tag-latest: 'false'
- tag-suffix: '-cublas-cuda12-ffmpeg'
+ tag-suffix: '-cublas-cuda12-extras'
  ffmpeg: 'true'
  image-type: 'extras'
  runs-on: 'arc-runner-set'
  base-image: "ubuntu:22.04"
  aio: "-aio-gpu-nvidia-cuda-12"
- latest-image: 'latest-gpu-nvidia-cuda-12'
+ latest-image: 'latest-gpu-nvidia-cuda-12-extras'
  latest-image-aio: 'latest-aio-gpu-nvidia-cuda-12'
  makeflags: "--jobs=3 --output-sync=target"
- - build-type: ''
- #platforms: 'linux/amd64,linux/arm64'
- platforms: 'linux/amd64'
- tag-latest: 'auto'
- tag-suffix: ''
- ffmpeg: ''
- image-type: 'extras'
- base-image: "ubuntu:22.04"
- runs-on: 'arc-runner-set'
- makeflags: "--jobs=3 --output-sync=target"
  - build-type: 'sycl_f16'
  platforms: 'linux/amd64'
- tag-latest: 'auto'
+ tag-latest: 'false'
  base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
  grpc-base-image: "ubuntu:22.04"
- tag-suffix: '-sycl-f16-ffmpeg'
+ tag-suffix: '-sycl-f16-extras'
  ffmpeg: 'true'
  image-type: 'extras'
  runs-on: 'arc-runner-set'
  aio: "-aio-gpu-intel-f16"
- latest-image: 'latest-gpu-intel-f16'
+ latest-image: 'latest-gpu-intel-f16-extras'
  latest-image-aio: 'latest-aio-gpu-intel-f16'
  makeflags: "--jobs=3 --output-sync=target"
  - build-type: 'sycl_f32'
  platforms: 'linux/amd64'
- tag-latest: 'auto'
+ tag-latest: 'false'
  base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
  grpc-base-image: "ubuntu:22.04"
- tag-suffix: '-sycl-f32-ffmpeg'
+ tag-suffix: '-sycl-f32-extras'
  ffmpeg: 'true'
  image-type: 'extras'
  runs-on: 'arc-runner-set'
  aio: "-aio-gpu-intel-f32"
- latest-image: 'latest-gpu-intel-f32'
+ latest-image: 'latest-gpu-intel-f32-extras'
  latest-image-aio: 'latest-aio-gpu-intel-f32'
  makeflags: "--jobs=3 --output-sync=target"
  # Core images
@ -226,41 +155,23 @@ jobs:
  tag-latest: 'false'
  base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
  grpc-base-image: "ubuntu:22.04"
- tag-suffix: '-sycl-f16-core'
+ tag-suffix: '-sycl-f16'
- ffmpeg: 'false'
+ ffmpeg: 'true'
  image-type: 'core'
  runs-on: 'arc-runner-set'
  makeflags: "--jobs=3 --output-sync=target"
+ latest-image: 'latest-gpu-intel-f16'
  - build-type: 'sycl_f32'
  platforms: 'linux/amd64'
  tag-latest: 'false'
  base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
  grpc-base-image: "ubuntu:22.04"
- tag-suffix: '-sycl-f32-core'
+ tag-suffix: '-sycl-f32'
- ffmpeg: 'false'
- image-type: 'core'
- runs-on: 'arc-runner-set'
- makeflags: "--jobs=3 --output-sync=target"
- - build-type: 'sycl_f16'
- platforms: 'linux/amd64'
- tag-latest: 'false'
- base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
- grpc-base-image: "ubuntu:22.04"
- tag-suffix: '-sycl-f16-ffmpeg-core'
- ffmpeg: 'true'
- image-type: 'core'
- runs-on: 'arc-runner-set'
- makeflags: "--jobs=3 --output-sync=target"
- - build-type: 'sycl_f32'
- platforms: 'linux/amd64'
- tag-latest: 'false'
- base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
- grpc-base-image: "ubuntu:22.04"
- tag-suffix: '-sycl-f32-ffmpeg-core'
  ffmpeg: 'true'
  image-type: 'core'
  runs-on: 'arc-runner-set'
  makeflags: "--jobs=3 --output-sync=target"
+ latest-image: 'latest-gpu-intel-f32'
  core-image-build:
  uses: ./.github/workflows/image_build.yml
@ -293,7 +204,7 @@ jobs:
  - build-type: ''
  platforms: 'linux/amd64,linux/arm64'
  tag-latest: 'auto'
- tag-suffix: '-ffmpeg-core'
+ tag-suffix: ''
  ffmpeg: 'true'
  image-type: 'core'
  base-image: "ubuntu:22.04"
@ -308,60 +219,38 @@ jobs:
  cuda-minor-version: "7"
  platforms: 'linux/amd64'
  tag-latest: 'false'
- tag-suffix: '-cublas-cuda11-core'
+ tag-suffix: '-cublas-cuda11'
- ffmpeg: ''
- image-type: 'core'
- base-image: "ubuntu:22.04"
- runs-on: 'arc-runner-set'
- makeflags: "--jobs=4 --output-sync=target"
- skip-drivers: 'false'
- - build-type: 'cublas'
- cuda-major-version: "12"
- cuda-minor-version: "0"
- platforms: 'linux/amd64'
- tag-latest: 'false'
- tag-suffix: '-cublas-cuda12-core'
- ffmpeg: ''
- image-type: 'core'
- base-image: "ubuntu:22.04"
- runs-on: 'arc-runner-set'
- makeflags: "--jobs=4 --output-sync=target"
- skip-drivers: 'false'
- - build-type: 'cublas'
- cuda-major-version: "11"
- cuda-minor-version: "7"
- platforms: 'linux/amd64'
- tag-latest: 'false'
- tag-suffix: '-cublas-cuda11-ffmpeg-core'
  ffmpeg: 'true'
  image-type: 'core'
  runs-on: 'arc-runner-set'
  base-image: "ubuntu:22.04"
  makeflags: "--jobs=4 --output-sync=target"
  skip-drivers: 'false'
+ latest-image: 'latest-gpu-nvidia-cuda-12'
  - build-type: 'cublas'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'false'
- tag-suffix: '-cublas-cuda12-ffmpeg-core'
+ tag-suffix: '-cublas-cuda12'
  ffmpeg: 'true'
  image-type: 'core'
  runs-on: 'arc-runner-set'
  base-image: "ubuntu:22.04"
  skip-drivers: 'false'
  makeflags: "--jobs=4 --output-sync=target"
+ latest-image: 'latest-gpu-nvidia-cuda-12'
  - build-type: 'vulkan'
  platforms: 'linux/amd64'
  tag-latest: 'false'
- tag-suffix: '-vulkan-ffmpeg-core'
+ tag-suffix: '-vulkan'
- latest-image: 'latest-vulkan-ffmpeg-core'
  ffmpeg: 'true'
  image-type: 'core'
  runs-on: 'arc-runner-set'
  base-image: "ubuntu:22.04"
  skip-drivers: 'false'
  makeflags: "--jobs=4 --output-sync=target"
+ latest-image: 'latest-gpu-vulkan'
  gh-runner:
  uses: ./.github/workflows/image_build.yml
  with:
@ -394,8 +283,8 @@ jobs:
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'false'
- tag-suffix: '-nvidia-l4t-arm64-core'
+ tag-suffix: '-nvidia-l4t-arm64'
- latest-image: 'latest-nvidia-l4t-arm64-core'
+ latest-image: 'latest-nvidia-l4t-arm64'
  ffmpeg: 'true'
  image-type: 'core'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"


@ -8,7 +8,7 @@ jobs:
notify-discord: notify-discord:
if: ${{ (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model')) }} if: ${{ (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model')) }}
env: env:
MODEL_NAME: hermes-2-theta-llama-3-8b MODEL_NAME: gemma-3-12b-it
runs-on: ubuntu-latest runs-on: ubuntu-latest
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v4
@ -16,7 +16,7 @@ jobs:
fetch-depth: 0 # needed to checkout all branches for this Action to work fetch-depth: 0 # needed to checkout all branches for this Action to work
- uses: mudler/localai-github-action@v1 - uses: mudler/localai-github-action@v1
with: with:
model: 'hermes-2-theta-llama-3-8b' # Any from models.localai.io, or from huggingface.com with: "huggingface://<repository>/file" model: 'gemma-3-12b-it' # Any from models.localai.io, or from huggingface.com with: "huggingface://<repository>/file"
# Check the PR diff using the current branch and the base branch of the PR # Check the PR diff using the current branch and the base branch of the PR
- uses: GrantBirki/git-diff-action@v2.8.0 - uses: GrantBirki/git-diff-action@v2.8.0
id: git-diff-action id: git-diff-action
@ -79,7 +79,7 @@ jobs:
args: ${{ steps.summarize.outputs.message }} args: ${{ steps.summarize.outputs.message }}
- name: Setup tmate session if fails - name: Setup tmate session if fails
if: ${{ failure() }} if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.19 uses: mxschmitt/action-tmate@v3.22
with: with:
detached: true detached: true
connect-timeout-seconds: 180 connect-timeout-seconds: 180
@ -87,7 +87,7 @@ jobs:
notify-twitter: notify-twitter:
if: ${{ (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model')) }} if: ${{ (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model')) }}
env: env:
MODEL_NAME: hermes-2-theta-llama-3-8b MODEL_NAME: gemma-3-12b-it
runs-on: ubuntu-latest runs-on: ubuntu-latest
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v4
@ -161,7 +161,7 @@ jobs:
TWITTER_ACCESS_TOKEN_SECRET: ${{ secrets.TWITTER_ACCESS_TOKEN_SECRET }} TWITTER_ACCESS_TOKEN_SECRET: ${{ secrets.TWITTER_ACCESS_TOKEN_SECRET }}
- name: Setup tmate session if fails - name: Setup tmate session if fails
if: ${{ failure() }} if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.19 uses: mxschmitt/action-tmate@v3.22
with: with:
detached: true detached: true
connect-timeout-seconds: 180 connect-timeout-seconds: 180

@ -14,7 +14,7 @@ jobs:
steps: steps:
- uses: mudler/localai-github-action@v1 - uses: mudler/localai-github-action@v1
with: with:
model: 'hermes-2-theta-llama-3-8b' # Any from models.localai.io, or from huggingface.com with: "huggingface://<repository>/file" model: 'gemma-3-12b-it' # Any from models.localai.io, or from huggingface.com with: "huggingface://<repository>/file"
- name: Summarize - name: Summarize
id: summarize id: summarize
run: | run: |

@ -36,6 +36,7 @@ jobs:
sudo apt-get update sudo apt-get update
sudo apt-get install build-essential ffmpeg protobuf-compiler ccache upx-ucl gawk sudo apt-get install build-essential ffmpeg protobuf-compiler ccache upx-ucl gawk
sudo apt-get install -qy binutils-aarch64-linux-gnu gcc-aarch64-linux-gnu g++-aarch64-linux-gnu libgmock-dev sudo apt-get install -qy binutils-aarch64-linux-gnu gcc-aarch64-linux-gnu g++-aarch64-linux-gnu libgmock-dev
make install-go-tools
- name: Install CUDA Dependencies - name: Install CUDA Dependencies
run: | run: |
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/cross-linux-aarch64/cuda-keyring_1.1-1_all.deb curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/cross-linux-aarch64/cuda-keyring_1.1-1_all.deb
@ -123,7 +124,7 @@ jobs:
release/* release/*
- name: Setup tmate session if tests fail - name: Setup tmate session if tests fail
if: ${{ failure() }} if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.19 uses: mxschmitt/action-tmate@v3.22
with: with:
detached: true detached: true
connect-timeout-seconds: 180 connect-timeout-seconds: 180
@ -151,6 +152,7 @@ jobs:
run: | run: |
sudo apt-get update sudo apt-get update
sudo apt-get install -y wget curl build-essential ffmpeg protobuf-compiler ccache upx-ucl gawk cmake libgmock-dev sudo apt-get install -y wget curl build-essential ffmpeg protobuf-compiler ccache upx-ucl gawk cmake libgmock-dev
make install-go-tools
- name: Intel Dependencies - name: Intel Dependencies
run: | run: |
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
@ -232,7 +234,7 @@ jobs:
release/* release/*
- name: Setup tmate session if tests fail - name: Setup tmate session if tests fail
if: ${{ failure() }} if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.19 uses: mxschmitt/action-tmate@v3.22
with: with:
detached: true detached: true
connect-timeout-seconds: 180 connect-timeout-seconds: 180
@ -253,8 +255,7 @@ jobs:
- name: Dependencies - name: Dependencies
run: | run: |
brew install protobuf grpc brew install protobuf grpc
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@8ba23be9613c672d40ae261d2a1335d639bdd59b make install-go-tools
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.0
- name: Build - name: Build
id: build id: build
run: | run: |
@ -275,7 +276,7 @@ jobs:
release/* release/*
- name: Setup tmate session if tests fail - name: Setup tmate session if tests fail
if: ${{ failure() }} if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.19 uses: mxschmitt/action-tmate@v3.22
with: with:
detached: true detached: true
connect-timeout-seconds: 180 connect-timeout-seconds: 180
@ -295,8 +296,7 @@ jobs:
- name: Dependencies - name: Dependencies
run: | run: |
brew install protobuf grpc libomp llvm brew install protobuf grpc libomp llvm
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af make install-go-tools
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
- name: Build - name: Build
id: build id: build
run: | run: |
@ -317,7 +317,7 @@ jobs:
release/* release/*
- name: Setup tmate session if tests fail - name: Setup tmate session if tests fail
if: ${{ failure() }} if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.19 uses: mxschmitt/action-tmate@v3.22
with: with:
detached: true detached: true
connect-timeout-seconds: 180 connect-timeout-seconds: 180

@ -18,7 +18,7 @@ jobs:
if: ${{ github.actor != 'dependabot[bot]' }} if: ${{ github.actor != 'dependabot[bot]' }}
- name: Run Gosec Security Scanner - name: Run Gosec Security Scanner
if: ${{ github.actor != 'dependabot[bot]' }} if: ${{ github.actor != 'dependabot[bot]' }}
uses: securego/gosec@v2.22.3 uses: securego/gosec@v2.22.4
with: with:
# we let the report trigger content trigger a failure using the GitHub Security features. # we let the report trigger content trigger a failure using the GitHub Security features.
args: '-no-fail -fmt sarif -out results.sarif ./...' args: '-no-fail -fmt sarif -out results.sarif ./...'

@ -78,6 +78,26 @@ jobs:
make --jobs=5 --output-sync=target -C backend/python/diffusers make --jobs=5 --output-sync=target -C backend/python/diffusers
make --jobs=5 --output-sync=target -C backend/python/diffusers test make --jobs=5 --output-sync=target -C backend/python/diffusers test
#tests-vllm:
# runs-on: ubuntu-latest
# steps:
# - name: Clone
# uses: actions/checkout@v4
# with:
# submodules: true
# - name: Dependencies
# run: |
# sudo apt-get update
# sudo apt-get install -y build-essential ffmpeg
# sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# sudo apt-get install -y libopencv-dev
# # Install UV
# curl -LsSf https://astral.sh/uv/install.sh | sh
# pip install --user --no-cache-dir grpcio-tools==1.64.1
# - name: Test vllm backend
# run: |
# make --jobs=5 --output-sync=target -C backend/python/vllm
# make --jobs=5 --output-sync=target -C backend/python/vllm test
# tests-transformers-musicgen: # tests-transformers-musicgen:
# runs-on: ubuntu-latest # runs-on: ubuntu-latest
# steps: # steps:

@ -71,7 +71,7 @@ jobs:
run: | run: |
sudo apt-get update sudo apt-get update
sudo apt-get install build-essential ccache upx-ucl curl ffmpeg sudo apt-get install build-essential ccache upx-ucl curl ffmpeg
sudo apt-get install -y libgmock-dev sudo apt-get install -y libgmock-dev clang
curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \ curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \ sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \ gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
@ -96,6 +96,7 @@ jobs:
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2 go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
go install github.com/GeertJohan/go.rice/rice@latest
# The python3-grpc-tools package in 22.04 is too old # The python3-grpc-tools package in 22.04 is too old
pip install --user grpcio-tools pip install --user grpcio-tools
@ -130,7 +131,7 @@ jobs:
PATH="$PATH:/root/go/bin" GO_TAGS="tts" make --jobs 5 --output-sync=target test PATH="$PATH:/root/go/bin" GO_TAGS="tts" make --jobs 5 --output-sync=target test
- name: Setup tmate session if tests fail - name: Setup tmate session if tests fail
if: ${{ failure() }} if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.19 uses: mxschmitt/action-tmate@v3.22
with: with:
detached: true detached: true
connect-timeout-seconds: 180 connect-timeout-seconds: 180
@ -183,6 +184,7 @@ jobs:
rm protoc.zip rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2 go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
go install github.com/GeertJohan/go.rice/rice@latest
PATH="$PATH:$HOME/go/bin" make protogen-go PATH="$PATH:$HOME/go/bin" make protogen-go
- name: Build images - name: Build images
run: | run: |
@ -194,7 +196,7 @@ jobs:
make run-e2e-aio make run-e2e-aio
- name: Setup tmate session if tests fail - name: Setup tmate session if tests fail
if: ${{ failure() }} if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.19 uses: mxschmitt/action-tmate@v3.22
with: with:
detached: true detached: true
connect-timeout-seconds: 180 connect-timeout-seconds: 180
@ -222,6 +224,7 @@ jobs:
run: | run: |
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm
pip install --user --no-cache-dir grpcio-tools pip install --user --no-cache-dir grpcio-tools
go install github.com/GeertJohan/go.rice/rice@latest
- name: Test - name: Test
run: | run: |
export C_INCLUDE_PATH=/usr/local/include export C_INCLUDE_PATH=/usr/local/include
@ -232,7 +235,7 @@ jobs:
BUILD_TYPE="GITHUB_CI_HAS_BROKEN_METAL" CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF" make --jobs 4 --output-sync=target test BUILD_TYPE="GITHUB_CI_HAS_BROKEN_METAL" CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF" make --jobs 4 --output-sync=target test
- name: Setup tmate session if tests fail - name: Setup tmate session if tests fail
if: ${{ failure() }} if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.19 uses: mxschmitt/action-tmate@v3.22
with: with:
detached: true detached: true
connect-timeout-seconds: 180 connect-timeout-seconds: 180

@ -15,7 +15,7 @@ ARG TARGETARCH
ARG TARGETVARIANT ARG TARGETVARIANT
ENV DEBIAN_FRONTEND=noninteractive ENV DEBIAN_FRONTEND=noninteractive
ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,transformers:/build/backend/python/transformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,faster-whisper:/build/backend/python/faster-whisper/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,exllama2:/build/backend/python/exllama2/run.sh" ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,transformers:/build/backend/python/transformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,faster-whisper:/build/backend/python/faster-whisper/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,exllama2:/build/backend/python/exllama2/run.sh"
RUN apt-get update && \ RUN apt-get update && \
apt-get install -y --no-install-recommends \ apt-get install -y --no-install-recommends \
@ -46,9 +46,10 @@ EOT
RUN curl -L -s https://go.dev/dl/go${GO_VERSION}.linux-${TARGETARCH}.tar.gz | tar -C /usr/local -xz RUN curl -L -s https://go.dev/dl/go${GO_VERSION}.linux-${TARGETARCH}.tar.gz | tar -C /usr/local -xz
ENV PATH=$PATH:/root/go/bin:/usr/local/go/bin ENV PATH=$PATH:/root/go/bin:/usr/local/go/bin
# Install grpc compilers # Install grpc compilers and rice
RUN go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2 && \ RUN go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2 && \
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af && \
go install github.com/GeertJohan/go.rice/rice@latest
COPY --chmod=644 custom-ca-certs/* /usr/local/share/ca-certificates/ COPY --chmod=644 custom-ca-certs/* /usr/local/share/ca-certificates/
RUN update-ca-certificates RUN update-ca-certificates
@ -300,10 +301,9 @@ COPY .git .
RUN make prepare RUN make prepare
## Build the binary ## Build the binary
## If it's CUDA or hipblas, we want to skip some of the llama-compat backends to save space ## If we're on arm64 AND using cublas/hipblas, skip some of the llama-compat backends to save space
## We only leave the most CPU-optimized variant and the fallback for the cublas/hipblas build ## Otherwise just run the normal build
## (both will use CUDA or hipblas for the actual computation) RUN if [ "${TARGETARCH}" = "arm64" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then \
RUN if [ "${BUILD_TYPE}" = "cublas" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then \
SKIP_GRPC_BACKEND="backend-assets/grpc/llama-cpp-avx512 backend-assets/grpc/llama-cpp-avx backend-assets/grpc/llama-cpp-avx2" make build; \ SKIP_GRPC_BACKEND="backend-assets/grpc/llama-cpp-avx512 backend-assets/grpc/llama-cpp-avx backend-assets/grpc/llama-cpp-avx2" make build; \
else \ else \
make build; \ make build; \
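
Review note on the hunk above: the new comment says "arm64 AND using cublas/hipblas", but the condition as written is an OR of `TARGETARCH = arm64` and `BUILD_TYPE = hipblas`, so an amd64 `cublas` build now takes the plain `make build` branch. The sketch below mirrors that condition outside the Dockerfile so the branch taken for a given pair of values can be checked by hand. The function name `choose_build_cmd` is hypothetical and exists only for illustration; it prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): prints which make
# invocation the Dockerfile RUN step would choose.
#   $1 = TARGETARCH (e.g. amd64, arm64)
#   $2 = BUILD_TYPE (e.g. cublas, hipblas, vulkan, or empty)
choose_build_cmd() {
    target_arch="$1"
    build_type="$2"
    # Same backend list as the SKIP_GRPC_BACKEND value in the hunk above.
    skip="backend-assets/grpc/llama-cpp-avx512 backend-assets/grpc/llama-cpp-avx backend-assets/grpc/llama-cpp-avx2"
    # Same test as the new RUN line: arm64 OR hipblas.
    if [ "$target_arch" = "arm64" ] || [ "$build_type" = "hipblas" ]; then
        printf 'SKIP_GRPC_BACKEND="%s" make build\n' "$skip"
    else
        printf 'make build\n'
    fi
}

choose_build_cmd amd64 cublas
choose_build_cmd amd64 hipblas
choose_build_cmd arm64 cublas
```

If the intent really is "arm64 and a GPU build type", the test would need `&&` with a grouped check on `BUILD_TYPE`; that is a guess at intent, not something the diff states.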
@ -431,9 +431,6 @@ RUN if [[ ( "${EXTRA_BACKENDS}" =~ "kokoro" || -z "${EXTRA_BACKENDS}" ) && "$IMA
RUN if [[ ( "${EXTRA_BACKENDS}" =~ "vllm" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \ RUN if [[ ( "${EXTRA_BACKENDS}" =~ "vllm" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/vllm \ make -C backend/python/vllm \
; fi && \ ; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "autogptq" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/autogptq \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "bark" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \ if [[ ( "${EXTRA_BACKENDS}" =~ "bark" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/bark \ make -C backend/python/bark \
; fi && \ ; fi && \

@ -6,11 +6,11 @@ BINARY_NAME=local-ai
DETECT_LIBS?=true DETECT_LIBS?=true
# llama.cpp versions # llama.cpp versions
CPPLLAMA_VERSION?=d6d2c2ab8c8865784ba9fef37f2b2de3f2134d33 CPPLLAMA_VERSION?=6a2bc8bfb7cd502e5ebc72e36c97a6f848c21c2c
# whisper.cpp version # whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
WHISPER_CPP_VERSION?=6266a9f9e56a5b925e9892acf650f3eb1245814d WHISPER_CPP_VERSION?=d1f114da61b1ae1e70b03104fad42c9dd666feeb
# go-piper version # go-piper version
PIPER_REPO?=https://github.com/mudler/go-piper PIPER_REPO?=https://github.com/mudler/go-piper
@ -24,14 +24,21 @@ BARKCPP_VERSION?=v1.0.0
STABLEDIFFUSION_GGML_REPO?=https://github.com/richiejp/stable-diffusion.cpp STABLEDIFFUSION_GGML_REPO?=https://github.com/richiejp/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=53e3b17eb3d0b5760ced06a1f98320b68b34aaae STABLEDIFFUSION_GGML_VERSION?=53e3b17eb3d0b5760ced06a1f98320b68b34aaae
# ONEAPI variables for SYCL
export ONEAPI_VARS?=/opt/intel/oneapi/setvars.sh
ONNX_VERSION?=1.20.0 ONNX_VERSION?=1.20.0
ONNX_ARCH?=x64 ONNX_ARCH?=x64
ONNX_OS?=linux ONNX_OS?=linux
export BUILD_TYPE?= export BUILD_TYPE?=
export STABLE_BUILD_TYPE?=$(BUILD_TYPE) export STABLE_BUILD_TYPE?=$(BUILD_TYPE)
export CMAKE_ARGS?= export CMAKE_ARGS?=-DBUILD_SHARED_LIBS=OFF
export WHISPER_CMAKE_ARGS?=-DBUILD_SHARED_LIBS=OFF
export BACKEND_LIBS?= export BACKEND_LIBS?=
export WHISPER_DIR=$(abspath ./sources/whisper.cpp)
export WHISPER_INCLUDE_PATH=$(WHISPER_DIR)/include:$(WHISPER_DIR)/ggml/include
export WHISPER_LIBRARY_PATH=$(WHISPER_DIR)/build/src/:$(WHISPER_DIR)/build/ggml/src
CGO_LDFLAGS?= CGO_LDFLAGS?=
CGO_LDFLAGS_WHISPER?= CGO_LDFLAGS_WHISPER?=
@ -81,6 +88,7 @@ endif
# IF native is false, we add -DGGML_NATIVE=OFF to CMAKE_ARGS # IF native is false, we add -DGGML_NATIVE=OFF to CMAKE_ARGS
ifeq ($(NATIVE),false) ifeq ($(NATIVE),false)
CMAKE_ARGS+=-DGGML_NATIVE=OFF CMAKE_ARGS+=-DGGML_NATIVE=OFF
WHISPER_CMAKE_ARGS+=-DGGML_NATIVE=OFF
endif endif
# Detect if we are running on arm64 # Detect if we are running on arm64
@ -108,13 +116,31 @@ ifeq ($(OS),Darwin)
# disable metal if on Darwin and any other value is explicitly passed. # disable metal if on Darwin and any other value is explicitly passed.
else ifneq ($(BUILD_TYPE),metal) else ifneq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DGGML_METAL=OFF CMAKE_ARGS+=-DGGML_METAL=OFF
WHISPER_CMAKE_ARGS+=-DGGML_METAL=OFF
export GGML_NO_ACCELERATE=1 export GGML_NO_ACCELERATE=1
export GGML_NO_METAL=1 export GGML_NO_METAL=1
GO_LDFLAGS_WHISPER+=-lggml-blas
export WHISPER_LIBRARY_PATH:=$(WHISPER_LIBRARY_PATH):$(WHISPER_DIR)/build/ggml/src/ggml-blas
endif endif
ifeq ($(BUILD_TYPE),metal) ifeq ($(BUILD_TYPE),metal)
# -lcblas removed: it seems to always be listed as a duplicate flag.
CGO_LDFLAGS += -framework Accelerate CGO_LDFLAGS += -framework Accelerate
CGO_LDFLAGS_WHISPER+=-lggml-metal -lggml-blas
CMAKE_ARGS+=-DGGML_METAL=ON
CMAKE_ARGS+=-DGGML_METAL_USE_BF16=ON
CMAKE_ARGS+=-DGGML_METAL_EMBED_LIBRARY=ON
CMAKE_ARGS+=-DGGML_OPENMP=OFF
WHISPER_CMAKE_ARGS+=-DGGML_METAL=ON
WHISPER_CMAKE_ARGS+=-DGGML_METAL_USE_BF16=ON
WHISPER_CMAKE_ARGS+=-DGGML_METAL_EMBED_LIBRARY=ON
WHISPER_CMAKE_ARGS+=-DWHISPER_BUILD_EXAMPLES=OFF
WHISPER_CMAKE_ARGS+=-DWHISPER_BUILD_TESTS=OFF
WHISPER_CMAKE_ARGS+=-DWHISPER_BUILD_SERVER=OFF
WHISPER_CMAKE_ARGS+=-DGGML_OPENMP=OFF
export WHISPER_LIBRARY_PATH:=$(WHISPER_LIBRARY_PATH):$(WHISPER_DIR)/build/ggml/src/ggml-metal/:$(WHISPER_DIR)/build/ggml/src/ggml-blas
else
CGO_LDFLAGS_WHISPER+=-lggml-blas
export WHISPER_LIBRARY_PATH:=$(WHISPER_LIBRARY_PATH):$(WHISPER_DIR)/build/ggml/src/ggml-blas
endif endif
else else
CGO_LDFLAGS_WHISPER+=-lgomp CGO_LDFLAGS_WHISPER+=-lgomp
@ -126,21 +152,29 @@ ifeq ($(BUILD_TYPE),openblas)
endif endif
ifeq ($(BUILD_TYPE),cublas) ifeq ($(BUILD_TYPE),cublas)
CGO_LDFLAGS+=-lcublas -lcudart -L$(CUDA_LIBPATH) CGO_LDFLAGS+=-lcublas -lcudart -L$(CUDA_LIBPATH) -L$(CUDA_LIBPATH)/stubs/ -lcuda
export GGML_CUDA=1 export GGML_CUDA=1
CGO_LDFLAGS_WHISPER+=-L$(CUDA_LIBPATH)/stubs/ -lcuda -lcufft CMAKE_ARGS+=-DGGML_CUDA=ON
WHISPER_CMAKE_ARGS+=-DGGML_CUDA=ON
CGO_LDFLAGS_WHISPER+=-lcufft -lggml-cuda
export WHISPER_LIBRARY_PATH:=$(WHISPER_LIBRARY_PATH):$(WHISPER_DIR)/build/ggml/src/ggml-cuda/
endif endif
ifeq ($(BUILD_TYPE),vulkan) ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DGGML_VULKAN=1 CMAKE_ARGS+=-DGGML_VULKAN=1
WHISPER_CMAKE_ARGS+=-DGGML_VULKAN=1
CGO_LDFLAGS_WHISPER+=-lggml-vulkan -lvulkan
export WHISPER_LIBRARY_PATH:=$(WHISPER_LIBRARY_PATH):$(WHISPER_DIR)/build/ggml/src/ggml-vulkan/
endif endif
ifneq (,$(findstring sycl,$(BUILD_TYPE))) ifneq (,$(findstring sycl,$(BUILD_TYPE)))
export GGML_SYCL=1 export GGML_SYCL=1
CMAKE_ARGS+=-DGGML_SYCL=ON
endif endif
ifeq ($(BUILD_TYPE),sycl_f16) ifeq ($(BUILD_TYPE),sycl_f16)
export GGML_SYCL_F16=1 export GGML_SYCL_F16=1
CMAKE_ARGS+=-DGGML_SYCL_F16=ON
endif endif
ifeq ($(BUILD_TYPE),hipblas) ifeq ($(BUILD_TYPE),hipblas)
@ -151,7 +185,7 @@ ifeq ($(BUILD_TYPE),hipblas)
export CC=$(ROCM_HOME)/llvm/bin/clang export CC=$(ROCM_HOME)/llvm/bin/clang
export STABLE_BUILD_TYPE= export STABLE_BUILD_TYPE=
export GGML_HIP=1 export GGML_HIP=1
GPU_TARGETS ?= gfx900,gfx906,gfx908,gfx940,gfx941,gfx942,gfx90a,gfx1030,gfx1031,gfx1100,gfx1101 GPU_TARGETS ?= gfx803,gfx900,gfx906,gfx908,gfx90a,gfx942,gfx1010,gfx1030,gfx1032,gfx1100,gfx1101,gfx1102
AMDGPU_TARGETS ?= "$(GPU_TARGETS)" AMDGPU_TARGETS ?= "$(GPU_TARGETS)"
CMAKE_ARGS+=-DGGML_HIP=ON -DAMDGPU_TARGETS="$(AMDGPU_TARGETS)" -DGPU_TARGETS="$(GPU_TARGETS)" CMAKE_ARGS+=-DGGML_HIP=ON -DAMDGPU_TARGETS="$(AMDGPU_TARGETS)" -DGPU_TARGETS="$(GPU_TARGETS)"
CGO_LDFLAGS += -O3 --rtlib=compiler-rt -unwindlib=libgcc -lhipblas -lrocblas --hip-link -L${ROCM_HOME}/lib/llvm/lib CGO_LDFLAGS += -O3 --rtlib=compiler-rt -unwindlib=libgcc -lhipblas -lrocblas --hip-link -L${ROCM_HOME}/lib/llvm/lib
@ -286,8 +320,9 @@ sources/whisper.cpp:
git checkout $(WHISPER_CPP_VERSION) && \ git checkout $(WHISPER_CPP_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch git submodule update --init --recursive --depth 1 --single-branch
sources/whisper.cpp/libwhisper.a: sources/whisper.cpp sources/whisper.cpp/build/src/libwhisper.a: sources/whisper.cpp
cd sources/whisper.cpp && $(MAKE) libwhisper.a libggml.a cd sources/whisper.cpp && cmake $(WHISPER_CMAKE_ARGS) . -B ./build
cd sources/whisper.cpp/build && cmake --build . --config Release
get-sources: sources/go-piper sources/stablediffusion-ggml.cpp sources/bark.cpp sources/whisper.cpp backend/cpp/llama/llama.cpp get-sources: sources/go-piper sources/stablediffusion-ggml.cpp sources/bark.cpp sources/whisper.cpp backend/cpp/llama/llama.cpp
@ -337,8 +372,14 @@ clean-tests:
clean-dc: clean clean-dc: clean
cp -r /build/backend-assets /workspace/backend-assets cp -r /build/backend-assets /workspace/backend-assets
## Install Go tools
install-go-tools:
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install github.com/GeertJohan/go.rice/rice@latest
## Build: ## Build:
build: prepare backend-assets grpcs ## Build the project build: prepare backend-assets grpcs install-go-tools ## Build the project
$(info ${GREEN}I local-ai build info:${RESET}) $(info ${GREEN}I local-ai build info:${RESET})
$(info ${GREEN}I BUILD_TYPE: ${YELLOW}$(BUILD_TYPE)${RESET}) $(info ${GREEN}I BUILD_TYPE: ${YELLOW}$(BUILD_TYPE)${RESET})
$(info ${GREEN}I GO_TAGS: ${YELLOW}$(GO_TAGS)${RESET}) $(info ${GREEN}I GO_TAGS: ${YELLOW}$(GO_TAGS)${RESET})
@ -348,7 +389,9 @@ ifneq ($(BACKEND_LIBS),)
$(MAKE) backend-assets/lib $(MAKE) backend-assets/lib
cp -f $(BACKEND_LIBS) backend-assets/lib/ cp -f $(BACKEND_LIBS) backend-assets/lib/
endif endif
rm -rf $(BINARY_NAME) || true
CGO_LDFLAGS="$(CGO_LDFLAGS)" $(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o $(BINARY_NAME) ./ CGO_LDFLAGS="$(CGO_LDFLAGS)" $(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o $(BINARY_NAME) ./
rice append --exec $(BINARY_NAME)
build-minimal: build-minimal:
BUILD_GRPC_FOR_BACKEND_LLAMA=true GRPC_BACKENDS="backend-assets/grpc/llama-cpp-avx2" GO_TAGS=p2p $(MAKE) build BUILD_GRPC_FOR_BACKEND_LLAMA=true GRPC_BACKENDS="backend-assets/grpc/llama-cpp-avx2" GO_TAGS=p2p $(MAKE) build
@ -420,6 +463,7 @@ prepare-test: grpcs
cp -rf backend-assets core/http cp -rf backend-assets core/http
cp tests/models_fixtures/* test-models cp tests/models_fixtures/* test-models
## Test targets
test: prepare test-models/testmodel.ggml grpcs test: prepare test-models/testmodel.ggml grpcs
@echo 'Running tests' @echo 'Running tests'
export GO_TAGS="tts debug" export GO_TAGS="tts debug"
@ -494,7 +538,7 @@ protogen: protogen-go protogen-python
protogen-clean: protogen-go-clean protogen-python-clean protogen-clean: protogen-go-clean protogen-python-clean
.PHONY: protogen-go .PHONY: protogen-go
protogen-go: protogen-go: install-go-tools
mkdir -p pkg/grpc/proto mkdir -p pkg/grpc/proto
protoc --experimental_allow_proto3_optional -Ibackend/ --go_out=pkg/grpc/proto/ --go_opt=paths=source_relative --go-grpc_out=pkg/grpc/proto/ --go-grpc_opt=paths=source_relative \ protoc --experimental_allow_proto3_optional -Ibackend/ --go_out=pkg/grpc/proto/ --go_opt=paths=source_relative --go-grpc_out=pkg/grpc/proto/ --go-grpc_opt=paths=source_relative \
backend/backend.proto backend/backend.proto
@ -505,18 +549,10 @@ protogen-go-clean:
$(RM) bin/* $(RM) bin/*
.PHONY: protogen-python .PHONY: protogen-python
protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen rerankers-protogen transformers-protogen kokoro-protogen vllm-protogen faster-whisper-protogen protogen-python: bark-protogen coqui-protogen diffusers-protogen exllama2-protogen rerankers-protogen transformers-protogen kokoro-protogen vllm-protogen faster-whisper-protogen
.PHONY: protogen-python-clean .PHONY: protogen-python-clean
protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean rerankers-protogen-clean transformers-protogen-clean kokoro-protogen-clean vllm-protogen-clean faster-whisper-protogen-clean protogen-python-clean: bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean rerankers-protogen-clean transformers-protogen-clean kokoro-protogen-clean vllm-protogen-clean faster-whisper-protogen-clean
.PHONY: autogptq-protogen
autogptq-protogen:
$(MAKE) -C backend/python/autogptq protogen
.PHONY: autogptq-protogen-clean
autogptq-protogen-clean:
$(MAKE) -C backend/python/autogptq protogen-clean
.PHONY: bark-protogen .PHONY: bark-protogen
bark-protogen: bark-protogen:
@ -593,7 +629,6 @@ vllm-protogen-clean:
## GRPC ## GRPC
# Note: it is duplicated in the Dockerfile # Note: it is duplicated in the Dockerfile
prepare-extra-conda-environments: protogen-python prepare-extra-conda-environments: protogen-python
$(MAKE) -C backend/python/autogptq
$(MAKE) -C backend/python/bark $(MAKE) -C backend/python/bark
$(MAKE) -C backend/python/coqui $(MAKE) -C backend/python/coqui
$(MAKE) -C backend/python/diffusers $(MAKE) -C backend/python/diffusers
@ -607,10 +642,12 @@ prepare-extra-conda-environments: protogen-python
prepare-test-extra: protogen-python prepare-test-extra: protogen-python
$(MAKE) -C backend/python/transformers $(MAKE) -C backend/python/transformers
$(MAKE) -C backend/python/diffusers $(MAKE) -C backend/python/diffusers
$(MAKE) -C backend/python/vllm
test-extra: prepare-test-extra test-extra: prepare-test-extra
$(MAKE) -C backend/python/transformers test $(MAKE) -C backend/python/transformers test
$(MAKE) -C backend/python/diffusers test $(MAKE) -C backend/python/diffusers test
$(MAKE) -C backend/python/vllm test
backend-assets: backend-assets:
mkdir -p backend-assets mkdir -p backend-assets
@ -752,8 +789,8 @@ ifneq ($(UPX),)
$(UPX) backend-assets/grpc/silero-vad $(UPX) backend-assets/grpc/silero-vad
endif endif
backend-assets/grpc/whisper: sources/whisper.cpp sources/whisper.cpp/libwhisper.a backend-assets/grpc backend-assets/grpc/whisper: sources/whisper.cpp sources/whisper.cpp/build/src/libwhisper.a backend-assets/grpc
CGO_LDFLAGS="$(CGO_LDFLAGS) $(CGO_LDFLAGS_WHISPER)" C_INCLUDE_PATH="$(CURDIR)/sources/whisper.cpp/include:$(CURDIR)/sources/whisper.cpp/ggml/include" LIBRARY_PATH=$(CURDIR)/sources/whisper.cpp \ CGO_LDFLAGS="$(CGO_LDFLAGS) $(CGO_LDFLAGS_WHISPER)" C_INCLUDE_PATH="${WHISPER_INCLUDE_PATH}" LIBRARY_PATH="${WHISPER_LIBRARY_PATH}" LD_LIBRARY_PATH="${WHISPER_LIBRARY_PATH}" \
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/whisper ./backend/go/transcribe/whisper $(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/whisper ./backend/go/transcribe/whisper
ifneq ($(UPX),) ifneq ($(UPX),)
$(UPX) backend-assets/grpc/whisper $(UPX) backend-assets/grpc/whisper

@ -30,7 +30,7 @@
<p align="center"> <p align="center">
<a href="https://twitter.com/LocalAI_API" target="blank"> <a href="https://twitter.com/LocalAI_API" target="blank">
<img src="https://img.shields.io/twitter/follow/LocalAI_API?label=Follow: LocalAI_API&style=social" alt="Follow LocalAI_API"/> <img src="https://img.shields.io/badge/X-%23000000.svg?style=for-the-badge&logo=X&logoColor=white&label=LocalAI_API" alt="Follow LocalAI_API"/>
</a> </a>
<a href="https://discord.gg/uJAeKSAGDy" target="blank"> <a href="https://discord.gg/uJAeKSAGDy" target="blank">
<img src="https://dcbadge.vercel.app/api/server/uJAeKSAGDy?style=flat-square&theme=default-inverted" alt="Join LocalAI Discord Community"/> <img src="https://dcbadge.vercel.app/api/server/uJAeKSAGDy?style=flat-square&theme=default-inverted" alt="Join LocalAI Discord Community"/>
@ -43,7 +43,8 @@
> :bulb: Get help - [❓FAQ](https://localai.io/faq/) [💭Discussions](https://github.com/go-skynet/LocalAI/discussions) [:speech_balloon: Discord](https://discord.gg/uJAeKSAGDy) [:book: Documentation website](https://localai.io/) > :bulb: Get help - [❓FAQ](https://localai.io/faq/) [💭Discussions](https://github.com/go-skynet/LocalAI/discussions) [:speech_balloon: Discord](https://discord.gg/uJAeKSAGDy) [:book: Documentation website](https://localai.io/)
> >
> [💻 Quickstart](https://localai.io/basics/getting_started/) [🖼️ Models](https://models.localai.io/) [🚀 Roadmap](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap) [🥽 Demo](https://demo.localai.io) [🌍 Explorer](https://explorer.localai.io) [🛫 Examples](https://github.com/mudler/LocalAI-examples) > [💻 Quickstart](https://localai.io/basics/getting_started/) [🖼️ Models](https://models.localai.io/) [🚀 Roadmap](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap) [🥽 Demo](https://demo.localai.io) [🌍 Explorer](https://explorer.localai.io) [🛫 Examples](https://github.com/mudler/LocalAI-examples) Try on
[![Telegram](https://img.shields.io/badge/Telegram-2CA5E0?style=for-the-badge&logo=telegram&logoColor=white)](https://t.me/localaiofficial_bot)
[![tests](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml)[![Build and Release](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml)[![build container images](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml)[![Bump dependencies](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml)[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/localai)](https://artifacthub.io/packages/search?repo=localai) [![tests](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml)[![Build and Release](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml)[![build container images](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml)[![Bump dependencies](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml)[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/localai)](https://artifacthub.io/packages/search?repo=localai)
@ -103,28 +104,93 @@
Run the installer script: Run the installer script:
```bash ```bash
# Basic installation
curl https://localai.io/install.sh | sh curl https://localai.io/install.sh | sh
``` ```
For more installation options, see [Installer Options](https://localai.io/docs/advanced/installer/).
Or run with docker: Or run with docker:
### CPU only image: ### CPU only image:
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-cpu
```
### Nvidia GPU:
```bash
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
```
### CPU and GPU image (bigger size):
```bash ```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
``` ```
### AIO images (it will pre-download a set of models ready for use, see https://localai.io/basics/container/)
### NVIDIA GPU Images:
```bash ```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu # CUDA 12.0 with core features
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# CUDA 12.0 with extra Python dependencies
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12-extras
# CUDA 11.7 with core features
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-11
# CUDA 11.7 with extra Python dependencies
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-11-extras
# NVIDIA Jetson (L4T) ARM64
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64
``` ```
### AMD GPU Images (ROCm):
```bash
# ROCm with core features
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
# ROCm with extra Python dependencies
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas-extras
```
### Intel GPU Images (oneAPI):
```bash
# Intel GPU with FP16 support
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f16
# Intel GPU with FP16 support and extra dependencies
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f16-extras
# Intel GPU with FP32 support
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f32
# Intel GPU with FP32 support and extra dependencies
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f32-extras
```
### Vulkan GPU Images:
```bash
# Vulkan with core features
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
```
### AIO Images (pre-downloaded models):
```bash
# CPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
# NVIDIA CUDA 12 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
# NVIDIA CUDA 11 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-11
# Intel GPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel-f16
# AMD GPU version
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas
```
For more information about the AIO images and pre-downloaded models, see [Container Documentation](https://localai.io/basics/container/).
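
The image tags listed above follow a regular pattern, so the matching `docker run` command can be assembled programmatically. The following is a minimal sketch, not part of LocalAI: the `docker_command` helper and the accelerator keys are hypothetical names chosen for illustration, and only the tags and flags shown in the commands above are used.

```python
# Hypothetical helper (not shipped with LocalAI): build the `docker run`
# command for a given accelerator, using only tags and flags from the README.
IMAGE_TAGS = {
    "cpu": "latest-cpu",
    "nvidia-cuda-12": "latest-gpu-nvidia-cuda-12",
    "nvidia-cuda-11": "latest-gpu-nvidia-cuda-11",
    "hipblas": "latest-gpu-hipblas",
    "intel-f16": "latest-gpu-intel-f16",
    "intel-f32": "latest-gpu-intel-f32",
    "vulkan": "latest-gpu-vulkan",
}

# Extra device flags that the README passes for NVIDIA and AMD images.
EXTRA_FLAGS = {
    "nvidia-cuda-12": ["--gpus", "all"],
    "nvidia-cuda-11": ["--gpus", "all"],
    "hipblas": ["--device=/dev/kfd", "--device=/dev/dri", "--group-add=video"],
}


def docker_command(accelerator, name="local-ai", host_port=8080):
    """Return the docker argv list for the chosen accelerator key."""
    tag = IMAGE_TAGS[accelerator]  # raises KeyError for unknown accelerators
    cmd = ["docker", "run", "-ti", "--name", name, "-p", f"{host_port}:8080"]
    cmd += EXTRA_FLAGS.get(accelerator, [])
    cmd.append(f"localai/localai:{tag}")
    return cmd


if __name__ == "__main__":
    print(" ".join(docker_command("hipblas")))
```

Returning a list rather than a string keeps the result usable with `subprocess.run` without shell quoting concerns.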
To load models: To load models:
```bash ```bash


@ -48,6 +48,6 @@ template:
<|im_start|>assistant <|im_start|>assistant
download_files: download_files:
- filename: localai-functioncall-phi-4-v0.3-q4_k_m.gguf - filename: localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
sha256: 23fee048ded2a6e2e1a7b6bbefa6cbf83068f194caa9552aecbaa00fec8a16d5 sha256: 4e7b7fe1d54b881f1ef90799219dc6cc285d29db24f559c8998d1addb35713d4
uri: huggingface://mudler/LocalAI-functioncall-phi-4-v0.3-Q4_K_M-GGUF/localai-functioncall-phi-4-v0.3-q4_k_m.gguf uri: huggingface://mudler/LocalAI-functioncall-qwen2.5-7b-v0.5-Q4_K_M-GGUF/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf


@ -1,6 +1,15 @@
package main package main
import "embed" import (
rice "github.com/GeertJohan/go.rice"
)
//go:embed backend-assets/* var backendAssets *rice.Box
var backendAssets embed.FS
func init() {
var err error
backendAssets, err = rice.FindBox("backend-assets")
if err != nil {
panic(err)
}
}


@ -14,6 +14,7 @@ service Backend {
rpc PredictStream(PredictOptions) returns (stream Reply) {} rpc PredictStream(PredictOptions) returns (stream Reply) {}
rpc Embedding(PredictOptions) returns (EmbeddingResult) {} rpc Embedding(PredictOptions) returns (EmbeddingResult) {}
rpc GenerateImage(GenerateImageRequest) returns (Result) {} rpc GenerateImage(GenerateImageRequest) returns (Result) {}
rpc GenerateVideo(GenerateVideoRequest) returns (Result) {}
rpc AudioTranscription(TranscriptRequest) returns (TranscriptResult) {} rpc AudioTranscription(TranscriptRequest) returns (TranscriptResult) {}
rpc TTS(TTSRequest) returns (Result) {} rpc TTS(TTSRequest) returns (Result) {}
rpc SoundGeneration(SoundGenerationRequest) returns (Result) {} rpc SoundGeneration(SoundGenerationRequest) returns (Result) {}
@ -190,11 +191,7 @@ message ModelOptions {
int32 NGQA = 20; int32 NGQA = 20;
string ModelFile = 21; string ModelFile = 21;
// AutoGPTQ
string Device = 22;
bool UseTriton = 23;
string ModelBaseName = 24;
bool UseFastTokenizer = 25;
// Diffusers // Diffusers
string PipelineType = 26; string PipelineType = 26;
@ -305,6 +302,19 @@ message GenerateImageRequest {
int32 CLIPSkip = 11; int32 CLIPSkip = 11;
} }
message GenerateVideoRequest {
string prompt = 1;
string start_image = 2; // Path or base64 encoded image for the start frame
string end_image = 3; // Path or base64 encoded image for the end frame
int32 width = 4;
int32 height = 5;
int32 num_frames = 6; // Number of frames to generate
int32 fps = 7; // Frames per second
int32 seed = 8;
float cfg_scale = 9; // Classifier-free guidance scale
string dst = 10; // Output path for the generated video
}
message TTSRequest { message TTSRequest {
string text = 1; string text = 1;
string model = 2; string model = 2;


@ -1,17 +1,17 @@
## XXX: In some versions of CMake clip wasn't being built before llama. ## XXX: In some versions of CMake clip wasn't being built before llama.
## This is an hack for now, but it should be fixed in the future. ## This is an hack for now, but it should be fixed in the future.
set(TARGET myclip) # set(TARGET myclip)
add_library(${TARGET} clip.cpp clip.h clip-impl.h llava.cpp llava.h) # add_library(${TARGET} clip.cpp clip.h clip-impl.h llava.cpp llava.h)
install(TARGETS ${TARGET} LIBRARY) # install(TARGETS ${TARGET} LIBRARY)
target_include_directories(myclip PUBLIC .) # target_include_directories(myclip PUBLIC .)
target_include_directories(myclip PUBLIC ../..) # target_include_directories(myclip PUBLIC ../..)
target_include_directories(myclip PUBLIC ../../common) # target_include_directories(myclip PUBLIC ../../common)
target_link_libraries(${TARGET} PRIVATE common ggml llama ${CMAKE_THREAD_LIBS_INIT}) # target_link_libraries(${TARGET} PRIVATE common ggml llama ${CMAKE_THREAD_LIBS_INIT})
target_compile_features(${TARGET} PRIVATE cxx_std_11) # target_compile_features(${TARGET} PRIVATE cxx_std_11)
if (NOT MSVC) # if (NOT MSVC)
target_compile_options(${TARGET} PRIVATE -Wno-cast-qual) # stb_image.h # target_compile_options(${TARGET} PRIVATE -Wno-cast-qual) # stb_image.h
endif() # endif()
# END CLIP hack # END CLIP hack
@ -74,8 +74,12 @@ add_library(hw_grpc_proto
${hw_proto_srcs} ${hw_proto_srcs}
${hw_proto_hdrs} ) ${hw_proto_hdrs} )
add_executable(${TARGET} grpc-server.cpp utils.hpp json.hpp) add_executable(${TARGET} grpc-server.cpp utils.hpp json.hpp httplib.h)
target_link_libraries(${TARGET} PRIVATE common llama myclip ${CMAKE_THREAD_LIBS_INIT} absl::flags hw_grpc_proto
target_include_directories(${TARGET} PRIVATE ../llava)
target_include_directories(${TARGET} PRIVATE ${CMAKE_SOURCE_DIR})
target_link_libraries(${TARGET} PRIVATE common llama mtmd ${CMAKE_THREAD_LIBS_INIT} absl::flags hw_grpc_proto
absl::flags_parse absl::flags_parse
gRPC::${_REFLECTION} gRPC::${_REFLECTION}
gRPC::${_GRPC_GRPCPP} gRPC::${_GRPC_GRPCPP}


@ -59,8 +59,8 @@ llama.cpp:
git checkout -b build $(LLAMA_VERSION) && \ git checkout -b build $(LLAMA_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch git submodule update --init --recursive --depth 1 --single-branch
llama.cpp/examples/grpc-server: llama.cpp llama.cpp/tools/grpc-server: llama.cpp
mkdir -p llama.cpp/examples/grpc-server mkdir -p llama.cpp/tools/grpc-server
bash prepare.sh bash prepare.sh
rebuild: rebuild:
@ -70,13 +70,13 @@ rebuild:
purge: purge:
rm -rf llama.cpp/build rm -rf llama.cpp/build
rm -rf llama.cpp/examples/grpc-server rm -rf llama.cpp/tools/grpc-server
rm -rf grpc-server rm -rf grpc-server
clean: purge clean: purge
rm -rf llama.cpp rm -rf llama.cpp
grpc-server: llama.cpp llama.cpp/examples/grpc-server grpc-server: llama.cpp llama.cpp/tools/grpc-server
@echo "Building grpc-server with $(BUILD_TYPE) build type and $(CMAKE_ARGS)" @echo "Building grpc-server with $(BUILD_TYPE) build type and $(CMAKE_ARGS)"
ifneq (,$(findstring sycl,$(BUILD_TYPE))) ifneq (,$(findstring sycl,$(BUILD_TYPE)))
+bash -c "source $(ONEAPI_VARS); \ +bash -c "source $(ONEAPI_VARS); \

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -1,7 +1,7 @@
diff --git a/examples/llava/clip.cpp b/examples/llava/clip.cpp diff --git a/tools/mtmd/clip.cpp b/tools/mtmd/clip.cpp
index 3cd0d2fa..6c5e811a 100644 index 3cd0d2fa..6c5e811a 100644
--- a/examples/llava/clip.cpp --- a/tools/mtmd/clip.cpp
+++ b/examples/llava/clip.cpp +++ b/tools/mtmd/clip.cpp
@@ -2608,7 +2608,7 @@ bool clip_image_batch_encode(clip_ctx * ctx, const int n_threads, const clip_ima @@ -2608,7 +2608,7 @@ bool clip_image_batch_encode(clip_ctx * ctx, const int n_threads, const clip_ima
struct ggml_tensor * patches = ggml_graph_get_tensor(gf, "patches"); struct ggml_tensor * patches = ggml_graph_get_tensor(gf, "patches");
int* patches_data = (int*)malloc(ggml_nbytes(patches)); int* patches_data = (int*)malloc(ggml_nbytes(patches));


@ -7,22 +7,46 @@ for patch in $(ls patches); do
patch -d llama.cpp/ -p1 < patches/$patch patch -d llama.cpp/ -p1 < patches/$patch
done done
cp -r CMakeLists.txt llama.cpp/examples/grpc-server/ set -e
cp -r grpc-server.cpp llama.cpp/examples/grpc-server/
cp -rfv json.hpp llama.cpp/examples/grpc-server/
cp -rfv utils.hpp llama.cpp/examples/grpc-server/
if grep -q "grpc-server" llama.cpp/examples/CMakeLists.txt; then cp -r CMakeLists.txt llama.cpp/tools/grpc-server/
cp -r grpc-server.cpp llama.cpp/tools/grpc-server/
cp -rfv llama.cpp/common/json.hpp llama.cpp/tools/grpc-server/
cp -rfv llama.cpp/tools/server/utils.hpp llama.cpp/tools/grpc-server/
cp -rfv llama.cpp/tools/server/httplib.h llama.cpp/tools/grpc-server/
set +e
if grep -q "grpc-server" llama.cpp/tools/CMakeLists.txt; then
echo "grpc-server already added" echo "grpc-server already added"
else else
echo "add_subdirectory(grpc-server)" >> llama.cpp/examples/CMakeLists.txt echo "add_subdirectory(grpc-server)" >> llama.cpp/tools/CMakeLists.txt
fi fi
set -e
## XXX: In some versions of CMake clip wasn't being built before llama. # Now to keep maximum compatibility with the original server.cpp, we need to remove the index.html.gz.hpp and loading.html.hpp includes
## This is an hack for now, but it should be fixed in the future. # and remove the main function
cp -rfv llama.cpp/examples/llava/clip.h llama.cpp/examples/grpc-server/clip.h # TODO: upstream this to the original server.cpp by extracting the upstream main function to a separate file
cp -rfv llama.cpp/examples/llava/clip-impl.h llama.cpp/examples/grpc-server/clip-impl.h awk '
cp -rfv llama.cpp/examples/llava/llava.cpp llama.cpp/examples/grpc-server/llava.cpp /int[ \t]+main[ \t]*\(/ { # If the line starts the main function
echo '#include "llama.h"' > llama.cpp/examples/grpc-server/llava.h in_main=1; # Set a flag
cat llama.cpp/examples/llava/llava.h >> llama.cpp/examples/grpc-server/llava.h open_braces=0; # Track number of open braces
cp -rfv llama.cpp/examples/llava/clip.cpp llama.cpp/examples/grpc-server/clip.cpp }
in_main {
open_braces += gsub(/\{/, "{"); # Count opening braces
open_braces -= gsub(/\}/, "}"); # Count closing braces
if (open_braces == 0) { # If all braces are closed
in_main=0; # End skipping
}
next; # Skip lines inside main
}
!in_main # Print lines not inside main
' "llama.cpp/tools/server/server.cpp" > llama.cpp/tools/grpc-server/server.cpp
# remove index.html.gz.hpp and loading.html.hpp includes
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS
sed -i '' '/#include "index\.html\.gz\.hpp"/d; /#include "loading\.html\.hpp"/d' llama.cpp/tools/grpc-server/server.cpp
else
# Linux and others
sed -i '/#include "index\.html\.gz\.hpp"/d; /#include "loading\.html\.hpp"/d' llama.cpp/tools/grpc-server/server.cpp
fi
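
The `awk` filter in the script above drops the upstream `main` function by counting braces. The same logic can be sketched in Python to make its behaviour easier to test in isolation. This is an illustrative port, not part of the build: `strip_main` is a hypothetical name, and like the `awk` version it assumes the opening brace sits on the same line as `int main(` and that braces inside strings or comments are balanced.

```python
import re

# Same pattern the awk script uses to detect the start of main().
MAIN_RE = re.compile(r"int[ \t]+main[ \t]*\(")


def strip_main(source: str) -> str:
    """Return `source` with the lines of the main() function removed."""
    kept = []
    in_main = False
    open_braces = 0
    for line in source.splitlines():
        if MAIN_RE.search(line):
            in_main = True      # start skipping at the signature line
            open_braces = 0
        if in_main:
            open_braces += line.count("{")
            open_braces -= line.count("}")
            if open_braces == 0:
                in_main = False  # the closing brace of main() was on this line
            continue             # skip every line that belongs to main()
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    demo = "int helper() {\n  return 1;\n}\nint main(int argc, char ** argv) {\n  return 0;\n}"
    print(strip_main(demo))
```

If the opening brace were on the line after the signature, the counter would reach zero immediately and only the signature line would be skipped, which is a limitation shared with the original filter.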


@ -1,483 +0,0 @@
// https://github.com/ggerganov/llama.cpp/blob/master/examples/server/utils.hpp
#pragma once
#include <string>
#include <vector>
#include <set>
#include <mutex>
#include <condition_variable>
#include <unordered_map>
#include "json.hpp"
#include "../llava/clip.h"
using json = nlohmann::json;
extern bool server_verbose;
#ifndef SERVER_VERBOSE
#define SERVER_VERBOSE 1
#endif
#if SERVER_VERBOSE != 1
#define LOG_VERBOSE(MSG, ...)
#else
#define LOG_VERBOSE(MSG, ...) \
do \
{ \
if (server_verbose) \
{ \
server_log("VERBOSE", __func__, __LINE__, MSG, __VA_ARGS__); \
} \
} while (0)
#endif
#define LOG_ERROR( MSG, ...) server_log("ERROR", __func__, __LINE__, MSG, __VA_ARGS__)
#define LOG_WARNING(MSG, ...) server_log("WARNING", __func__, __LINE__, MSG, __VA_ARGS__)
#define LOG_INFO( MSG, ...) server_log("INFO", __func__, __LINE__, MSG, __VA_ARGS__)
//
// parallel
//
enum server_state {
SERVER_STATE_LOADING_MODEL, // Server is starting up, model not fully loaded yet
SERVER_STATE_READY, // Server is ready and model is loaded
SERVER_STATE_ERROR // An error occurred, load_model failed
};
enum task_type {
TASK_TYPE_COMPLETION,
TASK_TYPE_CANCEL,
TASK_TYPE_NEXT_RESPONSE
};
struct task_server {
int id = -1; // to be filled by llama_server_queue
int target_id;
task_type type;
json data;
bool infill_mode = false;
bool embedding_mode = false;
int multitask_id = -1;
};
struct task_result {
int id;
int multitask_id = -1;
bool stop;
bool error;
json result_json;
};
struct task_multi {
int id;
std::set<int> subtasks_remaining{};
std::vector<task_result> results{};
};
// TODO: can become bool if we can't find use of more states
enum slot_state
{
IDLE,
PROCESSING,
};
enum slot_command
{
NONE,
LOAD_PROMPT,
RELEASE,
};
struct slot_params
{
bool stream = true;
bool cache_prompt = false; // remember the prompt to avoid reprocessing all prompt
uint32_t seed = -1; // RNG seed
int32_t n_keep = 0; // number of tokens to keep from initial prompt
int32_t n_predict = -1; // new tokens to predict
std::vector<std::string> antiprompt;
json input_prefix;
json input_suffix;
};
struct slot_image
{
int32_t id;
bool request_encode_image = false;
float * image_embedding = nullptr;
int32_t image_tokens = 0;
clip_image_u8 * img_data;
std::string prefix_prompt; // before of this image
};
// completion token output with probabilities
struct completion_token_output
{
struct token_prob
{
llama_token tok;
float prob;
};
std::vector<token_prob> probs;
llama_token tok;
std::string text_to_send;
};
static inline void server_log(const char *level, const char *function, int line,
const char *message, const nlohmann::ordered_json &extra)
{
nlohmann::ordered_json log
{
{"timestamp", time(nullptr)},
{"level", level},
{"function", function},
{"line", line},
{"message", message},
};
if (!extra.empty())
{
log.merge_patch(extra);
}
const std::string str = log.dump(-1, ' ', false, json::error_handler_t::replace);
printf("%.*s\n", (int)str.size(), str.data());
fflush(stdout);
}
//
// server utils
//
template <typename T>
static T json_value(const json &body, const std::string &key, const T &default_value)
{
// Fallback null to default value
return body.contains(key) && !body.at(key).is_null()
? body.value(key, default_value)
: default_value;
}
inline std::string format_chatml(std::vector<json> messages)
{
std::ostringstream chatml_msgs;
for (auto it = messages.begin(); it != messages.end(); ++it) {
chatml_msgs << "<|im_start|>"
<< json_value(*it, "role", std::string("user")) << '\n';
chatml_msgs << json_value(*it, "content", std::string(""))
<< "<|im_end|>\n";
}
chatml_msgs << "<|im_start|>assistant" << '\n';
return chatml_msgs.str();
}
//
// work queue utils
//
struct llama_server_queue {
int id = 0;
std::mutex mutex_tasks;
// queues
std::vector<task_server> queue_tasks;
std::vector<task_server> queue_tasks_deferred;
std::vector<task_multi> queue_multitasks;
std::condition_variable condition_tasks;
// callback functions
std::function<void(task_server&)> callback_new_task;
std::function<void(task_multi&)> callback_finish_multitask;
std::function<void(void)> callback_all_task_finished;
// Add a new task to the end of the queue
int post(task_server task) {
std::unique_lock<std::mutex> lock(mutex_tasks);
if (task.id == -1) {
task.id = id++;
}
queue_tasks.push_back(std::move(task));
condition_tasks.notify_one();
return task.id;
}
// Add a new task, but defer until one slot is available
void defer(task_server task) {
std::unique_lock<std::mutex> lock(mutex_tasks);
queue_tasks_deferred.push_back(std::move(task));
}
// Get the next id for creating anew task
int get_new_id() {
std::unique_lock<std::mutex> lock(mutex_tasks);
return id++;
}
// Register function to process a new task
void on_new_task(std::function<void(task_server&)> callback) {
callback_new_task = callback;
}
// Register function to process a multitask
void on_finish_multitask(std::function<void(task_multi&)> callback) {
callback_finish_multitask = callback;
}
// Register the function to be called when the batch of tasks is finished
void on_all_tasks_finished(std::function<void(void)> callback) {
callback_all_task_finished = callback;
}
// Call when the state of one slot is changed
void notify_slot_changed() {
// move deferred tasks back to main loop
std::unique_lock<std::mutex> lock(mutex_tasks);
for (auto & task : queue_tasks_deferred) {
queue_tasks.push_back(std::move(task));
}
queue_tasks_deferred.clear();
}
// Start the main loop. This call is blocking
[[noreturn]]
void start_loop() {
while (true) {
// new task arrived
LOG_VERBOSE("have new task", {});
{
while (true)
{
std::unique_lock<std::mutex> lock(mutex_tasks);
if (queue_tasks.empty()) {
lock.unlock();
break;
}
task_server task = queue_tasks.front();
queue_tasks.erase(queue_tasks.begin());
lock.unlock();
LOG_VERBOSE("callback_new_task", {});
callback_new_task(task);
}
LOG_VERBOSE("callback_all_task_finished", {});
// process and update all the multitasks
auto queue_iterator = queue_multitasks.begin();
while (queue_iterator != queue_multitasks.end())
{
if (queue_iterator->subtasks_remaining.empty())
{
// all subtasks done == multitask is done
task_multi current_multitask = *queue_iterator;
callback_finish_multitask(current_multitask);
// remove this multitask
queue_iterator = queue_multitasks.erase(queue_iterator);
}
else
{
++queue_iterator;
}
}
// all tasks in the current loop is finished
callback_all_task_finished();
}
LOG_VERBOSE("wait for new task", {});
// wait for new task
{
std::unique_lock<std::mutex> lock(mutex_tasks);
if (queue_tasks.empty()) {
condition_tasks.wait(lock, [&]{
return !queue_tasks.empty();
});
}
}
}
}
//
// functions to manage multitasks
//
// add a multitask by specifying the id of all subtask (subtask is a task_server)
void add_multitask(int multitask_id, std::vector<int>& sub_ids)
{
std::lock_guard<std::mutex> lock(mutex_tasks);
task_multi multi;
multi.id = multitask_id;
std::copy(sub_ids.begin(), sub_ids.end(), std::inserter(multi.subtasks_remaining, multi.subtasks_remaining.end()));
queue_multitasks.push_back(multi);
}
// updatethe remaining subtasks, while appending results to multitask
void update_multitask(int multitask_id, int subtask_id, task_result& result)
{
std::lock_guard<std::mutex> lock(mutex_tasks);
for (auto& multitask : queue_multitasks)
{
if (multitask.id == multitask_id)
{
multitask.subtasks_remaining.erase(subtask_id);
multitask.results.push_back(result);
}
}
}
};
struct llama_server_response {
typedef std::function<void(int, int, task_result&)> callback_multitask_t;
callback_multitask_t callback_update_multitask;
// for keeping track of all tasks waiting for the result
std::set<int> waiting_task_ids;
// the main result queue
std::vector<task_result> queue_results;
std::mutex mutex_results;
std::condition_variable condition_results;
void add_waiting_task_id(int task_id) {
std::unique_lock<std::mutex> lock(mutex_results);
waiting_task_ids.insert(task_id);
}
void remove_waiting_task_id(int task_id) {
std::unique_lock<std::mutex> lock(mutex_results);
waiting_task_ids.erase(task_id);
}
// This function blocks the thread until there is a response for this task_id
task_result recv(int task_id) {
while (true)
{
std::unique_lock<std::mutex> lock(mutex_results);
condition_results.wait(lock, [&]{
return !queue_results.empty();
});
LOG_VERBOSE("condition_results unblock", {});
for (int i = 0; i < (int) queue_results.size(); i++)
{
if (queue_results[i].id == task_id)
{
assert(queue_results[i].multitask_id == -1);
task_result res = queue_results[i];
queue_results.erase(queue_results.begin() + i);
return res;
}
}
}
// should never reach here
}
// Register the function to update multitask
void on_multitask_update(callback_multitask_t callback) {
callback_update_multitask = callback;
}
// Send a new result to a waiting task_id
void send(task_result result) {
std::unique_lock<std::mutex> lock(mutex_results);
LOG_VERBOSE("send new result", {});
for (auto& task_id : waiting_task_ids) {
// LOG_TEE("waiting task id %i \n", task_id);
// for now, tasks that have associated parent multitasks just get erased once multitask picks up the result
if (result.multitask_id == task_id)
{
LOG_VERBOSE("callback_update_multitask", {});
callback_update_multitask(task_id, result.id, result);
continue;
}
if (result.id == task_id)
{
LOG_VERBOSE("queue_results.push_back", {});
queue_results.push_back(result);
condition_results.notify_one();
return;
}
}
}
};
//
// base64 utils (TODO: move to common in the future)
//
static const std::string base64_chars =
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789+/";
static inline bool is_base64(uint8_t c)
{
return (isalnum(c) || (c == '+') || (c == '/'));
}
static inline std::vector<uint8_t> base64_decode(const std::string & encoded_string)
{
int i = 0;
int j = 0;
int in_ = 0;
int in_len = encoded_string.size();
uint8_t char_array_4[4];
uint8_t char_array_3[3];
std::vector<uint8_t> ret;
while (in_len-- && (encoded_string[in_] != '=') && is_base64(encoded_string[in_]))
{
char_array_4[i++] = encoded_string[in_]; in_++;
if (i == 4)
{
for (i = 0; i <4; i++)
{
char_array_4[i] = base64_chars.find(char_array_4[i]);
}
char_array_3[0] = ((char_array_4[0] ) << 2) + ((char_array_4[1] & 0x30) >> 4);
char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];
for (i = 0; (i < 3); i++)
{
ret.push_back(char_array_3[i]);
}
i = 0;
}
}
if (i)
{
for (j = i; j <4; j++)
{
char_array_4[j] = 0;
}
for (j = 0; j <4; j++)
{
char_array_4[j] = base64_chars.find(char_array_4[j]);
}
char_array_3[0] = ((char_array_4[0] ) << 2) + ((char_array_4[1] & 0x30) >> 4);
char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];
for (j = 0; (j < i - 1); j++)
{
ret.push_back(char_array_3[j]);
}
}
return ret;
}
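
The bit arithmetic in the removed `base64_decode` packs four 6-bit values into three bytes. A small worked example in Python may make the shifts and masks easier to follow. This is a sketch of the same arithmetic for a single unpadded group; `decode_quad` is a hypothetical name and the standard library `base64` module is used only to cross-check the result.

```python
import base64
import string

# Same alphabet as the C++ `base64_chars` constant.
BASE64_CHARS = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def decode_quad(quad: str) -> bytes:
    """Decode exactly four base64 characters (no padding) into three bytes."""
    a, b, c, d = (BASE64_CHARS.index(ch) for ch in quad)
    return bytes([
        ((a << 2) + ((b & 0x30) >> 4)) & 0xFF,         # 6 bits of a + top 2 bits of b
        (((b & 0x0F) << 4) + ((c & 0x3C) >> 2)) & 0xFF,  # low 4 bits of b + top 4 bits of c
        (((c & 0x03) << 6) + d) & 0xFF,                # low 2 bits of c + 6 bits of d
    ])


if __name__ == "__main__":
    # Cross-check against the standard library implementation.
    assert decode_quad("TWFu") == base64.b64decode("TWFu")
    print(decode_quad("TWFu"))
```

The `& 0xFF` masks stand in for the implicit truncation that the C++ code gets from storing into `uint8_t`.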


@ -48,7 +48,7 @@ int tts(char *text,int threads, char *dst ) {
// generate audio // generate audio
if (!bark_generate_audio(c, text, threads)) { if (!bark_generate_audio(c, text, threads)) {
fprintf(stderr, "%s: An error occured. If the problem persists, feel free to open an issue to report it.\n", __func__); fprintf(stderr, "%s: An error occurred. If the problem persists, feel free to open an issue to report it.\n", __func__);
return 1; return 1;
} }


@ -20,7 +20,7 @@ CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
# If build type is cublas, then we set -DGGML_CUDA=ON to CMAKE_ARGS automatically # If build type is cublas, then we set -DGGML_CUDA=ON to CMAKE_ARGS automatically
ifeq ($(BUILD_TYPE),cublas) ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS+=-DGGML_CUDA=ON CMAKE_ARGS+=-DSD_CUDA=ON
# If build type is openblas then we set -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS # If build type is openblas then we set -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
# to CMAKE_ARGS automatically # to CMAKE_ARGS automatically
else ifeq ($(BUILD_TYPE),openblas) else ifeq ($(BUILD_TYPE),openblas)
@ -30,14 +30,14 @@ else ifeq ($(BUILD_TYPE),clblas)
CMAKE_ARGS+=-DGGML_CLBLAST=ON -DCLBlast_DIR=/some/path CMAKE_ARGS+=-DGGML_CLBLAST=ON -DCLBlast_DIR=/some/path
# If it's hipblas we do have also to set CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ # If it's hipblas we do have also to set CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++
else ifeq ($(BUILD_TYPE),hipblas) else ifeq ($(BUILD_TYPE),hipblas)
CMAKE_ARGS+=-DGGML_HIP=ON CMAKE_ARGS+=-DSD_HIPBLAS=ON
# If it's OSX, DO NOT embed the metal library - -DGGML_METAL_EMBED_LIBRARY=ON requires further investigation # If it's OSX, DO NOT embed the metal library - -DGGML_METAL_EMBED_LIBRARY=ON requires further investigation
# But if it's OSX without metal, disable it here # But if it's OSX without metal, disable it here
else ifeq ($(OS),Darwin) else ifeq ($(OS),Darwin)
ifneq ($(BUILD_TYPE),metal) ifneq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DGGML_METAL=OFF CMAKE_ARGS+=-DSD_METAL=OFF
else else
CMAKE_ARGS+=-DGGML_METAL=ON CMAKE_ARGS+=-DSD_METAL=ON
CMAKE_ARGS+=-DGGML_METAL_EMBED_LIBRARY=ON CMAKE_ARGS+=-DGGML_METAL_EMBED_LIBRARY=ON
TARGET+=--target ggml-metal TARGET+=--target ggml-metal
endif endif


@ -74,7 +74,7 @@ func (sd *Whisper) AudioTranscription(opts *pb.TranscriptRequest) (pb.Transcript
context.SetTranslate(true) context.SetTranslate(true)
} }
if err := context.Process(data, nil, nil); err != nil { if err := context.Process(data, nil, nil, nil); err != nil {
return pb.TranscriptResult{}, err return pb.TranscriptResult{}, err
} }


@ -1,17 +0,0 @@
.PHONY: autogptq
autogptq: protogen
bash install.sh
.PHONY: protogen
protogen: backend_pb2_grpc.py backend_pb2.py
.PHONY: protogen-clean
protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
backend_pb2_grpc.py backend_pb2.py:
python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
.PHONY: clean
clean: protogen-clean
rm -rf venv __pycache__


@ -1,5 +0,0 @@
# Creating a separate environment for the autogptq project
```
make autogptq
```


@ -1,153 +0,0 @@
#!/usr/bin/env python3
from concurrent import futures
import argparse
import signal
import sys
import os
import time
import base64
import grpc
import backend_pb2
import backend_pb2_grpc
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import TextGenerationPipeline
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
# Implement the BackendServicer class with the service methods
class BackendServicer(backend_pb2_grpc.BackendServicer):
def Health(self, request, context):
return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
def LoadModel(self, request, context):
try:
device = "cuda:0"
if request.Device != "":
device = request.Device
# support loading local model files
model_path = os.path.join(os.environ.get('MODELS_PATH', './'), request.Model)
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True, trust_remote_code=request.TrustRemoteCode)
# support model `Qwen/Qwen-VL-Chat-Int4`
if "qwen-vl" in request.Model.lower():
self.model_name = "Qwen-VL-Chat"
model = AutoModelForCausalLM.from_pretrained(model_path,
trust_remote_code=request.TrustRemoteCode,
device_map="auto").eval()
else:
model = AutoGPTQForCausalLM.from_quantized(model_path,
model_basename=request.ModelBaseName,
use_safetensors=True,
trust_remote_code=request.TrustRemoteCode,
device=device,
use_triton=request.UseTriton,
quantize_config=None)
self.model = model
self.tokenizer = tokenizer
except Exception as err:
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
return backend_pb2.Result(message="Model loaded successfully", success=True)
def Predict(self, request, context):
penalty = 1.0
if request.Penalty != 0.0:
penalty = request.Penalty
tokens = 512
if request.Tokens != 0:
tokens = request.Tokens
top_p = 0.95
if request.TopP != 0.0:
top_p = request.TopP
prompt_images = self.recompile_vl_prompt(request)
compiled_prompt = prompt_images[0]
print(f"Prompt: {compiled_prompt}", file=sys.stderr)
# Implement Predict RPC
pipeline = TextGenerationPipeline(
model=self.model,
tokenizer=self.tokenizer,
max_new_tokens=tokens,
temperature=request.Temperature,
top_p=top_p,
repetition_penalty=penalty,
)
t = pipeline(compiled_prompt)[0]["generated_text"]
print(f"generated_text: {t}", file=sys.stderr)
if compiled_prompt in t:
t = t.replace(compiled_prompt, "")
# house keeping. Remove the image files from /tmp folder
for img_path in prompt_images[1]:
try:
os.remove(img_path)
except Exception as e:
print(f"Error removing image file: {img_path}, {e}", file=sys.stderr)
return backend_pb2.Result(message=bytes(t, encoding='utf-8'))
def PredictStream(self, request, context):
# Implement PredictStream RPC
#for reply in some_data_generator():
# yield reply
# Not implemented yet
return self.Predict(request, context)
def recompile_vl_prompt(self, request):
prompt = request.Prompt
image_paths = []
if "qwen-vl" in self.model_name.lower():
# request.Images is an array which contains base64 encoded images. Iterate the request.Images array, decode and save each image to /tmp folder with a random filename.
# Then, save the image file paths to an array "image_paths".
# read "request.Prompt", replace "[img-%d]" with the image file paths in the order they appear in "image_paths". Save the new prompt to "prompt".
for i, img in enumerate(request.Images):
timestamp = str(int(time.time() * 1000)) # Generate timestamp
img_path = f"/tmp/vl-{timestamp}.jpg" # Use timestamp in filename
with open(img_path, "wb") as f:
f.write(base64.b64decode(img))
image_paths.append(img_path)
prompt = prompt.replace(f"[img-{i}]", "<img>" + img_path + "</img>,")
else:
prompt = request.Prompt
return (prompt, image_paths)
def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address)
server.start()
print("Server started. Listening on: " + address, file=sys.stderr)
# Define the signal handler function
def signal_handler(sig, frame):
print("Received termination signal. Shutting down...")
server.stop(0)
sys.exit(0)
# Set the signal handlers for SIGINT and SIGTERM
signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
try:
while True:
time.sleep(_ONE_DAY_IN_SECONDS)
except KeyboardInterrupt:
server.stop(0)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Run the gRPC server.")
parser.add_argument(
"--addr", default="localhost:50051", help="The address to bind the server to."
)
args = parser.parse_args()
serve(args.addr)

@ -1,14 +0,0 @@
#!/bin/bash
set -e
source $(dirname $0)/../common/libbackend.sh
# This is here because the Intel pip index is broken and returns 200 status codes for every package name, it just doesn't return any package links.
# This makes uv think that the package exists in the Intel pip index, and by default it stops looking at other pip indexes once it finds a match.
# We need uv to continue falling through to the pypi default index to find optimum[openvino] in the pypi index
# the --upgrade actually allows us to *downgrade* torch to the version provided in the Intel pip index
if [ "x${BUILD_PROFILE}" == "xintel" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
fi
installRequirements

@ -1,2 +0,0 @@
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.4.1+cu118

@ -1 +0,0 @@
torch==2.4.1

@ -1,2 +0,0 @@
--extra-index-url https://download.pytorch.org/whl/rocm6.0
torch==2.4.1+rocm6.0

@ -1,6 +0,0 @@
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
intel-extension-for-pytorch==2.3.110+xpu
torch==2.3.1+cxx11.abi
oneccl_bind_pt==2.3.100+xpu
optimum[openvino]
setuptools

@ -1,6 +0,0 @@
accelerate
auto-gptq==0.7.1
grpcio==1.71.0
protobuf
certifi
transformers

@ -1,4 +0,0 @@
#!/bin/bash
source $(dirname $0)/../common/libbackend.sh
startBackend $@

@ -1,6 +0,0 @@
#!/bin/bash
set -e
source $(dirname $0)/../common/libbackend.sh
runUnittests

@ -61,7 +61,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
return backend_pb2.Result(success=True) return backend_pb2.Result(success=True)
def serve(address): def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS)) server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server) backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address) server.add_insecure_port(address)
server.start() server.start()

@ -1,4 +1,4 @@
bark==0.1.5 bark==0.1.5
grpcio==1.71.0 grpcio==1.72.0
protobuf protobuf
certifi certifi

@ -1,3 +1,3 @@
grpcio==1.71.0 grpcio==1.72.0
protobuf protobuf
grpcio-tools grpcio-tools

@ -86,7 +86,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
return backend_pb2.Result(success=True) return backend_pb2.Result(success=True)
def serve(address): def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS)) server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server) backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address) server.add_insecure_port(address)
server.start() server.start()

@ -1,4 +1,4 @@
grpcio==1.71.0 grpcio==1.72.0
protobuf protobuf
certifi certifi
packaging==24.1 packaging==24.1

@ -168,9 +168,13 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
# We are storing all the options in a dict so we can use it later when # We are storing all the options in a dict so we can use it later when
# generating the images # generating the images
for opt in options: for opt in options:
if ":" not in opt:
continue
key, value = opt.split(":") key, value = opt.split(":")
self.options[key] = value self.options[key] = value
print(f"Options: {self.options}", file=sys.stderr)
local = False local = False
modelFile = request.Model modelFile = request.Model
@ -522,7 +526,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
def serve(address): def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS)) server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server) backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address) server.add_insecure_port(address)
server.start() server.start()
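
The option-parsing hunk in this file now skips entries that contain no colon. The two-name unpacking of `opt.split(":")` would still raise `ValueError` for a value that itself contains a colon, such as a path or URL. A minimal sketch of one way to tolerate that, assuming a hypothetical helper `parse_options` and made-up option names:

```python
def parse_options(options):
    """Parse "key:value" strings into a dict, skipping entries without a colon."""
    parsed = {}
    for opt in options:
        if ":" not in opt:
            continue
        # maxsplit=1 keeps any further colons inside the value
        key, value = opt.split(":", 1)
        parsed[key] = value
    return parsed


# The option names below are illustrative only.
parsed = parse_options(["scheduler:euler", "verbose", "lora:/models/a:b.safetensors"])
```

Here `verbose` is dropped because it has no colon, and the `lora` value keeps its embedded colon.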

@ -1,5 +1,5 @@
setuptools setuptools
grpcio==1.71.0 grpcio==1.72.0
pillow pillow
protobuf protobuf
certifi certifi

@ -105,7 +105,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
def serve(address): def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS)) server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server) backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address) server.add_insecure_port(address)
server.start() server.start()

@ -1,4 +1,4 @@
grpcio==1.71.0 grpcio==1.72.0
protobuf protobuf
certifi certifi
wheel wheel

@ -62,7 +62,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
return backend_pb2.TranscriptResult(segments=resultSegments, text=text) return backend_pb2.TranscriptResult(segments=resultSegments, text=text)
def serve(address): def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS)) server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server) backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address) server.add_insecure_port(address)
server.start() server.start()

@ -1,3 +1,3 @@
grpcio==1.71.0 grpcio==1.72.0
protobuf protobuf
grpcio-tools grpcio-tools

@ -99,7 +99,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
return backend_pb2.Result(success=True) return backend_pb2.Result(success=True)
def serve(address): def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS)) server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server) backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address) server.add_insecure_port(address)
server.start() server.start()

@ -1,4 +1,4 @@
grpcio==1.71.0 grpcio==1.72.0
protobuf protobuf
phonemizer phonemizer
scipy scipy

@ -91,7 +91,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
return backend_pb2.RerankResult(usage=usage, results=results) return backend_pb2.RerankResult(usage=usage, results=results)
def serve(address): def serve(address):
server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS)) server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server) backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address) server.add_insecure_port(address)
server.start() server.start()

@ -1,3 +1,3 @@
grpcio==1.71.0 grpcio==1.72.0
protobuf protobuf
certifi certifi

@ -559,7 +559,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
async def serve(address): async def serve(address):
# Start asyncio gRPC server # Start asyncio gRPC server
server = grpc.aio.server(migration_thread_pool=futures.ThreadPoolExecutor(max_workers=MAX_WORKERS)) server = grpc.aio.server(migration_thread_pool=futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
# Add the servicer to the server # Add the servicer to the server
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server) backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
# Bind the server to the address # Bind the server to the address

@ -1,4 +1,4 @@
grpcio==1.71.0 grpcio==1.72.0
protobuf protobuf
certifi certifi
setuptools setuptools

@ -194,27 +194,40 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
await iterations.aclose() await iterations.aclose()
async def _predict(self, request, context, streaming=False): async def _predict(self, request, context, streaming=False):
# Build the sampling parameters
# NOTE: this must stay in sync with the vllm backend
request_to_sampling_params = {
"N": "n",
"PresencePenalty": "presence_penalty",
"FrequencyPenalty": "frequency_penalty",
"RepetitionPenalty": "repetition_penalty",
"Temperature": "temperature",
"TopP": "top_p",
"TopK": "top_k",
"MinP": "min_p",
"Seed": "seed",
"StopPrompts": "stop",
"StopTokenIds": "stop_token_ids",
"BadWords": "bad_words",
"IncludeStopStrInOutput": "include_stop_str_in_output",
"IgnoreEOS": "ignore_eos",
"Tokens": "max_tokens",
"MinTokens": "min_tokens",
"Logprobs": "logprobs",
"PromptLogprobs": "prompt_logprobs",
"SkipSpecialTokens": "skip_special_tokens",
"SpacesBetweenSpecialTokens": "spaces_between_special_tokens",
"TruncatePromptTokens": "truncate_prompt_tokens",
"GuidedDecoding": "guided_decoding",
}
# Build sampling parameters
sampling_params = SamplingParams(top_p=0.9, max_tokens=200) sampling_params = SamplingParams(top_p=0.9, max_tokens=200)
if request.TopP != 0:
sampling_params.top_p = request.TopP for request_field, param_field in request_to_sampling_params.items():
if request.Tokens > 0: if hasattr(request, request_field):
sampling_params.max_tokens = request.Tokens value = getattr(request, request_field)
if request.Temperature != 0: if value not in (None, 0, [], False):
sampling_params.temperature = request.Temperature setattr(sampling_params, param_field, value)
if request.TopK != 0:
sampling_params.top_k = request.TopK
if request.PresencePenalty != 0:
sampling_params.presence_penalty = request.PresencePenalty
if request.FrequencyPenalty != 0:
sampling_params.frequency_penalty = request.FrequencyPenalty
if request.StopPrompts:
sampling_params.stop = request.StopPrompts
if request.IgnoreEOS:
sampling_params.ignore_eos = request.IgnoreEOS
if request.Seed != 0:
sampling_params.seed = request.Seed
# Extract image paths and process images # Extract image paths and process images
prompt = request.Prompt prompt = request.Prompt
@ -320,7 +333,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
async def serve(address): async def serve(address):
# Start asyncio gRPC server # Start asyncio gRPC server
server = grpc.aio.server(migration_thread_pool=futures.ThreadPoolExecutor(max_workers=MAX_WORKERS)) server = grpc.aio.server(migration_thread_pool=futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
('grpc.max_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
('grpc.max_receive_message_length', 50 * 1024 * 1024), # 50MB
])
# Add the servicer to the server # Add the servicer to the server
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server) backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
# Bind the server to the address # Bind the server to the address
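
The `_predict` hunk in this file replaces the chain of per-field `if` statements with a table-driven loop. A minimal sketch of that loop in isolation, using `types.SimpleNamespace` objects as stand-ins for the protobuf request and for `SamplingParams` (both stand-ins and the shortened mapping are assumptions made for illustration):

```python
from types import SimpleNamespace

# Shortened copy of the request-field -> sampling-parameter mapping.
REQUEST_TO_PARAM = {
    "TopP": "top_p",
    "Tokens": "max_tokens",
    "Temperature": "temperature",
    "StopPrompts": "stop",
}


def apply_request_fields(request, params, mapping=REQUEST_TO_PARAM):
    """Copy every set (non-default) request field onto the params object."""
    for request_field, param_field in mapping.items():
        if hasattr(request, request_field):
            value = getattr(request, request_field)
            # None, 0, 0.0, [] and False all count as "not set"
            if value not in (None, 0, [], False):
                setattr(params, param_field, value)
    return params


request = SimpleNamespace(TopP=0.8, Tokens=0, Temperature=0.0, StopPrompts=["\n"])
params = SimpleNamespace(top_p=0.9, max_tokens=200, temperature=1.0, stop=None)
apply_request_fields(request, params)
```

One consequence of the membership check, visible in the sketch, is that an explicit zero (for example `Temperature=0.0`) cannot be told apart from an unset field, so the existing default is kept.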

@ -1,4 +1,4 @@
grpcio==1.71.0 grpcio==1.72.0
protobuf protobuf
certifi certifi
setuptools setuptools

@ -75,6 +75,53 @@ class TestBackendServicer(unittest.TestCase):
finally: finally:
self.tearDown() self.tearDown()
def test_sampling_params(self):
"""
This method tests if all sampling parameters are correctly processed
NOTE: this does NOT test for correctness, just that we received a compatible response
"""
try:
self.setUp()
with grpc.insecure_channel("localhost:50051") as channel:
stub = backend_pb2_grpc.BackendStub(channel)
response = stub.LoadModel(backend_pb2.ModelOptions(Model="facebook/opt-125m"))
self.assertTrue(response.success)
req = backend_pb2.PredictOptions(
Prompt="The capital of France is",
TopP=0.8,
Tokens=50,
Temperature=0.7,
TopK=40,
PresencePenalty=0.1,
FrequencyPenalty=0.2,
RepetitionPenalty=1.1,
MinP=0.05,
Seed=42,
StopPrompts=["\n"],
StopTokenIds=[50256],
BadWords=["badword"],
IncludeStopStrInOutput=True,
IgnoreEOS=True,
MinTokens=5,
Logprobs=5,
PromptLogprobs=5,
SkipSpecialTokens=True,
SpacesBetweenSpecialTokens=True,
TruncatePromptTokens=10,
GuidedDecoding=True,
N=2,
)
resp = stub.Predict(req)
self.assertIsNotNone(resp.message)
self.assertIsNotNone(resp.logprobs)
except Exception as err:
print(err)
self.fail("sampling params service failed")
finally:
self.tearDown()
def test_embedding(self): def test_embedding(self):
""" """
This method tests if the embeddings are generated successfully This method tests if the embeddings are generated successfully

@ -43,18 +43,12 @@ func New(opts ...config.AppOption) (*Application, error) {
if err != nil { if err != nil {
return nil, fmt.Errorf("unable to create ModelPath: %q", err) return nil, fmt.Errorf("unable to create ModelPath: %q", err)
} }
if options.ImageDir != "" { if options.GeneratedContentDir != "" {
err := os.MkdirAll(options.ImageDir, 0750) err := os.MkdirAll(options.GeneratedContentDir, 0750)
if err != nil { if err != nil {
return nil, fmt.Errorf("unable to create ImageDir: %q", err) return nil, fmt.Errorf("unable to create GeneratedContentDir: %q", err)
} }
} }
if options.AudioDir != "" {
err := os.MkdirAll(options.AudioDir, 0750)
if err != nil {
return nil, fmt.Errorf("unable to create AudioDir: %q", err)
}
}
if options.UploadDir != "" { if options.UploadDir != "" {
err := os.MkdirAll(options.UploadDir, 0750) err := os.MkdirAll(options.UploadDir, 0750)
if err != nil { if err != nil {

@ -99,7 +99,7 @@ func grpcModelOpts(c config.BackendConfig) *pb.ModelOptions {
mmap = *c.MMap mmap = *c.MMap
} }
ctxSize := 1024 ctxSize := 4096
if c.ContextSize != nil { if c.ContextSize != nil {
ctxSize = *c.ContextSize ctxSize = *c.ContextSize
} }
@ -184,11 +184,6 @@ func grpcModelOpts(c config.BackendConfig) *pb.ModelOptions {
MainGPU: c.MainGPU, MainGPU: c.MainGPU,
Threads: int32(*c.Threads), Threads: int32(*c.Threads),
TensorSplit: c.TensorSplit, TensorSplit: c.TensorSplit,
// AutoGPTQ
ModelBaseName: c.AutoGPTQ.ModelBaseName,
Device: c.AutoGPTQ.Device,
UseTriton: c.AutoGPTQ.Triton,
UseFastTokenizer: c.AutoGPTQ.UseFastTokenizer,
// RWKV // RWKV
Tokenizer: c.Tokenizer, Tokenizer: c.Tokenizer,
} }

@ -35,12 +35,17 @@ func SoundGeneration(
return "", nil, fmt.Errorf("could not load sound generation model") return "", nil, fmt.Errorf("could not load sound generation model")
} }
if err := os.MkdirAll(appConfig.AudioDir, 0750); err != nil { if err := os.MkdirAll(appConfig.GeneratedContentDir, 0750); err != nil {
return "", nil, fmt.Errorf("failed creating audio directory: %s", err) return "", nil, fmt.Errorf("failed creating audio directory: %s", err)
} }
fileName := utils.GenerateUniqueFileName(appConfig.AudioDir, "sound_generation", ".wav") audioDir := filepath.Join(appConfig.GeneratedContentDir, "audio")
filePath := filepath.Join(appConfig.AudioDir, fileName) if err := os.MkdirAll(audioDir, 0750); err != nil {
return "", nil, fmt.Errorf("failed creating audio directory: %s", err)
}
fileName := utils.GenerateUniqueFileName(audioDir, "sound_generation", ".wav")
filePath := filepath.Join(audioDir, fileName)
res, err := soundGenModel.SoundGeneration(context.Background(), &proto.SoundGenerationRequest{ res, err := soundGenModel.SoundGeneration(context.Background(), &proto.SoundGenerationRequest{
Text: text, Text: text,

@ -32,12 +32,13 @@ func ModelTTS(
return "", nil, fmt.Errorf("could not load tts model %q", backendConfig.Model) return "", nil, fmt.Errorf("could not load tts model %q", backendConfig.Model)
} }
if err := os.MkdirAll(appConfig.AudioDir, 0750); err != nil { audioDir := filepath.Join(appConfig.GeneratedContentDir, "audio")
if err := os.MkdirAll(audioDir, 0750); err != nil {
return "", nil, fmt.Errorf("failed creating audio directory: %s", err) return "", nil, fmt.Errorf("failed creating audio directory: %s", err)
} }
fileName := utils.GenerateUniqueFileName(appConfig.AudioDir, "tts", ".wav") fileName := utils.GenerateUniqueFileName(audioDir, "tts", ".wav")
filePath := filepath.Join(appConfig.AudioDir, fileName) filePath := filepath.Join(audioDir, fileName)
// We join the model name to the model path here. This seems to only be done for TTS and is HIGHLY suspect. // We join the model name to the model path here. This seems to only be done for TTS and is HIGHLY suspect.
// This should be addressed in a follow up PR soon. // This should be addressed in a follow up PR soon.

core/backend/video.go Normal file
@ -0,0 +1,36 @@
package backend
import (
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/pkg/grpc/proto"
model "github.com/mudler/LocalAI/pkg/model"
)
func VideoGeneration(height, width int32, prompt, startImage, endImage, dst string, loader *model.ModelLoader, backendConfig config.BackendConfig, appConfig *config.ApplicationConfig) (func() error, error) {
opts := ModelOptions(backendConfig, appConfig)
inferenceModel, err := loader.Load(
opts...,
)
if err != nil {
return nil, err
}
defer loader.Close()
fn := func() error {
_, err := inferenceModel.GenerateVideo(
appConfig.Context,
&proto.GenerateVideoRequest{
Height: height,
Width: width,
Prompt: prompt,
StartImage: startImage,
EndImage: endImage,
Dst: dst,
})
return err
}
return fn, nil
}

@ -1,11 +1,13 @@
package cliContext package cliContext
import "embed" import (
rice "github.com/GeertJohan/go.rice"
)
type Context struct { type Context struct {
Debug bool `env:"LOCALAI_DEBUG,DEBUG" default:"false" hidden:"" help:"DEPRECATED, use --log-level=debug instead. Enable debug logging"` Debug bool `env:"LOCALAI_DEBUG,DEBUG" default:"false" hidden:"" help:"DEPRECATED, use --log-level=debug instead. Enable debug logging"`
LogLevel *string `env:"LOCALAI_LOG_LEVEL" enum:"error,warn,info,debug,trace" help:"Set the level of logs to output [${enum}]"` LogLevel *string `env:"LOCALAI_LOG_LEVEL" enum:"error,warn,info,debug,trace" help:"Set the level of logs to output [${enum}]"`
// This field is not a command line argument/flag, the struct tag excludes it from the parsed CLI // This field is not a command line argument/flag, the struct tag excludes it from the parsed CLI
BackendAssets embed.FS `kong:"-"` BackendAssets *rice.Box `kong:"-"`
} }

@ -21,8 +21,7 @@ type RunCMD struct {
ModelsPath string `env:"LOCALAI_MODELS_PATH,MODELS_PATH" type:"path" default:"${basepath}/models" help:"Path containing models used for inferencing" group:"storage"` ModelsPath string `env:"LOCALAI_MODELS_PATH,MODELS_PATH" type:"path" default:"${basepath}/models" help:"Path containing models used for inferencing" group:"storage"`
BackendAssetsPath string `env:"LOCALAI_BACKEND_ASSETS_PATH,BACKEND_ASSETS_PATH" type:"path" default:"/tmp/localai/backend_data" help:"Path used to extract libraries that are required by some of the backends in runtime" group:"storage"` BackendAssetsPath string `env:"LOCALAI_BACKEND_ASSETS_PATH,BACKEND_ASSETS_PATH" type:"path" default:"/tmp/localai/backend_data" help:"Path used to extract libraries that are required by some of the backends in runtime" group:"storage"`
ImagePath string `env:"LOCALAI_IMAGE_PATH,IMAGE_PATH" type:"path" default:"/tmp/generated/images" help:"Location for images generated by backends (e.g. stablediffusion)" group:"storage"` GeneratedContentPath string `env:"LOCALAI_GENERATED_CONTENT_PATH,GENERATED_CONTENT_PATH" type:"path" default:"/tmp/generated/content" help:"Location for generated content (e.g. images, audio, videos)" group:"storage"`
AudioPath string `env:"LOCALAI_AUDIO_PATH,AUDIO_PATH" type:"path" default:"/tmp/generated/audio" help:"Location for audio generated by backends (e.g. piper)" group:"storage"`
UploadPath string `env:"LOCALAI_UPLOAD_PATH,UPLOAD_PATH" type:"path" default:"/tmp/localai/upload" help:"Path to store uploads from files api" group:"storage"` UploadPath string `env:"LOCALAI_UPLOAD_PATH,UPLOAD_PATH" type:"path" default:"/tmp/localai/upload" help:"Path to store uploads from files api" group:"storage"`
ConfigPath string `env:"LOCALAI_CONFIG_PATH,CONFIG_PATH" default:"/tmp/localai/config" group:"storage"` ConfigPath string `env:"LOCALAI_CONFIG_PATH,CONFIG_PATH" default:"/tmp/localai/config" group:"storage"`
LocalaiConfigDir string `env:"LOCALAI_CONFIG_DIR" type:"path" default:"${basepath}/configuration" help:"Directory for dynamic loading of certain configuration files (currently api_keys.json and external_backends.json)" group:"storage"` LocalaiConfigDir string `env:"LOCALAI_CONFIG_DIR" type:"path" default:"${basepath}/configuration" help:"Directory for dynamic loading of certain configuration files (currently api_keys.json and external_backends.json)" group:"storage"`
@ -47,7 +46,7 @@ type RunCMD struct {
CSRF bool `env:"LOCALAI_CSRF" help:"Enables fiber CSRF middleware" group:"api"` CSRF bool `env:"LOCALAI_CSRF" help:"Enables fiber CSRF middleware" group:"api"`
UploadLimit int `env:"LOCALAI_UPLOAD_LIMIT,UPLOAD_LIMIT" default:"15" help:"Default upload-limit in MB" group:"api"` UploadLimit int `env:"LOCALAI_UPLOAD_LIMIT,UPLOAD_LIMIT" default:"15" help:"Default upload-limit in MB" group:"api"`
APIKeys []string `env:"LOCALAI_API_KEY,API_KEY" help:"List of API Keys to enable API authentication. When this is set, all the requests must be authenticated with one of these API keys" group:"api"` APIKeys []string `env:"LOCALAI_API_KEY,API_KEY" help:"List of API Keys to enable API authentication. When this is set, all the requests must be authenticated with one of these API keys" group:"api"`
DisableWebUI bool `env:"LOCALAI_DISABLE_WEBUI,DISABLE_WEBUI" default:"false" help:"Disable webui" group:"api"` DisableWebUI bool `env:"LOCALAI_DISABLE_WEBUI,DISABLE_WEBUI" default:"false" help:"Disables the web user interface. When set to true, the server will only expose API endpoints without serving the web interface" group:"api"`
DisablePredownloadScan bool `env:"LOCALAI_DISABLE_PREDOWNLOAD_SCAN" help:"If true, disables the best-effort security scanner before downloading any files." group:"hardening" default:"false"` DisablePredownloadScan bool `env:"LOCALAI_DISABLE_PREDOWNLOAD_SCAN" help:"If true, disables the best-effort security scanner before downloading any files." group:"hardening" default:"false"`
OpaqueErrors bool `env:"LOCALAI_OPAQUE_ERRORS" default:"false" help:"If true, all error responses are replaced with blank 500 errors. This is intended only for hardening against information leaks and is normally not recommended." group:"hardening"` OpaqueErrors bool `env:"LOCALAI_OPAQUE_ERRORS" default:"false" help:"If true, all error responses are replaced with blank 500 errors. This is intended only for hardening against information leaks and is normally not recommended." group:"hardening"`
UseSubtleKeyComparison bool `env:"LOCALAI_SUBTLE_KEY_COMPARISON" default:"false" help:"If true, API Key validation comparisons will be performed using constant-time comparisons rather than simple equality. This trades off performance on each request for resilience against timing attacks." group:"hardening"` UseSubtleKeyComparison bool `env:"LOCALAI_SUBTLE_KEY_COMPARISON" default:"false" help:"If true, API Key validation comparisons will be performed using constant-time comparisons rather than simple equality. This trades off performance on each request for resilience against timing attacks." group:"hardening"`
@ -81,8 +80,7 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
config.WithModelPath(r.ModelsPath), config.WithModelPath(r.ModelsPath),
config.WithContextSize(r.ContextSize), config.WithContextSize(r.ContextSize),
config.WithDebug(zerolog.GlobalLevel() <= zerolog.DebugLevel), config.WithDebug(zerolog.GlobalLevel() <= zerolog.DebugLevel),
config.WithImageDir(r.ImagePath), config.WithGeneratedContentDir(r.GeneratedContentPath),
config.WithAudioDir(r.AudioPath),
config.WithUploadDir(r.UploadPath), config.WithUploadDir(r.UploadPath),
config.WithConfigsDir(r.ConfigPath), config.WithConfigsDir(r.ConfigPath),
config.WithDynamicConfigDir(r.LocalaiConfigDir), config.WithDynamicConfigDir(r.LocalaiConfigDir),

@ -70,7 +70,7 @@ func (t *SoundGenerationCMD) Run(ctx *cliContext.Context) error {
opts := &config.ApplicationConfig{ opts := &config.ApplicationConfig{
ModelPath: t.ModelsPath, ModelPath: t.ModelsPath,
Context: context.Background(), Context: context.Background(),
AudioDir: outputDir, GeneratedContentDir: outputDir,
AssetsDestination: t.BackendAssetsPath, AssetsDestination: t.BackendAssetsPath,
ExternalGRPCBackends: externalBackends, ExternalGRPCBackends: externalBackends,
} }

@ -36,10 +36,10 @@ func (t *TTSCMD) Run(ctx *cliContext.Context) error {
text := strings.Join(t.Text, " ") text := strings.Join(t.Text, " ")
opts := &config.ApplicationConfig{ opts := &config.ApplicationConfig{
ModelPath: t.ModelsPath, ModelPath: t.ModelsPath,
Context: context.Background(), Context: context.Background(),
AudioDir: outputDir, GeneratedContentDir: outputDir,
AssetsDestination: t.BackendAssetsPath, AssetsDestination: t.BackendAssetsPath,
} }
ml := model.NewModelLoader(opts.ModelPath, opts.SingleBackend) ml := model.NewModelLoader(opts.ModelPath, opts.SingleBackend)

@ -7,11 +7,11 @@ import (
"github.com/rs/zerolog/log" "github.com/rs/zerolog/log"
gguf "github.com/gpustack/gguf-parser-go"
cliContext "github.com/mudler/LocalAI/core/cli/context" cliContext "github.com/mudler/LocalAI/core/cli/context"
"github.com/mudler/LocalAI/core/config" "github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery" "github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/pkg/downloader" "github.com/mudler/LocalAI/pkg/downloader"
gguf "github.com/thxcode/gguf-parser-go"
) )
type UtilCMD struct { type UtilCMD struct {
@ -51,7 +51,7 @@ func (u *GGUFInfoCMD) Run(ctx *cliContext.Context) error {
log.Info(). log.Info().
Any("eosTokenID", f.Tokenizer().EOSTokenID). Any("eosTokenID", f.Tokenizer().EOSTokenID).
Any("bosTokenID", f.Tokenizer().BOSTokenID). Any("bosTokenID", f.Tokenizer().BOSTokenID).
Any("modelName", f.Model().Name). Any("modelName", f.Metadata().Name).
Any("architecture", f.Architecture().Architecture).Msgf("GGUF file loaded: %s", u.Args[0]) Any("architecture", f.Architecture().Architecture).Msgf("GGUF file loaded: %s", u.Args[0])
log.Info().Any("tokenizer", fmt.Sprintf("%+v", f.Tokenizer())).Msg("Tokenizer") log.Info().Any("tokenizer", fmt.Sprintf("%+v", f.Tokenizer())).Msg("Tokenizer")

@ -2,11 +2,11 @@ package config
import ( import (
"context" "context"
"embed"
"encoding/json" "encoding/json"
"regexp" "regexp"
"time" "time"
rice "github.com/GeertJohan/go.rice"
"github.com/mudler/LocalAI/pkg/xsysinfo" "github.com/mudler/LocalAI/pkg/xsysinfo"
"github.com/rs/zerolog/log" "github.com/rs/zerolog/log"
) )
@ -19,20 +19,21 @@ type ApplicationConfig struct {
UploadLimitMB, Threads, ContextSize int UploadLimitMB, Threads, ContextSize int
F16 bool F16 bool
Debug bool Debug bool
ImageDir string GeneratedContentDir string
AudioDir string
UploadDir string ConfigsDir string
ConfigsDir string UploadDir string
DynamicConfigsDir string
DynamicConfigsDirPollInterval time.Duration DynamicConfigsDir string
CORS bool DynamicConfigsDirPollInterval time.Duration
CSRF bool CORS bool
PreloadJSONModels string CSRF bool
PreloadModelsFromPath string PreloadJSONModels string
CORSAllowOrigins string PreloadModelsFromPath string
ApiKeys []string CORSAllowOrigins string
P2PToken string ApiKeys []string
P2PNetworkID string P2PToken string
P2PNetworkID string
DisableWebUI bool DisableWebUI bool
EnforcePredownloadScans bool EnforcePredownloadScans bool
@ -46,7 +47,7 @@ type ApplicationConfig struct {
Galleries []Gallery Galleries []Gallery
BackendAssets embed.FS BackendAssets *rice.Box
AssetsDestination string AssetsDestination string
ExternalGRPCBackends map[string]string ExternalGRPCBackends map[string]string
@ -197,7 +198,7 @@ func WithBackendAssetsOutput(out string) AppOption {
} }
} }
func WithBackendAssets(f embed.FS) AppOption { func WithBackendAssets(f *rice.Box) AppOption {
return func(o *ApplicationConfig) { return func(o *ApplicationConfig) {
o.BackendAssets = f o.BackendAssets = f
} }
@ -279,15 +280,9 @@ func WithDebug(debug bool) AppOption {
} }
} }
func WithAudioDir(audioDir string) AppOption { func WithGeneratedContentDir(generatedContentDir string) AppOption {
return func(o *ApplicationConfig) { return func(o *ApplicationConfig) {
o.AudioDir = audioDir o.GeneratedContentDir = generatedContentDir
}
}
func WithImageDir(imageDir string) AppOption {
return func(o *ApplicationConfig) {
o.ImageDir = imageDir
} }
} }

@ -50,9 +50,6 @@ type BackendConfig struct {
// LLM configs (GPT4ALL, Llama.cpp, ...) // LLM configs (GPT4ALL, Llama.cpp, ...)
LLMConfig `yaml:",inline"` LLMConfig `yaml:",inline"`
// AutoGPTQ specifics
AutoGPTQ AutoGPTQ `yaml:"autogptq"`
// Diffusers // Diffusers
Diffusers Diffusers `yaml:"diffusers"` Diffusers Diffusers `yaml:"diffusers"`
Step int `yaml:"step"` Step int `yaml:"step"`
@ -176,14 +173,6 @@ type LimitMMPerPrompt struct {
LimitAudioPerPrompt int `yaml:"audio"` LimitAudioPerPrompt int `yaml:"audio"`
} }
// AutoGPTQ is a struct that holds the configuration specific to the AutoGPTQ backend
type AutoGPTQ struct {
ModelBaseName string `yaml:"model_base_name"`
Device string `yaml:"device"`
Triton bool `yaml:"triton"`
UseFastTokenizer bool `yaml:"use_fast_tokenizer"`
}
// TemplateConfig is a struct that holds the configuration of the templating system // TemplateConfig is a struct that holds the configuration of the templating system
type TemplateConfig struct { type TemplateConfig struct {
// Chat is the template used in the chat completion endpoint // Chat is the template used in the chat completion endpoint
@ -315,9 +304,6 @@ func (cfg *BackendConfig) SetDefaults(opts ...ConfigLoaderOption) {
defaultTFZ := 1.0 defaultTFZ := 1.0
defaultZero := 0 defaultZero := 0
// Try to offload all GPU layers (if GPU is found)
defaultHigh := 99999999
trueV := true trueV := true
falseV := false falseV := false
@ -377,9 +363,6 @@ func (cfg *BackendConfig) SetDefaults(opts ...ConfigLoaderOption) {
if cfg.MirostatTAU == nil { if cfg.MirostatTAU == nil {
cfg.MirostatTAU = &defaultMirostatTAU cfg.MirostatTAU = &defaultMirostatTAU
} }
if cfg.NGPULayers == nil {
cfg.NGPULayers = &defaultHigh
}
if cfg.LowVRAM == nil { if cfg.LowVRAM == nil {
cfg.LowVRAM = &falseV cfg.LowVRAM = &falseV
@ -447,18 +430,19 @@ func (c *BackendConfig) HasTemplate() bool {
type BackendConfigUsecases int type BackendConfigUsecases int
const ( const (
FLAG_ANY BackendConfigUsecases = 0b00000000000 FLAG_ANY BackendConfigUsecases = 0b000000000000
FLAG_CHAT BackendConfigUsecases = 0b00000000001 FLAG_CHAT BackendConfigUsecases = 0b000000000001
FLAG_COMPLETION BackendConfigUsecases = 0b00000000010 FLAG_COMPLETION BackendConfigUsecases = 0b000000000010
FLAG_EDIT BackendConfigUsecases = 0b00000000100 FLAG_EDIT BackendConfigUsecases = 0b000000000100
FLAG_EMBEDDINGS BackendConfigUsecases = 0b00000001000 FLAG_EMBEDDINGS BackendConfigUsecases = 0b000000001000
FLAG_RERANK BackendConfigUsecases = 0b00000010000 FLAG_RERANK BackendConfigUsecases = 0b000000010000
FLAG_IMAGE BackendConfigUsecases = 0b00000100000 FLAG_IMAGE BackendConfigUsecases = 0b000000100000
FLAG_TRANSCRIPT BackendConfigUsecases = 0b00001000000 FLAG_TRANSCRIPT BackendConfigUsecases = 0b000001000000
FLAG_TTS BackendConfigUsecases = 0b00010000000 FLAG_TTS BackendConfigUsecases = 0b000010000000
FLAG_SOUND_GENERATION BackendConfigUsecases = 0b00100000000 FLAG_SOUND_GENERATION BackendConfigUsecases = 0b000100000000
FLAG_TOKENIZE BackendConfigUsecases = 0b01000000000 FLAG_TOKENIZE BackendConfigUsecases = 0b001000000000
FLAG_VAD BackendConfigUsecases = 0b10000000000 FLAG_VAD BackendConfigUsecases = 0b010000000000
FLAG_VIDEO BackendConfigUsecases = 0b100000000000
// Common Subsets // Common Subsets
FLAG_LLM BackendConfigUsecases = FLAG_CHAT | FLAG_COMPLETION | FLAG_EDIT FLAG_LLM BackendConfigUsecases = FLAG_CHAT | FLAG_COMPLETION | FLAG_EDIT
@ -479,6 +463,7 @@ func GetAllBackendConfigUsecases() map[string]BackendConfigUsecases {
"FLAG_TOKENIZE": FLAG_TOKENIZE, "FLAG_TOKENIZE": FLAG_TOKENIZE,
"FLAG_VAD": FLAG_VAD, "FLAG_VAD": FLAG_VAD,
"FLAG_LLM": FLAG_LLM, "FLAG_LLM": FLAG_LLM,
"FLAG_VIDEO": FLAG_VIDEO,
} }
} }
@ -543,6 +528,17 @@ func (c *BackendConfig) GuessUsecases(u BackendConfigUsecases) bool {
return false return false
} }
}
if (u & FLAG_VIDEO) == FLAG_VIDEO {
videoBackends := []string{"diffusers", "stablediffusion"}
if !slices.Contains(videoBackends, c.Backend) {
return false
}
if c.Backend == "diffusers" && c.Diffusers.PipelineType == "" {
return false
}
} }
if (u & FLAG_RERANK) == FLAG_RERANK { if (u & FLAG_RERANK) == FLAG_RERANK {
if c.Backend != "rerankers" { if c.Backend != "rerankers" {
@ -555,7 +551,7 @@ func (c *BackendConfig) GuessUsecases(u BackendConfigUsecases) bool {
} }
} }
if (u & FLAG_TTS) == FLAG_TTS { if (u & FLAG_TTS) == FLAG_TTS {
ttsBackends := []string{"piper", "transformers-musicgen", "parler-tts"} ttsBackends := []string{"bark-cpp", "parler-tts", "piper", "transformers-musicgen"}
if !slices.Contains(ttsBackends, c.Backend) { if !slices.Contains(ttsBackends, c.Backend) {
return false return false
} }
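The usecase flags in this hunk are plain bit masks, and the new `FLAG_VIDEO` branch gates on the backend name plus a non-empty diffusers pipeline type. The following is a minimal sketch of that pattern, not LocalAI's code: `Usecase`, `has`, and `videoCapable` are illustrative names, while the bit positions and the backend list are copied from the hunk above.

```go
package main

import (
	"fmt"
	"slices"
)

// Usecase is a hypothetical stand-in for BackendConfigUsecases.
type Usecase int

const (
	FlagChat Usecase = 1 << iota // 0b000000000001
	FlagCompletion
	FlagEdit
	FlagEmbeddings
	FlagRerank
	FlagImage
	FlagTranscript
	FlagTTS
	FlagSoundGeneration
	FlagTokenize
	FlagVAD
	FlagVideo // 0b100000000000, the bit added in this change
)

// has reports whether every bit of flag is set in u.
func has(u, flag Usecase) bool {
	return u&flag == flag
}

// videoCapable mirrors the FLAG_VIDEO branch of GuessUsecases:
// the backend must be a known video backend, and a diffusers
// backend additionally needs a pipeline type.
func videoCapable(backend, pipelineType string) bool {
	videoBackends := []string{"diffusers", "stablediffusion"}
	if !slices.Contains(videoBackends, backend) {
		return false
	}
	if backend == "diffusers" && pipelineType == "" {
		return false
	}
	return true
}

func main() {
	requested := FlagImage | FlagVideo
	fmt.Println("video requested:", has(requested, FlagVideo))
	fmt.Println("diffusers without pipeline:", videoCapable("diffusers", ""))
	fmt.Println("stablediffusion:", videoCapable("stablediffusion", ""))
}
```

Widening every literal by one digit, as the diff does, keeps the constants aligned once a twelfth flag is needed; the `1 << iota` form in the sketch avoids that churn but is a stylistic alternative, not what the project uses.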

@ -3,9 +3,10 @@ package config
import ( import (
"strings" "strings"
"github.com/mudler/LocalAI/pkg/xsysinfo"
"github.com/rs/zerolog/log" "github.com/rs/zerolog/log"
gguf "github.com/thxcode/gguf-parser-go" gguf "github.com/gpustack/gguf-parser-go"
) )
type familyType uint8 type familyType uint8
@ -23,6 +24,7 @@ const (
const ( const (
defaultContextSize = 1024 defaultContextSize = 1024
defaultNGPULayers = 99999999
) )
type settingsConfig struct { type settingsConfig struct {
@ -147,7 +149,7 @@ var knownTemplates = map[string]familyType{
func guessGGUFFromFile(cfg *BackendConfig, f *gguf.GGUFFile, defaultCtx int) { func guessGGUFFromFile(cfg *BackendConfig, f *gguf.GGUFFile, defaultCtx int) {
if defaultCtx == 0 && cfg.ContextSize == nil { if defaultCtx == 0 && cfg.ContextSize == nil {
ctxSize := f.EstimateLLaMACppUsage().ContextSize ctxSize := f.EstimateLLaMACppRun().ContextSize
if ctxSize > 0 { if ctxSize > 0 {
cSize := int(ctxSize) cSize := int(ctxSize)
cfg.ContextSize = &cSize cfg.ContextSize = &cSize
@ -157,6 +159,46 @@ func guessGGUFFromFile(cfg *BackendConfig, f *gguf.GGUFFile, defaultCtx int) {
} }
} }
// GPU options
if cfg.Options == nil {
if xsysinfo.HasGPU("nvidia") || xsysinfo.HasGPU("amd") {
cfg.Options = []string{"gpu"}
}
}
// vram estimation
vram, err := xsysinfo.TotalAvailableVRAM()
if err != nil {
log.Error().Msgf("guessDefaultsFromFile(TotalAvailableVRAM): %s", err)
} else if vram > 0 {
estimate, err := xsysinfo.EstimateGGUFVRAMUsage(f, vram)
if err != nil {
log.Error().Msgf("guessDefaultsFromFile(EstimateGGUFVRAMUsage): %s", err)
} else {
if estimate.IsFullOffload {
log.Warn().Msgf("guessDefaultsFromFile: %s", "full offload is recommended")
}
if estimate.EstimatedVRAM > vram {
log.Warn().Msgf("guessDefaultsFromFile: %s", "estimated VRAM usage is greater than available VRAM")
}
if cfg.NGPULayers == nil && estimate.EstimatedLayers > 0 {
log.Debug().Msgf("guessDefaultsFromFile: %d layers estimated", estimate.EstimatedLayers)
cfg.NGPULayers = &estimate.EstimatedLayers
}
}
}
if cfg.NGPULayers == nil {
// we assume we want to offload all layers
defaultHigh := defaultNGPULayers
cfg.NGPULayers = &defaultHigh
}
log.Debug().Any("NGPULayers", cfg.NGPULayers).Msgf("guessDefaultsFromFile: %s", "NGPULayers set")
// template estimations
if cfg.HasTemplate() { if cfg.HasTemplate() {
// nothing to guess here // nothing to guess here
log.Debug().Any("name", cfg.Name).Msgf("guessDefaultsFromFile: %s", "template already set") log.Debug().Any("name", cfg.Name).Msgf("guessDefaultsFromFile: %s", "template already set")
@ -166,12 +208,12 @@ func guessGGUFFromFile(cfg *BackendConfig, f *gguf.GGUFFile, defaultCtx int) {
log.Debug(). log.Debug().
Any("eosTokenID", f.Tokenizer().EOSTokenID). Any("eosTokenID", f.Tokenizer().EOSTokenID).
Any("bosTokenID", f.Tokenizer().BOSTokenID). Any("bosTokenID", f.Tokenizer().BOSTokenID).
Any("modelName", f.Model().Name). Any("modelName", f.Metadata().Name).
Any("architecture", f.Architecture().Architecture).Msgf("Model file loaded: %s", cfg.ModelFileName()) Any("architecture", f.Architecture().Architecture).Msgf("Model file loaded: %s", cfg.ModelFileName())
// guess the name // guess the name
if cfg.Name == "" { if cfg.Name == "" {
cfg.Name = f.Model().Name cfg.Name = f.Metadata().Name
} }
family := identifyFamily(f) family := identifyFamily(f)
@ -207,6 +249,7 @@ func guessGGUFFromFile(cfg *BackendConfig, f *gguf.GGUFFile, defaultCtx int) {
cfg.TemplateConfig.JinjaTemplate = true cfg.TemplateConfig.JinjaTemplate = true
cfg.TemplateConfig.ChatMessage = chatTemplate.ValueString() cfg.TemplateConfig.ChatMessage = chatTemplate.ValueString()
} }
} }
func identifyFamily(f *gguf.GGUFFile) familyType { func identifyFamily(f *gguf.GGUFFile) familyType {
@ -231,7 +274,7 @@ func identifyFamily(f *gguf.GGUFFile) familyType {
commandR := arch == "command-r" && eosTokenID == 255001 commandR := arch == "command-r" && eosTokenID == 255001
qwen2 := arch == "qwen2" qwen2 := arch == "qwen2"
phi3 := arch == "phi-3" phi3 := arch == "phi-3"
gemma := strings.HasPrefix(arch, "gemma") || strings.Contains(strings.ToLower(f.Model().Name), "gemma") gemma := strings.HasPrefix(arch, "gemma") || strings.Contains(strings.ToLower(f.Metadata().Name), "gemma")
deepseek2 := arch == "deepseek2" deepseek2 := arch == "deepseek2"
switch { switch {
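The GPU-layer logic moved into this file follows a three-step precedence: an explicit `NGPULayers` value wins, then a positive VRAM-based estimate, then the "offload everything" default. A minimal sketch of that precedence, assuming the estimate is passed as `0` when VRAM detection or estimation failed; `chooseGPULayers` is a hypothetical helper, not a function in the repository.

```go
package main

import "fmt"

// defaultNGPULayers matches the constant introduced in this file.
const defaultNGPULayers = 99999999

// chooseGPULayers is a hypothetical helper showing the precedence used in
// guessGGUFFromFile: configured value, then estimate, then default.
func chooseGPULayers(configured *int, estimated int) int {
	if configured != nil {
		return *configured // user-supplied value is never overridden
	}
	if estimated > 0 {
		return estimated // VRAM-based estimate, when one is available
	}
	return defaultNGPULayers // assume all layers should be offloaded
}

func main() {
	pinned := 20
	fmt.Println(chooseGPULayers(&pinned, 33)) // configured value is kept
	fmt.Println(chooseGPULayers(nil, 33))     // estimate is used
	fmt.Println(chooseGPULayers(nil, 0))      // falls back to the default
}
```

Because the default is now applied here rather than in `SetDefaults`, it only takes effect on the GGUF guessing path shown in this hunk.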

@ -4,8 +4,8 @@ import (
"os" "os"
"path/filepath" "path/filepath"
gguf "github.com/gpustack/gguf-parser-go"
"github.com/rs/zerolog/log" "github.com/rs/zerolog/log"
gguf "github.com/thxcode/gguf-parser-go"
) )
func guessDefaultsFromFile(cfg *BackendConfig, modelPath string, defaultCtx int) { func guessDefaultsFromFile(cfg *BackendConfig, modelPath string, defaultCtx int) {

@ -5,6 +5,8 @@ import (
"errors" "errors"
"fmt" "fmt"
"net/http" "net/http"
"os"
"path/filepath"
"github.com/dave-gray101/v2keyauth" "github.com/dave-gray101/v2keyauth"
"github.com/mudler/LocalAI/pkg/utils" "github.com/mudler/LocalAI/pkg/utils"
@ -153,12 +155,19 @@ func API(application *application.Application) (*fiber.App, error) {
Browse: true, Browse: true,
})) }))
if application.ApplicationConfig().ImageDir != "" { if application.ApplicationConfig().GeneratedContentDir != "" {
router.Static("/generated-images", application.ApplicationConfig().ImageDir) os.MkdirAll(application.ApplicationConfig().GeneratedContentDir, 0750)
} audioPath := filepath.Join(application.ApplicationConfig().GeneratedContentDir, "audio")
imagePath := filepath.Join(application.ApplicationConfig().GeneratedContentDir, "images")
videoPath := filepath.Join(application.ApplicationConfig().GeneratedContentDir, "videos")
if application.ApplicationConfig().AudioDir != "" { os.MkdirAll(audioPath, 0750)
router.Static("/generated-audio", application.ApplicationConfig().AudioDir) os.MkdirAll(imagePath, 0750)
os.MkdirAll(videoPath, 0750)
router.Static("/generated-audio", audioPath)
router.Static("/generated-images", imagePath)
router.Static("/generated-videos", videoPath)
} }
// Auth is applied to _all_ endpoints. No exceptions. Filtering out endpoints to bypass is the role of the Filter property of the KeyAuth Configuration // Auth is applied to _all_ endpoints. No exceptions. Filtering out endpoints to bypass is the role of the Filter property of the KeyAuth Configuration
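The hunk above replaces the separate image and audio directories with one generated-content root that has `audio`, `images`, and `videos` subdirectories, each served under its own static route. The sketch below reproduces that layout without Fiber; `contentRoutes` and `ensureContentDirs` are hypothetical helpers, and unlike the diff they surface the `os.MkdirAll` error instead of discarding it.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// contentRoutes maps each static route from the diff to its
// subdirectory of the generated-content root.
func contentRoutes(root string) map[string]string {
	return map[string]string{
		"/generated-audio":  filepath.Join(root, "audio"),
		"/generated-images": filepath.Join(root, "images"),
		"/generated-videos": filepath.Join(root, "videos"),
	}
}

// ensureContentDirs creates every subdirectory (and the root) with the
// same 0750 mode as the diff and returns the first error encountered.
func ensureContentDirs(root string) (map[string]string, error) {
	routes := contentRoutes(root)
	for _, dir := range routes {
		if err := os.MkdirAll(dir, 0o750); err != nil {
			return nil, fmt.Errorf("creating %s: %w", dir, err)
		}
	}
	return routes, nil
}

func main() {
	root, err := os.MkdirTemp("", "generated-content")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(root)

	routes, err := ensureContentDirs(root)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for route, dir := range routes {
		fmt.Println(route, "->", dir)
	}
}
```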

@ -3,7 +3,6 @@ package http_test
import ( import (
"bytes" "bytes"
"context" "context"
"embed"
"encoding/json" "encoding/json"
"fmt" "fmt"
"io" "io"
@ -24,6 +23,7 @@ import (
. "github.com/onsi/gomega" . "github.com/onsi/gomega"
"gopkg.in/yaml.v3" "gopkg.in/yaml.v3"
rice "github.com/GeertJohan/go.rice"
openaigo "github.com/otiai10/openaigo" openaigo "github.com/otiai10/openaigo"
"github.com/sashabaranov/go-openai" "github.com/sashabaranov/go-openai"
"github.com/sashabaranov/go-openai/jsonschema" "github.com/sashabaranov/go-openai/jsonschema"
@ -264,8 +264,15 @@ func getRequest(url string, header http.Header) (error, int, []byte) {
const bertEmbeddingsURL = `https://gist.githubusercontent.com/mudler/0a080b166b87640e8644b09c2aee6e3b/raw/f0e8c26bb72edc16d9fbafbfd6638072126ff225/bert-embeddings-gallery.yaml` const bertEmbeddingsURL = `https://gist.githubusercontent.com/mudler/0a080b166b87640e8644b09c2aee6e3b/raw/f0e8c26bb72edc16d9fbafbfd6638072126ff225/bert-embeddings-gallery.yaml`
//go:embed backend-assets/* var backendAssets *rice.Box
var backendAssets embed.FS
func init() {
var err error
backendAssets, err = rice.FindBox("backend-assets")
if err != nil {
panic(err)
}
}
var _ = Describe("API test", func() { var _ = Describe("API test", func() {
@ -629,8 +636,7 @@ var _ = Describe("API test", func() {
application, err := application.New( application, err := application.New(
append(commonOpts, append(commonOpts,
config.WithContext(c), config.WithContext(c),
config.WithAudioDir(tmpdir), config.WithGeneratedContentDir(tmpdir),
config.WithImageDir(tmpdir),
config.WithGalleries(galleries), config.WithGalleries(galleries),
config.WithModelPath(modelDir), config.WithModelPath(modelDir),
config.WithBackendAssets(backendAssets), config.WithBackendAssets(backendAssets),

@ -32,7 +32,7 @@ func TTSEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfi
return fiber.ErrBadRequest return fiber.ErrBadRequest
} }
log.Debug().Str("modelName", input.ModelID).Msg("elevenlabs TTS request recieved") log.Debug().Str("modelName", input.ModelID).Msg("elevenlabs TTS request received")
filePath, _, err := backend.ModelTTS(input.Text, voiceID, input.LanguageCode, ml, appConfig, *cfg) filePath, _, err := backend.ModelTTS(input.Text, voiceID, input.LanguageCode, ml, appConfig, *cfg)
if err != nil { if err != nil {

@ -30,7 +30,7 @@ func JINARerankEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, a
return fiber.ErrBadRequest return fiber.ErrBadRequest
} }
log.Debug().Str("model", input.Model).Msg("JINA Rerank Request recieved") log.Debug().Str("model", input.Model).Msg("JINA Rerank Request received")
request := &proto.RerankRequest{ request := &proto.RerankRequest{
Query: input.Query, Query: input.Query,

@ -120,6 +120,7 @@ func (mgs *ModelGalleryEndpointService) ListModelFromGalleryEndpoint() func(c *f
models, err := gallery.AvailableGalleryModels(mgs.galleries, mgs.modelPath) models, err := gallery.AvailableGalleryModels(mgs.galleries, mgs.modelPath)
if err != nil { if err != nil {
log.Error().Err(err).Msg("could not list models from galleries")
return err return err
} }

@ -34,7 +34,7 @@ func TTSEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfi
return fiber.ErrBadRequest return fiber.ErrBadRequest
} }
log.Debug().Str("model", input.Model).Msg("LocalAI TTS Request recieved") log.Debug().Str("model", input.Model).Msg("LocalAI TTS Request received")
if cfg.Backend == "" { if cfg.Backend == "" {
if input.Backend != "" { if input.Backend != "" {

@ -28,7 +28,7 @@ func VADEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfi
return fiber.ErrBadRequest return fiber.ErrBadRequest
} }
log.Debug().Str("model", input.Model).Msg("LocalAI VAD Request recieved") log.Debug().Str("model", input.Model).Msg("LocalAI VAD Request received")
resp, err := backend.VAD(input, c.Context(), ml, appConfig, *cfg) resp, err := backend.VAD(input, c.Context(), ml, appConfig, *cfg)

View file

@ -0,0 +1,205 @@
package localai
import (
"bufio"
"encoding/base64"
"encoding/json"
"fmt"
"io"
"net/http"
"os"
"path/filepath"
"strings"
"time"
"github.com/google/uuid"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/backend"
"github.com/gofiber/fiber/v2"
model "github.com/mudler/LocalAI/pkg/model"
"github.com/rs/zerolog/log"
)
func downloadFile(url string) (string, error) {
// Get the data
resp, err := http.Get(url)
if err != nil {
return "", err
}
defer resp.Body.Close()
// Create the file
out, err := os.CreateTemp("", "video")
if err != nil {
return "", err
}
defer out.Close()
// Write the body to file
_, err = io.Copy(out, resp.Body)
return out.Name(), err
}
/*
*
curl http://localhost:8080/video \
-H "Content-Type: application/json" \
-d '{
"prompt": "A cute baby sea otter",
"width": 512,
"height": 512
}'
*
*/
// VideoEndpoint
// @Summary Creates a video given a prompt.
// @Param request body schema.VideoRequest true "query params"
// @Success 200 {object} schema.OpenAIResponse "Response"
// @Router /video [post]
func VideoEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
return func(c *fiber.Ctx) error {
input, ok := c.Locals(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.VideoRequest)
if !ok || input.Model == "" {
log.Error().Msg("Video Endpoint - Invalid Input")
return fiber.ErrBadRequest
}
config, ok := c.Locals(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.BackendConfig)
if !ok || config == nil {
log.Error().Msg("Video Endpoint - Invalid Config")
return fiber.ErrBadRequest
}
src := ""
if input.StartImage != "" {
var fileData []byte
var err error
// check if input.StartImage is a URL; if so, download it and save it
// to a temporary file
if strings.HasPrefix(input.StartImage, "http://") || strings.HasPrefix(input.StartImage, "https://") {
out, err := downloadFile(input.StartImage)
if err != nil {
return fmt.Errorf("failed downloading file: %w", err)
}
defer os.RemoveAll(out)
fileData, err = os.ReadFile(out)
if err != nil {
return fmt.Errorf("failed reading file: %w", err)
}
} else {
// base 64 decode the file and write it somewhere
// that we will cleanup
fileData, err = base64.StdEncoding.DecodeString(input.StartImage)
if err != nil {
return err
}
}
// Create a temporary file
outputFile, err := os.CreateTemp(appConfig.GeneratedContentDir, "b64")
if err != nil {
return err
}
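// write the decoded image data; flush the buffered writer before closing,
// otherwise small payloads stay in the buffer and never reach the file
writer := bufio.NewWriter(outputFile)
_, err = writer.Write(fileData)
if err == nil {
err = writer.Flush()
}
if err != nil {
outputFile.Close()
return err
}
outputFile.Close()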
src = outputFile.Name()
defer os.RemoveAll(src)
}
log.Debug().Msgf("Parameter Config: %+v", config)
switch config.Backend {
case "stablediffusion":
config.Backend = model.StableDiffusionGGMLBackend
case "":
config.Backend = model.StableDiffusionGGMLBackend
}
width := input.Width
height := input.Height
if width == 0 {
width = 512
}
if height == 0 {
height = 512
}
b64JSON := input.ResponseFormat == "b64_json"
tempDir := ""
if !b64JSON {
tempDir = filepath.Join(appConfig.GeneratedContentDir, "videos")
}
// Create a temporary file
outputFile, err := os.CreateTemp(tempDir, "b64")
if err != nil {
return err
}
outputFile.Close()
// TODO: use mime type to determine the extension
output := outputFile.Name() + ".mp4"
// Rename the temporary file
err = os.Rename(outputFile.Name(), output)
if err != nil {
return err
}
baseURL := c.BaseURL()
fn, err := backend.VideoGeneration(height, width, input.Prompt, src, input.EndImage, output, ml, *config, appConfig)
if err != nil {
return err
}
if err := fn(); err != nil {
return err
}
item := &schema.Item{}
if b64JSON {
defer os.RemoveAll(output)
data, err := os.ReadFile(output)
if err != nil {
return err
}
item.B64JSON = base64.StdEncoding.EncodeToString(data)
} else {
base := filepath.Base(output)
item.URL = baseURL + "/generated-videos/" + base
}
id := uuid.New().String()
created := int(time.Now().Unix())
resp := &schema.OpenAIResponse{
ID: id,
Created: created,
Data: []schema.Item{*item},
}
jsonResult, _ := json.Marshal(resp)
log.Debug().Msgf("Response: %s", jsonResult)
// Return the prediction in the response body
return c.JSON(resp)
}
}
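A client-side view of the new endpoint may help when reading the handler above. The sketch below builds, but does not send, a `POST /video` request carrying a subset of the `schema.VideoRequest` fields. The `model` JSON key is an assumption, since `BasicModelRequest` is not shown in this diff; `videoRequest` and `newVideoRequest` are hypothetical names, and the base URL is only an example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// videoRequest carries a subset of the schema.VideoRequest fields.
// The "model" key is an assumption: BasicModelRequest is not in this diff.
type videoRequest struct {
	Model          string `json:"model,omitempty"`
	Prompt         string `json:"prompt"`
	Width          int32  `json:"width,omitempty"`
	Height         int32  `json:"height,omitempty"`
	ResponseFormat string `json:"response_format,omitempty"`
}

// newVideoRequest builds (but does not send) a POST to <baseURL>/video.
func newVideoRequest(baseURL string, body videoRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/video", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newVideoRequest("http://localhost:8080", videoRequest{
		Prompt: "A cute baby sea otter",
		Width:  512,
		Height: 512,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	// Sending is left to the caller, e.g. http.DefaultClient.Do(req).
}
```

Per the handler, a zero `width` or `height` falls back to 512, and `response_format` set to `b64_json` returns the video inline as base64; any other value yields a URL under `/generated-videos/`.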

@ -108,7 +108,7 @@ func ImageEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appCon
} }
// Create a temporary file // Create a temporary file
outputFile, err := os.CreateTemp(appConfig.ImageDir, "b64") outputFile, err := os.CreateTemp(appConfig.GeneratedContentDir, "b64")
if err != nil { if err != nil {
return err return err
} }
@ -184,7 +184,7 @@ func ImageEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appCon
tempDir := "" tempDir := ""
if !b64JSON { if !b64JSON {
tempDir = appConfig.ImageDir tempDir = filepath.Join(appConfig.GeneratedContentDir, "images")
} }
// Create a temporary file // Create a temporary file
outputFile, err := os.CreateTemp(tempDir, "b64") outputFile, err := os.CreateTemp(tempDir, "b64")
@ -192,6 +192,7 @@ func ImageEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appCon
return err return err
} }
outputFile.Close() outputFile.Close()
output := outputFile.Name() + ".png" output := outputFile.Name() + ".png"
// Rename the temporary file // Rename the temporary file
err = os.Rename(outputFile.Name(), output) err = os.Rename(outputFile.Name(), output)

@ -203,18 +203,10 @@ func mergeOpenAIRequestAndBackendConfig(config *config.BackendConfig, input *sch
config.Diffusers.ClipSkip = input.ClipSkip config.Diffusers.ClipSkip = input.ClipSkip
} }
if input.ModelBaseName != "" {
config.AutoGPTQ.ModelBaseName = input.ModelBaseName
}
if input.NegativePromptScale != 0 { if input.NegativePromptScale != 0 {
config.NegativePromptScale = input.NegativePromptScale config.NegativePromptScale = input.NegativePromptScale
} }
if input.UseFastTokenizer {
config.UseFastTokenizer = input.UseFastTokenizer
}
if input.NegativePrompt != "" { if input.NegativePrompt != "" {
config.NegativePrompt = input.NegativePrompt config.NegativePrompt = input.NegativePrompt
} }

@ -59,6 +59,11 @@ func RegisterLocalAIRoutes(router *fiber.App,
router.Get("/metrics", localai.LocalAIMetricsEndpoint()) router.Get("/metrics", localai.LocalAIMetricsEndpoint())
} }
router.Post("/video",
requestExtractor.BuildFilteredFirstAvailableDefaultModel(config.BuildUsecaseFilterFn(config.FLAG_VIDEO)),
requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.VideoRequest) }),
localai.VideoEndpoint(cl, ml, appConfig))
// Backend Statistics Module // Backend Statistics Module
// TODO: Should these use standard middlewares? Refactor later, they are extremely simple. // TODO: Should these use standard middlewares? Refactor later, they are extremely simple.
backendMonitorService := services.NewBackendMonitorService(ml, cl, appConfig) // Split out for now backendMonitorService := services.NewBackendMonitorService(ml, cl, appConfig) // Split out for now

@ -131,7 +131,17 @@ func RegisterUIRoutes(app *fiber.App,
page := c.Query("page") page := c.Query("page")
items := c.Query("items") items := c.Query("items")
models, _ := gallery.AvailableGalleryModels(appConfig.Galleries, appConfig.ModelPath) models, err := gallery.AvailableGalleryModels(appConfig.Galleries, appConfig.ModelPath)
if err != nil {
log.Error().Err(err).Msg("could not list models from galleries")
return c.Status(fiber.StatusInternalServerError).Render("views/error", fiber.Map{
"Title": "LocalAI - Models",
"BaseURL": utils.BaseURL(c),
"Version": internal.PrintableVersion(),
"ErrorCode": "500",
"ErrorMessage": err.Error(),
})
}
// Get all available tags // Get all available tags
allTags := map[string]struct{}{} allTags := map[string]struct{}{}

@ -115,6 +115,7 @@ async function sendTextToChatGPT(text) {
const response = await fetch('v1/chat/completions', { const response = await fetch('v1/chat/completions', {
method: 'POST', method: 'POST',
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ body: JSON.stringify({
model: getModel(), model: getModel(),
messages: conversationHistory messages: conversationHistory

@ -0,0 +1,56 @@
<!DOCTYPE html>
<html lang="en">
{{template "views/partials/head" .}}
<body class="bg-gradient-to-br from-gray-900 to-gray-950 text-gray-200">
<div class="flex flex-col min-h-screen">
{{template "views/partials/navbar" .}}
<div class="container mx-auto px-4 py-8 flex-grow">
<!-- Error Section -->
<div class="bg-gradient-to-r from-blue-900/30 to-indigo-900/30 rounded-2xl shadow-xl p-8 mb-10">
<div class="max-w-4xl mx-auto text-center">
<div class="mb-6 text-6xl text-blue-400">
<i class="fas fa-exclamation-circle"></i>
</div>
<h1 class="text-4xl md:text-5xl font-bold text-white mb-4">
<span class="bg-clip-text text-transparent bg-gradient-to-r from-blue-400 to-indigo-400">
{{if .ErrorCode}}{{.ErrorCode}}{{else}}Error{{end}}
</span>
</h1>
<p class="text-xl text-gray-300 mb-6">{{if .ErrorMessage}}{{.ErrorMessage}}{{else}}An unexpected error occurred{{end}}</p>
<div class="flex flex-wrap justify-center gap-4">
<a href="./"
class="group flex items-center bg-blue-600 hover:bg-blue-700 text-white py-2 px-6 rounded-lg transition duration-300 ease-in-out transform hover:scale-105 hover:shadow-lg">
<i class="fas fa-home mr-2"></i>
<span>Return Home</span>
<i class="fas fa-arrow-right opacity-0 group-hover:opacity-100 group-hover:translate-x-2 ml-2 transition-all duration-300"></i>
</a>
<a href="browse"
class="group flex items-center bg-indigo-600 hover:bg-indigo-700 text-white py-2 px-6 rounded-lg transition duration-300 ease-in-out transform hover:scale-105 hover:shadow-lg">
<i class="fas fa-images mr-2"></i>
<span>Browse Gallery</span>
<i class="fas fa-arrow-right opacity-0 group-hover:opacity-100 group-hover:translate-x-2 ml-2 transition-all duration-300"></i>
</a>
</div>
</div>
</div>
<!-- Additional Information -->
<div class="bg-gray-800/50 border border-gray-700/50 rounded-xl p-8 shadow-md backdrop-blur-sm">
<div class="text-center max-w-3xl mx-auto">
<div class="inline-flex items-center justify-center w-16 h-16 rounded-full bg-yellow-500/20 mb-4">
<i class="text-yellow-400 text-2xl fa-solid fa-triangle-exclamation"></i>
</div>
<h2 class="text-2xl md:text-3xl font-semibold text-gray-100 mb-4">Need help?</h2>
<p class="text-lg text-gray-300 mb-6">Visit our <a class="text-blue-400 hover:text-blue-300 underline underline-offset-2" href="browse">🖼️ Gallery</a> or check the <a href="https://localai.io/basics/getting_started/" class="text-blue-400 hover:text-blue-300 underline underline-offset-2"> <i class="fa-solid fa-book"></i> Getting started documentation</a></p>
</div>
</div>
</div>
{{template "views/partials/footer" .}}
</div>
</body>
</html>
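The error view relies on two `{{if}}…{{else}}…{{end}}` fallbacks, so it still renders something sensible when the handler passes no `ErrorCode` or `ErrorMessage`. The sketch below isolates just those two conditionals using Go's standard `html/template`; it assumes the view engine follows the same semantics and does not involve Fiber. `errorLine` and `renderError` are hypothetical names.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// errorLine reproduces only the two conditionals from the error view.
var errorLine = template.Must(template.New("error").Parse(
	`{{if .ErrorCode}}{{.ErrorCode}}{{else}}Error{{end}}: ` +
		`{{if .ErrorMessage}}{{.ErrorMessage}}{{else}}An unexpected error occurred{{end}}`))

// renderError executes the template against a map, the same shape of
// data the route handler passes to the view.
func renderError(data map[string]interface{}) (string, error) {
	var sb strings.Builder
	if err := errorLine.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	withDetails, err := renderError(map[string]interface{}{
		"ErrorCode":    "500",
		"ErrorMessage": "could not list models",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fallback, err := renderError(map[string]interface{}{})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(withDetails)
	fmt.Println(fallback)
}
```

A missing map key evaluates as false inside `{{if}}`, which is why the handler can omit either field without breaking the page.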

@ -24,6 +24,20 @@ type GalleryResponse struct {
StatusURL string `json:"status"` StatusURL string `json:"status"`
} }
type VideoRequest struct {
BasicModelRequest
Prompt string `json:"prompt" yaml:"prompt"`
StartImage string `json:"start_image" yaml:"start_image"`
EndImage string `json:"end_image" yaml:"end_image"`
Width int32 `json:"width" yaml:"width"`
Height int32 `json:"height" yaml:"height"`
NumFrames int32 `json:"num_frames" yaml:"num_frames"`
FPS int32 `json:"fps" yaml:"fps"`
Seed int32 `json:"seed" yaml:"seed"`
CFGScale float32 `json:"cfg_scale" yaml:"cfg_scale"`
ResponseFormat string `json:"response_format" yaml:"response_format"`
}
// @Description TTS request body // @Description TTS request body
type TTSRequest struct { type TTSRequest struct {
BasicModelRequest BasicModelRequest

@ -202,7 +202,6 @@ type OpenAIRequest struct {
Backend string `json:"backend" yaml:"backend"` Backend string `json:"backend" yaml:"backend"`
// AutoGPTQ
ModelBaseName string `json:"model_base_name" yaml:"model_base_name"` ModelBaseName string `json:"model_base_name" yaml:"model_base_name"`
} }

@ -41,8 +41,6 @@ type PredictionOptions struct {
RopeFreqBase float32 `json:"rope_freq_base" yaml:"rope_freq_base"` RopeFreqBase float32 `json:"rope_freq_base" yaml:"rope_freq_base"`
RopeFreqScale float32 `json:"rope_freq_scale" yaml:"rope_freq_scale"` RopeFreqScale float32 `json:"rope_freq_scale" yaml:"rope_freq_scale"`
NegativePromptScale float32 `json:"negative_prompt_scale" yaml:"negative_prompt_scale"` NegativePromptScale float32 `json:"negative_prompt_scale" yaml:"negative_prompt_scale"`
// AutoGPTQ
UseFastTokenizer bool `json:"use_fast_tokenizer" yaml:"use_fast_tokenizer"`
// Diffusers // Diffusers
ClipSkip int `json:"clip_skip" yaml:"clip_skip"` ClipSkip int `json:"clip_skip" yaml:"clip_skip"`

@ -268,14 +268,6 @@ yarn_ext_factor: 0
yarn_attn_factor: 0 yarn_attn_factor: 0
yarn_beta_fast: 0 yarn_beta_fast: 0
yarn_beta_slow: 0 yarn_beta_slow: 0
# AutoGPT-Q settings, for configurations specific to GPT models.
autogptq:
model_base_name: "" # Base name of the model.
device: "" # Device to run the model on.
triton: false # Whether to use Triton Inference Server.
use_fast_tokenizer: false # Whether to use a fast tokenizer for quicker processing.
# configuration for diffusers model # configuration for diffusers model
diffusers: diffusers:
cuda: false # Whether to use CUDA cuda: false # Whether to use CUDA
@ -489,8 +481,7 @@ In the help text below, BASEPATH is the location that local-ai is being executed
|-----------|---------|-------------|----------------------| |-----------|---------|-------------|----------------------|
| --models-path | BASEPATH/models | Path containing models used for inferencing | $LOCALAI_MODELS_PATH | | --models-path | BASEPATH/models | Path containing models used for inferencing | $LOCALAI_MODELS_PATH |
| --backend-assets-path |/tmp/localai/backend_data | Path used to extract libraries that are required by some of the backends in runtime | $LOCALAI_BACKEND_ASSETS_PATH | | --backend-assets-path |/tmp/localai/backend_data | Path used to extract libraries that are required by some of the backends in runtime | $LOCALAI_BACKEND_ASSETS_PATH |
| --image-path | /tmp/generated/images | Location for images generated by backends (e.g. stablediffusion) | $LOCALAI_IMAGE_PATH | | --generated-content-path | /tmp/generated/content | Location for assets generated by backends (e.g. stablediffusion) | $LOCALAI_GENERATED_CONTENT_PATH |
| --audio-path | /tmp/generated/audio | Location for audio generated by backends (e.g. piper) | $LOCALAI_AUDIO_PATH |
| --upload-path | /tmp/localai/upload | Path to store uploads from files api | $LOCALAI_UPLOAD_PATH | | --upload-path | /tmp/localai/upload | Path to store uploads from files api | $LOCALAI_UPLOAD_PATH |
| --config-path | /tmp/localai/config | | $LOCALAI_CONFIG_PATH | | --config-path | /tmp/localai/config | | $LOCALAI_CONFIG_PATH |
| --localai-config-dir | BASEPATH/configuration | Directory for dynamic loading of certain configuration files (currently api_keys.json and external_backends.json) | $LOCALAI_CONFIG_DIR | | --localai-config-dir | BASEPATH/configuration | Directory for dynamic loading of certain configuration files (currently api_keys.json and external_backends.json) | $LOCALAI_CONFIG_DIR |
@ -523,6 +514,7 @@ In the help text below, BASEPATH is the location that local-ai is being executed
| --upload-limit | 15 | Default upload-limit in MB | $LOCALAI_UPLOAD_LIMIT | | --upload-limit | 15 | Default upload-limit in MB | $LOCALAI_UPLOAD_LIMIT |
| --api-keys | API-KEYS,... | List of API Keys to enable API authentication. When this is set, all the requests must be authenticated with one of these API keys | $LOCALAI_API_KEY | | --api-keys | API-KEYS,... | List of API Keys to enable API authentication. When this is set, all the requests must be authenticated with one of these API keys | $LOCALAI_API_KEY |
| --disable-welcome | | Disable welcome pages | $LOCALAI_DISABLE_WELCOME | | --disable-welcome | | Disable welcome pages | $LOCALAI_DISABLE_WELCOME |
| --disable-webui | false | Disables the web user interface. When set to true, the server will only expose API endpoints without serving the web interface | $LOCALAI_DISABLE_WEBUI |
| --machine-tag | | If not empty - put that string to Machine-Tag header in each response. Useful to track response from different machines using multiple P2P federated nodes | $LOCALAI_MACHINE_TAG | | --machine-tag | | If not empty - put that string to Machine-Tag header in each response. Useful to track response from different machines using multiple P2P federated nodes | $LOCALAI_MACHINE_TAG |
#### Backend Flags #### Backend Flags

@ -23,8 +23,9 @@ List of the Environment Variables:
|----------------------|--------------------------------------------------------------| |----------------------|--------------------------------------------------------------|
| **DOCKER_INSTALL** | Set to "true" to enable the installation of Docker images. | | **DOCKER_INSTALL** | Set to "true" to enable the installation of Docker images. |
| **USE_AIO** | Set to "true" to use the all-in-one LocalAI Docker image. | | **USE_AIO** | Set to "true" to use the all-in-one LocalAI Docker image. |
| **USE_EXTRAS** | Set to "true" to use images with extra Python dependencies. |
| **USE_VULKAN** | Set to "true" to use Vulkan GPU support. |
| **API_KEY** | Specify an API key for accessing LocalAI, if required. | | **API_KEY** | Specify an API key for accessing LocalAI, if required. |
| **CORE_IMAGES** | Set to "true" to download core LocalAI images. |
| **PORT** | Specifies the port on which LocalAI will run (default is 8080). | | **PORT** | Specifies the port on which LocalAI will run (default is 8080). |
| **THREADS** | Number of processor threads the application should use. Defaults to the number of logical cores minus one. | | **THREADS** | Number of processor threads the application should use. Defaults to the number of logical cores minus one. |
| **VERSION** | Specifies the version of LocalAI to install. Defaults to the latest available version. | | **VERSION** | Specifies the version of LocalAI to install. Defaults to the latest available version. |
@ -34,4 +35,20 @@ List of the Environment Variables:
| **FEDERATED** | Set to "true" to share the instance with the federation (p2p token is required see [documentation]({{%relref "docs/features/distributed_inferencing" %}})) | | **FEDERATED** | Set to "true" to share the instance with the federation (p2p token is required see [documentation]({{%relref "docs/features/distributed_inferencing" %}})) |
| **FEDERATED_SERVER** | Set to "true" to run the instance as a federation server which forwards requests to the federation (p2p token is required see [documentation]({{%relref "docs/features/distributed_inferencing" %}})) | | **FEDERATED_SERVER** | Set to "true" to run the instance as a federation server which forwards requests to the federation (p2p token is required see [documentation]({{%relref "docs/features/distributed_inferencing" %}})) |
## Image Selection
The installer will automatically detect your GPU and select the appropriate image. By default, it uses the standard images without extra Python dependencies. You can customize the image selection using the following environment variables:
- `USE_EXTRAS=true`: Use images with extra Python dependencies (larger images, ~17GB)
- `USE_AIO=true`: Use all-in-one images that include all dependencies
- `USE_VULKAN=true`: Use Vulkan GPU support instead of vendor-specific GPU support
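
As a usage sketch, assuming the installer reads these variables from the environment of the `sh` process that executes it (the `VAR=value sh` form is an assumption, not something stated in this section), a small hypothetical helper can print the invocation for a given option instead of running it. The helper name `installer_cmd` is invented for illustration and is not part of `install.sh`:

```shell
# Hypothetical helper (NOT part of install.sh): prints the installer
# invocation for one VAR=value pair rather than executing it.
installer_cmd() {
    # $1 is a VAR=value pair, for example USE_EXTRAS=true
    printf 'curl https://localai.io/install.sh | %s sh\n' "$1"
}

# Print the commands for two of the image-selection options
installer_cmd USE_EXTRAS=true
installer_cmd USE_VULKAN=true
```

Printing the command first makes it easy to review which image flavor will be requested before piping anything into a shell.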
## Uninstallation
To uninstall, run:
```
curl https://localai.io/install.sh | sh -s -- --uninstall
```
We are looking into improving the installer, and as this is a first iteration any feedback is welcome! Open up an [issue](https://github.com/mudler/LocalAI/issues/new/choose) if something doesn't work for you!


@ -57,12 +57,14 @@ diffusers:
Requirement: nvidia-container-toolkit (installation instructions [1](https://www.server-world.info/en/note?os=Ubuntu_22.04&p=nvidia&f=2) [2](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html))
If using a system with SELinux, ensure you have the policies installed, such as those [provided by nvidia](https://github.com/NVIDIA/dgx-selinux/)
To check which CUDA version you need, you can run either `nvidia-smi` or `nvcc --version`.
Alternatively, you can also check nvidia-smi with docker:
``` ```
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi docker run --runtime=nvidia --rm nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi
``` ```
To use CUDA, use the images with the `cublas` tag, for example.
@ -147,7 +149,6 @@ The devices in the following list have been tested with `hipblas` images running
| diffusers | yes | Radeon VII (gfx906) |
| piper | yes | Radeon VII (gfx906) |
| whisper | no | none |
| autogptq | no | none |
| bark | no | none |
| coqui | no | none |
| transformers | no | none |
@ -279,3 +280,36 @@ docker run --rm -ti --device /dev/dri -p 8080:8080 -e DEBUG=true -e MODELS_PATH=
``` ```
Note also that SYCL has a known issue where it hangs with `mmap: true`. If it is explicitly enabled, you have to disable it in the model configuration.
## Vulkan acceleration
### Requirements
If using nvidia, follow the steps in the [CUDA](#cudanvidia-acceleration) section to configure your docker runtime to allow access to the GPU.
### Container images
To use Vulkan, use the images with the `vulkan` tag, for example `{{< version >}}-vulkan-ffmpeg-core`.
#### Example
To run LocalAI with Docker and Vulkan, you can use the following command as an example:
```bash
docker run -p 8080:8080 -e DEBUG=true -v $PWD/models:/build/models localai/localai:latest-vulkan-ffmpeg-core
```
### Notes
In addition to the commands to run LocalAI normally, you need to specify additional flags to pass the GPU hardware to the container.
These flags are the same as in the sections above, depending on the hardware, for [nvidia](#cudanvidia-acceleration), [AMD](#rocmamd-acceleration) or [Intel](#intel-acceleration-sycl).
If you have mixed hardware, you can pass flags for multiple GPUs, for example:
```bash
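# --gpus=all: nvidia passthrough
# --device /dev/dri --device /dev/kfd: AMD/Intel passthrough
# (comments are kept above the command: a comment after a trailing backslash breaks the line continuation)
docker run -p 8080:8080 -e DEBUG=true -v $PWD/models:/build/models \
--gpus=all \
--device /dev/dri --device /dev/kfd \
localai/localai:latest-vulkan-ffmpeg-core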
```


@ -74,49 +74,9 @@ curl http://localhost:8080/v1/models
## Backends
### AutoGPTQ
[AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) is an easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
#### Prerequisites
This is an extra backend - in the container images is already available and there is nothing to do for the setup.
If you are building LocalAI locally, you need to install [AutoGPTQ manually](https://github.com/PanQiWei/AutoGPTQ#quick-installation).
#### Model setup
The models are automatically downloaded from `huggingface` if not present the first time. It is possible to define models via `YAML` config file, or just by querying the endpoint with the `huggingface` repository model name. For example, create a `YAML` config file in `models/`:
```
name: orca
backend: autogptq
model_base_name: "orca_mini_v2_13b-GPTQ-4bit-128g.no-act.order"
parameters:
model: "TheBloke/orca_mini_v2_13b-GPTQ"
# ...
```
Test with:
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "orca",
"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.1
}'
```
### RWKV
A full example on how to run a rwkv model is in the [examples](https://github.com/go-skynet/LocalAI/tree/master/examples/rwkv). RWKV support is available through llama.cpp (see below)
Note: rwkv models needs to specify the backend `rwkv` in the YAML config files and have an associated tokenizer along that needs to be provided with it:
```
36464540 -rw-r--r-- 1 mudler mudler 1.2G May 3 10:51 rwkv_small
36464543 -rw-r--r-- 1 mudler mudler 2.4M May 3 10:51 rwkv_small.tokenizer.json
```
### llama.cpp


@ -9,7 +9,7 @@ ico = "rocket_launch"
### Build
LocalAI can be built as a container image or as a single, portable binary. Note that the some model architectures might require Python libraries, which are not included in the binary. The binary contains only the core backends written in Go and C++. LocalAI can be built as a container image or as a single, portable binary. Note that some model architectures might require Python libraries, which are not included in the binary. The binary contains only the core backends written in Go and C++.
LocalAI's extensible architecture allows you to add your own backends, which can be written in any language. For this reason, the container images also contain the Python dependencies needed to run all the available backends (for example, backends like __Diffusers__, which can generate images and videos from text).
@ -189,7 +189,7 @@ sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
- If completions are slow, ensure that `gpu-layers` in your model yaml matches the number of layers from the model in use (or simply use a high number such as 256).
- If you a get a compile error: `error: only virtual member functions can be marked 'final'`, reinstall all the necessary brew packages, clean the build, and try again. - If you get a compile error: `error: only virtual member functions can be marked 'final'`, reinstall all the necessary brew packages, clean the build, and try again.
``` ```
# reinstall build dependencies


@ -39,7 +39,7 @@ Before you begin, ensure you have a container engine installed if you are not us
## All-in-one images
All-In-One images are images that come pre-configured with a set of models and backends to fully leverage almost all the LocalAI featureset. These images are available for both CPU and GPU environments. The AIO images are designed to be easy to use and requires no configuration. Models configuration can be found [here](https://github.com/mudler/LocalAI/tree/master/aio) separated by size. All-In-One images are images that come pre-configured with a set of models and backends to fully leverage almost all the LocalAI featureset. These images are available for both CPU and GPU environments. The AIO images are designed to be easy to use and require no configuration. Models configuration can be found [here](https://github.com/mudler/LocalAI/tree/master/aio) separated by size.
In the AIO images there are models configured with the names of OpenAI models; however, they are actually backed by open source models. You can find the mapping in the table below.
@ -150,7 +150,7 @@ The AIO Images are inheriting the same environment variables as the base images
Standard container images do not have pre-installed models.
Images are available with and without python dependencies. Note that images with python dependencies are bigger (in order of 17GB). Images are available with and without python dependencies (images with the `extras` suffix). Note that images with python dependencies are bigger (in order of 17GB).
Images with `core` in the tag are smaller and do not contain any python dependencies.
@ -160,10 +160,8 @@ Images with `core` in the tag are smaller and do not contain any python dependen
| Description | Quay | Docker Hub |
| --- | --- |-----------------------------------------------|
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master` | `localai/localai:master` |
| Latest tag | `quay.io/go-skynet/local-ai:latest-cpu` | `localai/localai:latest-cpu` | | Latest tag | `quay.io/go-skynet/local-ai:latest` | `localai/localai:latest` |
| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}` | `localai/localai:{{< version >}}` |
| Versioned image including FFMpeg| `quay.io/go-skynet/local-ai:{{< version >}}-ffmpeg` | `localai/localai:{{< version >}}-ffmpeg` |
| Versioned image including FFMpeg, no python | `quay.io/go-skynet/local-ai:{{< version >}}-ffmpeg-core` | `localai/localai:{{< version >}}-ffmpeg-core` |
{{% /tab %}} {{% /tab %}}
@ -172,10 +170,9 @@ Images with `core` in the tag are smaller and do not contain any python dependen
| Description | Quay | Docker Hub |
| --- | --- |-------------------------------------------------------------|
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-cublas-cuda11` | `localai/localai:master-cublas-cuda11` |
| Latest tag | `quay.io/go-skynet/local-ai:latest-gpu-nvidia-cuda-11` | `localai/localai:latest-gpu-nvidia-cuda-11` |
| Latest tag with extras | `quay.io/go-skynet/local-ai:latest-gpu-nvidia-cuda-11-extras` | `localai/localai:latest-gpu-nvidia-cuda-11-extras` |
| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda11` | `localai/localai:{{< version >}}-cublas-cuda11` |
| Versioned image including FFMpeg| `quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda11-ffmpeg` | `localai/localai:{{< version >}}-cublas-cuda11-ffmpeg` |
| Versioned image including FFMpeg, no python | `quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda11-ffmpeg-core` | `localai/localai:{{< version >}}-cublas-cuda11-ffmpeg-core` |
{{% /tab %}} {{% /tab %}}
@ -185,9 +182,8 @@ Images with `core` in the tag are smaller and do not contain any python dependen
| --- | --- |-------------------------------------------------------------|
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-cublas-cuda12` | `localai/localai:master-cublas-cuda12` |
| Latest tag | `quay.io/go-skynet/local-ai:latest-gpu-nvidia-cuda-12` | `localai/localai:latest-gpu-nvidia-cuda-12` |
| Latest tag with extras | `quay.io/go-skynet/local-ai:latest-gpu-nvidia-cuda-12-extras` | `localai/localai:latest-gpu-nvidia-cuda-12-extras` |
| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda12` | `localai/localai:{{< version >}}-cublas-cuda12` |
| Versioned image including FFMpeg| `quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda12-ffmpeg` | `localai/localai:{{< version >}}-cublas-cuda12-ffmpeg` |
| Versioned image including FFMpeg, no python | `quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda12-ffmpeg-core` | `localai/localai:{{< version >}}-cublas-cuda12-ffmpeg-core` |
{{% /tab %}} {{% /tab %}}
@ -197,9 +193,8 @@ Images with `core` in the tag are smaller and do not contain any python dependen
| --- | --- |-------------------------------------------------------------|
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-sycl-f16` | `localai/localai:master-sycl-f16` |
| Latest tag | `quay.io/go-skynet/local-ai:latest-gpu-intel-f16` | `localai/localai:latest-gpu-intel-f16` |
| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-sycl-f16-core` | `localai/localai:{{< version >}}-sycl-f16-core` | | Latest tag with extras | `quay.io/go-skynet/local-ai:latest-gpu-intel-f16-extras` | `localai/localai:latest-gpu-intel-f16-extras` |
| Versioned image including FFMpeg| `quay.io/go-skynet/local-ai:{{< version >}}-sycl-f16-ffmpeg` | `localai/localai:{{< version >}}-sycl-f16-ffmpeg` | | Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-sycl-f16` | `localai/localai:{{< version >}}-sycl-f16` |
| Versioned image including FFMpeg, no python | `quay.io/go-skynet/local-ai:{{< version >}}-sycl-f16-ffmpeg-core` | `localai/localai:{{< version >}}-sycl-f16-ffmpeg-core` |
{{% /tab %}} {{% /tab %}}
@ -209,9 +204,8 @@ Images with `core` in the tag are smaller and do not contain any python dependen
| --- | --- |-------------------------------------------------------------|
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-sycl-f32` | `localai/localai:master-sycl-f32` |
| Latest tag | `quay.io/go-skynet/local-ai:latest-gpu-intel-f32` | `localai/localai:latest-gpu-intel-f32` |
| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-sycl-f32-core` | `localai/localai:{{< version >}}-sycl-f32-core` | | Latest tag with extras | `quay.io/go-skynet/local-ai:latest-gpu-intel-f32-extras` | `localai/localai:latest-gpu-intel-f32-extras` |
| Versioned image including FFMpeg| `quay.io/go-skynet/local-ai:{{< version >}}-sycl-f32-ffmpeg` | `localai/localai:{{< version >}}-sycl-f32-ffmpeg` | | Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-sycl-f32` | `localai/localai:{{< version >}}-sycl-f32` |
| Versioned image including FFMpeg, no python | `quay.io/go-skynet/local-ai:{{< version >}}-sycl-f32-ffmpeg-core` | `localai/localai:{{< version >}}-sycl-f32-ffmpeg-core` |
{{% /tab %}} {{% /tab %}}
@ -220,20 +214,18 @@ Images with `core` in the tag are smaller and do not contain any python dependen
| Description | Quay | Docker Hub |
| --- | --- |-------------------------------------------------------------|
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-hipblas` | `localai/localai:master-hipblas` |
| Latest tag | `quay.io/go-skynet/local-ai:latest-gpu-hipblas` | `localai/localai:latest-gpu-hipblas` |
| Latest tag with extras | `quay.io/go-skynet/local-ai:latest-gpu-hipblas-extras` | `localai/localai:latest-gpu-hipblas-extras` |
| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-hipblas` | `localai/localai:{{< version >}}-hipblas` |
| Versioned image including FFMpeg| `quay.io/go-skynet/local-ai:{{< version >}}-hipblas-ffmpeg` | `localai/localai:{{< version >}}-hipblas-ffmpeg` |
| Versioned image including FFMpeg, no python | `quay.io/go-skynet/local-ai:{{< version >}}-hipblas-ffmpeg-core` | `localai/localai:{{< version >}}-hipblas-ffmpeg-core` |
{{% /tab %}} {{% /tab %}}
{{% tab tabName="Vulkan Images" %}} {{% tab tabName="Vulkan Images" %}}
| Description | Quay | Docker Hub |
| --- | --- |-------------------------------------------------------------|
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai: master-vulkan-ffmpeg-core ` | `localai/localai: master-vulkan-ffmpeg-core ` | | Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-vulkan` | `localai/localai:master-vulkan` |
| Latest tag | `quay.io/go-skynet/local-ai: latest-vulkan-ffmpeg-core ` | `localai/localai: latest-vulkan-ffmpeg-core` | | Latest tag | `quay.io/go-skynet/local-ai:latest-gpu-vulkan` | `localai/localai:latest-gpu-vulkan` |
| Versioned image including FFMpeg, no python | `quay.io/go-skynet/local-ai:{{< version >}}-vulkan-fmpeg-core` | `localai/localai:{{< version >}}-vulkan-fmpeg-core` | | Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-vulkan` | `localai/localai:{{< version >}}-vulkan` |
{{% /tab %}} {{% /tab %}}
{{% tab tabName="Nvidia Linux for tegra" %}} {{% tab tabName="Nvidia Linux for tegra" %}}
@ -242,9 +234,9 @@ These images are compatible with Nvidia ARM64 devices, such as the Jetson Nano,
| Description | Quay | Docker Hub |
| --- | --- |-------------------------------------------------------------|
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64-core` | `localai/localai:master-nvidia-l4t-arm64-core` | | Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64` | `localai/localai:master-nvidia-l4t-arm64` |
| Latest tag | `quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64-core` | `localai/localai:latest-nvidia-l4t-arm64-core` | | Latest tag | `quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64` | `localai/localai:latest-nvidia-l4t-arm64` |
| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-nvidia-l4t-arm64-core` | `localai/localai:{{< version >}}-nvidia-l4t-arm64-core` | | Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-nvidia-l4t-arm64` | `localai/localai:{{< version >}}-nvidia-l4t-arm64` |
{{% /tab %}} {{% /tab %}}


@ -7,7 +7,7 @@ ico = "rocket_launch"
+++ +++
For installing LocalAI in Kubernetes, the deployment file from the `examples` can be used and customized as prefered: For installing LocalAI in Kubernetes, the deployment file from the `examples` can be used and customized as preferred:
``` ```
kubectl apply -f https://raw.githubusercontent.com/mudler/LocalAI-examples/refs/heads/main/kubernetes/deployment.yaml
@ -29,7 +29,7 @@ helm repo update
# Get the values
helm show values go-skynet/local-ai > values.yaml
# Edit the values value if needed # Edit the values if needed
# vim values.yaml ...
# Install the helm chart


@ -14,18 +14,19 @@ icon = "rocket_launch"
If you are exposing LocalAI remotely, make sure you protect the API endpoints adequately, for example with a mechanism that filters incoming traffic, or alternatively run LocalAI with `API_KEY` to gate access with an API key. The API key grants full access to all features (there is no role separation), so it should be treated like an admin credential.
To access the WebUI with an API_KEY, browser extensions such as [Requestly](https://requestly.com/) can be used (see also https://github.com/mudler/LocalAI/issues/2227#issuecomment-2093333752). See also [API flags]({{% relref "docs/advanced/advanced-usage#api-flags" %}}) for the flags / options available when starting LocalAI.
{{% /alert %}} {{% /alert %}}
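
To illustrate the `API_KEY` gating described in the alert above, here is a minimal sketch. It assumes the key is sent as an OpenAI-style `Authorization: Bearer` header; the helper name `auth_header` is hypothetical and not part of LocalAI:

```shell
# Hypothetical helper (not part of LocalAI): formats the request header
# for a given key, assuming OpenAI-style Bearer authentication.
auth_header() {
    printf 'Authorization: Bearer %s' "$1"
}

# Example call against a local instance that was started with API_KEY set:
# curl -H "$(auth_header "$API_KEY")" http://localhost:8080/v1/models
```

Keeping the key in an environment variable rather than typing it inline avoids leaking it into shell history.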
## Quickstart
### Using the Bash Installer
```bash ```bash
# Basic installation
curl https://localai.io/install.sh | sh
``` ```
See [Installer]({{% relref "docs/advanced/installer" %}}) for all the supported options.
### Run with docker:
```bash ```bash
# CPU only image:
@ -100,6 +101,57 @@ The AIO images come pre-configured with the following features:
For instructions on using AIO images, see [Using container images]({{% relref "docs/getting-started/container-images#all-in-one-images" %}}).
## Using LocalAI and the full stack with LocalAGI
LocalAI is part of the Local family stack, along with LocalAGI and LocalRecall.
[LocalAGI](https://github.com/mudler/LocalAGI) is a powerful, self-hostable AI Agent platform designed for maximum privacy and flexibility, which encompasses and uses the whole software stack. It provides a complete drop-in replacement for OpenAI's Responses APIs with advanced agentic capabilities, working entirely locally on consumer-grade hardware (CPU and GPU).
### Quick Start
```bash
# Clone the repository
git clone https://github.com/mudler/LocalAGI
cd LocalAGI
# CPU setup (default)
docker compose up
# NVIDIA GPU setup
docker compose -f docker-compose.nvidia.yaml up
# Intel GPU setup (for Intel Arc and integrated GPUs)
docker compose -f docker-compose.intel.yaml up
# Start with a specific model (see available models in models.localai.io, or localai.io to use any model in huggingface)
MODEL_NAME=gemma-3-12b-it docker compose up
# NVIDIA GPU setup with custom multimodal and image models
MODEL_NAME=gemma-3-12b-it \
MULTIMODAL_MODEL=minicpm-v-2_6 \
IMAGE_MODEL=flux.1-dev-ggml \
docker compose -f docker-compose.nvidia.yaml up
```
### Key Features
- **Privacy-Focused**: All processing happens locally, ensuring your data never leaves your machine
- **Flexible Deployment**: Supports CPU, NVIDIA GPU, and Intel GPU configurations
- **Multiple Model Support**: Compatible with various models from Hugging Face and other sources
- **Web Interface**: User-friendly chat interface for interacting with AI agents
- **Advanced Capabilities**: Supports multimodal models, image generation, and more
- **Docker Integration**: Easy deployment using Docker Compose
### Environment Variables
You can customize your LocalAGI setup using the following environment variables:
- `MODEL_NAME`: Specify the model to use (e.g., `gemma-3-12b-it`)
- `MULTIMODAL_MODEL`: Set a custom multimodal model
- `IMAGE_MODEL`: Configure an image generation model
For more advanced configuration and API documentation, visit the [LocalAGI GitHub repository](https://github.com/mudler/LocalAGI).
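
As a sketch of how the variables listed above could be persisted between runs: Docker Compose reads a `.env` file from the project directory for variable interpolation. This assumes the LocalAGI compose files interpolate `MODEL_NAME`, `MULTIMODAL_MODEL`, and `IMAGE_MODEL`, which the inline `MODEL_NAME=... docker compose up` form in the Quick Start suggests but this section does not confirm:

```shell
# Write the variables to a .env file next to the compose file.
# Assumption: the compose files reference these names via ${VAR} interpolation.
cat > .env <<'EOF'
MODEL_NAME=gemma-3-12b-it
MULTIMODAL_MODEL=minicpm-v-2_6
IMAGE_MODEL=flux.1-dev-ggml
EOF

# Then start the stack as usual, for example:
# docker compose -f docker-compose.nvidia.yaml up
```

Variables set in the shell take precedence over `.env`, so a one-off override such as `MODEL_NAME=... docker compose up` still works.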
## What's Next?
There is much more to explore with LocalAI! You can run any model from Hugging Face, perform video generation, and also voice cloning. For a comprehensive overview, check out the [features]({{% relref "docs/features" %}}) section.


@ -1,3 +1,3 @@
{ {
"version": "v2.27.0" "version": "v2.29.0"
} }

Some files were not shown because too many files have changed in this diff