* feat: Add backend gallery
This PR add support to manage backends as similar to models. There is
now available a backend gallery which can be used to install and remove
extra backends.
The backend gallery can be configured similarly as a model gallery, and
API calls allows to install and remove new backends in runtime, and as
well during the startup phase of LocalAI.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add backends docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* wip: Backend Dockerfile for python backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: drop extras images, build python backends separately
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixup on all backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Tweaks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop old backends leftovers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixup CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Move dockerfile upper
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix proto
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Feature dropped for consistency - we prefer model galleries
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add missing packages in the build image
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* exllama is ponly available on cublas
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* pin torch on chatterbox
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups to index
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Debug CI
* Install accellerators deps
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add target arch
* Add cuda minor version
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use self-hosted runners
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: use quay for test images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups for vllm and chatterbox
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups on CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chatterbox is only available for nvidia
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Simplify CI builds
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Adapt test, use qwen3
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(model gallery): add jina-reranker-v1-tiny-en-gguf
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(gguf-parser): recover from potential panics that can happen while reading ggufs with gguf-parser
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use reranker from llama.cpp in AIO images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Limit concurrent jobs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
generate and publish intel docker caches / generate_caches (intel/oneapi-basekit:2025.1.0-0-devel-ubuntu22.04, linux/amd64, ubuntu-latest) (push) Waiting to run
* feat(realtime): Initial Realtime API implementation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: go mod tidy
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat: Implement transcription only mode for realtime API
Reduce the scope of the real time API for the initial realease and make
transcription only mode functional.
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore(build): Build backends on a separate layer to speed up core only changes
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
This PR changes entirely the UI look and feeling. It updates all
sections and makes it also mobile-ready.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* squash past, centralize request middleware PR
Signed-off-by: Dave Lee <dave@gray101.com>
* migrate bruno request files to examples repo
Signed-off-by: Dave Lee <dave@gray101.com>
* fix
Signed-off-by: Dave Lee <dave@gray101.com>
* Update tests/e2e-aio/e2e_test.go
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
---------
Signed-off-by: Dave Lee <dave@gray101.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* Add machine tag option, add extraUsage option, grpc-server -> proto -> endpoint extraUsage data is broken for now
Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
* remove redurant timing fields, fix not working timings output
Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
* use middleware for Machine-Tag only if tag is specified
Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
---------
Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
Makes the web app honour the `X-Forwarded-Prefix` HTTP request header that may be sent by a reverse-proxy in order to inform the app that its public routes contain a path prefix.
For instance this allows to serve the webapp via a reverse-proxy/ingress controller under a path prefix/sub path such as e.g. `/localai/` while still being able to use the regular LocalAI routes/paths without prefix when directly connecting to the LocalAI server.
Changes:
* Add new `StripPathPrefix` middleware to strip the path prefix (provided with the `X-Forwarded-Prefix` HTTP request header) from the request path prior to matching the HTTP route.
* Add a `BaseURL` utility function to build the base URL, honouring the `X-Forwarded-Prefix` HTTP request header.
* Generate the derived base URL into the HTML (`head.html` template) as `<base/>` tag.
* Make all webapp-internal URLs (within HTML+JS) relative in order to make the browser resolve them against the `<base/>` URL specified within each HTML page's header.
* Make font URLs within the CSS files relative to the CSS file.
* Generate redirect location URLs using the new `BaseURL` function.
* Use the new `BaseURL` function to generate absolute URLs within gallery JSON responses.
Closes#3095
TL;DR:
The header-based approach allows to move the path prefix configuration concern completely to the reverse-proxy/ingress as opposed to having to align the path prefix configuration between LocalAI, the reverse-proxy and potentially other internal LocalAI clients.
The gofiber swagger handler already supports path prefixes this way, see e2d9e9916d/swagger.go (L79)
Signed-off-by: Max Goltzsche <max.goltzsche@gmail.com>
* Read jinja templates as fallback
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Move templating out of model loader
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Test TemplateMessages
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Set role and content from transformers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Tests: be more flexible
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* More jinja
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small refactoring and adaptations
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Seem the "/metrics" endpoint that is source of confusion as people tends
to believe we collect telemetry data just because we import
"opentelemetry", however it is still a good idea to allow to disable
even local metrics if not really required.
See also: https://github.com/mudler/LocalAI/issues/3942
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(health): do not require auth for /healthz and /readyz
Fixes: #3655
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Comment so I don’t forget
Adding a reminder here...
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Dave <dave@gray101.com>
* add api key to existing app tests, add preliminary auth test
Signed-off-by: Dave Lee <dave@gray101.com>
* small fix, run test
Signed-off-by: Dave Lee <dave@gray101.com>
* status on non-opaque
Signed-off-by: Dave Lee <dave@gray101.com>
* tweak auth error
Signed-off-by: Dave Lee <dave@gray101.com>
* exp
Signed-off-by: Dave Lee <dave@gray101.com>
* quick fix on real laptop
Signed-off-by: Dave Lee <dave@gray101.com>
* add downloader version that allows providing an auth header
Signed-off-by: Dave Lee <dave@gray101.com>
* stash some devcontainer fixes during testing
Signed-off-by: Dave Lee <dave@gray101.com>
* s2
Signed-off-by: Dave Lee <dave@gray101.com>
* s
Signed-off-by: Dave Lee <dave@gray101.com>
* done with experiment
Signed-off-by: Dave Lee <dave@gray101.com>
* done with experiment
Signed-off-by: Dave Lee <dave@gray101.com>
* after merge fix
Signed-off-by: Dave Lee <dave@gray101.com>
* rename and fix
Signed-off-by: Dave Lee <dave@gray101.com>
---------
Signed-off-by: Dave Lee <dave@gray101.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* start breaking up the giant channel refactor now that it's better understood - easier to merge bites
Signed-off-by: Dave Lee <dave@gray101.com>
* add concurrency and base64 back in, along with new base64 tests.
Signed-off-by: Dave Lee <dave@gray101.com>
* Automatic rename of whisper.go's Result to TranscriptResult
Signed-off-by: Dave Lee <dave@gray101.com>
* remove pkg/concurrency - significant changes coming in split 2
Signed-off-by: Dave Lee <dave@gray101.com>
* fix comments
Signed-off-by: Dave Lee <dave@gray101.com>
* add list_model service as another low-risk service to get it out of the way
Signed-off-by: Dave Lee <dave@gray101.com>
* split backend config loader into seperate file from the actual config struct. No changes yet, just reduce cognative load with smaller files of logical blocks
Signed-off-by: Dave Lee <dave@gray101.com>
* rename state.go ==> application.go
Signed-off-by: Dave Lee <dave@gray101.com>
* fix lost import?
Signed-off-by: Dave Lee <dave@gray101.com>
---------
Signed-off-by: Dave Lee <dave@gray101.com>