LocalAI/backend
Latest commit 3882130911 by fakezeta
feat: Add Bitsandbytes quantization for transformer backend enhancement #1775 and fix: Transformer backend error on CUDA #1774 (#1823)
* fixes #1775 and #1774

  Adds BitsAndBytes quantization and fixes embeddings on CUDA devices

* Manage 4-bit and 8-bit quantization

  The different BitsAndBytes options are selected with the quantization: parameter in the model YAML

* fix compilation errors on non-CUDA environments

2024-03-14 23:06:30 +01:00
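As a sketch of the mechanism this commit describes, a model config might select a BitsAndBytes mode through the quantization: parameter. Only the field name comes from the commit message; the model name, the other fields, and the exact accepted values shown below are assumptions for illustration.

```yaml
# Hypothetical LocalAI model config. The `quantization:` field is the one
# the commit adds for the transformers backend; the values shown
# (bnb_4bit / bnb_8bit) are assumed, not confirmed by this listing.
name: my-llm                  # assumed model name
backend: transformers
parameters:
  model: facebook/opt-6.7b    # assumed Hugging Face model id
quantization: bnb_4bit        # assumed 4-bit mode; 8-bit might be bnb_8bit
```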
cpp fix: the correct BUILD_TYPE for OpenCL is clblas (with no t) (#1828) 2024-03-14 08:39:21 +01:00
go Fix Command Injection Vulnerability (#1778) 2024-02-29 18:32:29 +00:00
python feat: Add Bitsandbytes quantization for transformer backend enhancement #1775 and fix: Transformer backend error on CUDA #1774 (#1823) 2024-03-14 23:06:30 +01:00
backend.proto Bump vLLM version + more options when loading models in vLLM (#1782) 2024-03-01 22:48:53 +01:00
backend_grpc.pb.go transformers: correctly load automodels (#1643) 2024-01-26 00:13:21 +01:00