feat: Add Bitsandbytes quantization for transformer backend enhancement #1775 and fix: Transformer backend error on CUDA #1774 (#1823)

* fixes #1775 and #1774

Add BitsAndBytes quantization and fix embedding on CUDA devices

* Manage 4-bit and 8-bit quantization

Select the BitsAndBytes option with the quantization: parameter in the model YAML

* fix compilation errors in non-CUDA environments
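
The option handling described above could look roughly like the sketch below. It is not the code from this commit: the option names `bnb_4bit` and `bnb_8bit` and the helper `bnb_kwargs` are hypothetical, and only the `load_in_4bit` / `load_in_8bit` keywords are taken from `transformers.BitsAndBytesConfig`. The helper returns plain keyword arguments so that nothing CUDA-specific is imported on machines without a GPU.

```python
from typing import Any, Dict

# Hypothetical option names; the values actually accepted by the
# `quantization:` parameter may differ.
_BNB_OPTIONS: Dict[str, Dict[str, Any]] = {
    "bnb_4bit": {"load_in_4bit": True},
    "bnb_8bit": {"load_in_8bit": True},
}


def bnb_kwargs(quantization: str, cuda_available: bool) -> Dict[str, Any]:
    """Return keyword arguments for BitsAndBytesConfig, or {} for no quantization."""
    if not cuda_available:
        # Assumption: BitsAndBytes quantization is skipped when no CUDA device is present.
        return {}
    # Unknown or empty options fall back to no quantization.
    return dict(_BNB_OPTIONS.get(quantization, {}))
```

On a CUDA machine the result would typically be passed on as `BitsAndBytesConfig(**bnb_kwargs("bnb_4bit", True))` and handed to `from_pretrained` through its `quantization_config` argument; an empty dict means the model is loaded without quantization.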
fakezeta 2024-03-14 23:06:30 +01:00 committed by GitHub
parent a6b540737f
commit 3882130911
GPG key ID: B5690EEEBB952194
2 changed files with 49 additions and 23 deletions

@@ -30,6 +30,7 @@ dependencies:
 - async-timeout==4.0.3
 - attrs==23.1.0
 - bark==0.1.5
+- bitsandbytes==0.43.0
 - boto3==1.28.61
 - botocore==1.31.61
 - certifi==2023.7.22