docs: Initial import from localai-website (#1312)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

+++
disableToc = false
title = "🦙 Exllama"
weight = 2
+++
[Exllama](https://github.com/turboderp/exllama) is "a more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights".
## Prerequisites
This is an extra backend: it is already available in the container images, so no additional setup is needed there.
If you are building LocalAI locally, you need to install [exllama manually](https://github.com/jllllll/exllama#this-is-a-python-module-version-of-exllama) first.
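For a rough idea of what the manual install looks like, here is a minimal sketch; it assumes the module builds with `pip` from the linked repository, so check that repository's README for the exact wheel or build steps matching your CUDA and Python versions:
```bash
# Sketch only: install the Python module version of exllama before
# building LocalAI from source; the exact steps depend on your setup.
git clone https://github.com/jllllll/exllama
cd exllama
pip install .
```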
## Model setup
Download the model as a folder inside the `models` directory and create a YAML file specifying the `exllama` backend. For instance, with the `TheBloke/WizardLM-7B-uncensored-GPTQ` model:
```bash
$ git lfs install
$ git clone https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GPTQ models/WizardLM-7B-uncensored-GPTQ
$ ls models/
.keep  WizardLM-7B-uncensored-GPTQ/  exllama.yaml
$ cat models/exllama.yaml
name: exllama
parameters:
  model: WizardLM-7B-uncensored-GPTQ
backend: exllama
# ...
```
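You can also create the config file directly from the shell. In this sketch, `context_size` is an optional LocalAI setting added for illustration; 2048 matches the context window of the base Llama model used here:
```bash
# Sketch: write the exllama config in one step. context_size is optional;
# 2048 matches the original Llama context window for this model.
cat > models/exllama.yaml <<'EOF'
name: exllama
backend: exllama
context_size: 2048
parameters:
  model: WizardLM-7B-uncensored-GPTQ
EOF
```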
Test with:
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "exllama",
"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.1
}'
```
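The endpoint returns a standard OpenAI-compatible chat completion object; assuming `jq` is installed, you can extract just the reply text:
```bash
# Same request, keeping only the assistant's reply from the
# OpenAI-compatible response (requires jq).
curl -s http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "exllama",
   "messages": [{"role": "user", "content": "How are you?"}],
   "temperature": 0.1
 }' | jq -r '.choices[0].message.content'
```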