🔥 add LLaVA support and GPT vision API, Multiple requests for llama.cpp, return JSON types (#1254)

* wip

* wip

* Make it functional

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* wip

* Small fixups

* do not inject space on role encoding, encode img at beginning of messages

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add examples/config defaults

* Add include dir of current source dir

* cleanup

* fixes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

* Revert "fixups"

This reverts commit f1a4731cca.

* fixes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Ettore Di Giacinto 2023-11-11 13:14:59 +01:00 committed by GitHub
parent 3b4c5d54d8
commit 0eae727366
GPG key ID: 4AEE18F83AFDEB23
28 changed files with 26908 additions and 864 deletions

@@ -0,0 +1,18 @@
![llava](https://github.com/mudler/LocalAI/assets/2420543/cb0a0897-3b58-4350-af66-e6f4387b58d3)
## Setup
```
mkdir models
wget https://huggingface.co/mys/ggml_bakllava-1/resolve/main/ggml-model-q4_k.gguf -O models/ggml-model-q4_k.gguf
wget https://huggingface.co/mys/ggml_bakllava-1/resolve/main/mmproj-model-f16.gguf -O models/mmproj-model-f16.gguf
docker run -p 8080:8080 -v $PWD/models:/models -ti --rm quay.io/go-skynet/local-ai:master --models-path /models --threads 4
```
## Try it out
```
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llava",
"messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}]}], "temperature": 0.9}'
```
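
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server from Setup is listening on `localhost:8080` and returns an OpenAI-compatible response. The helper names `build_vision_request` and `ask` are illustrative and not part of LocalAI. `temperature` is placed at the top level of the request body, where the OpenAI chat completions schema expects it.

```python
import json
import urllib.request

# Assumption: LocalAI is running locally, as started in the Setup section.
ENDPOINT = "http://localhost:8080/v1/chat/completions"


def build_vision_request(question, image_url, model="llava", temperature=0.9):
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "temperature": temperature,
    }


def ask(question, image_url, endpoint=ENDPOINT):
    """POST the request and return the assistant's reply text."""
    body = json.dumps(build_vision_request(question, image_url)).encode("utf-8")
    req = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # Assumes the OpenAI-compatible response shape: choices[0].message.content
    return reply["choices"][0]["message"]["content"]
```

With the server running, `ask("What is in the image?", url)` should return the model's description of the image at `url`.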

@@ -0,0 +1,3 @@
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
{{.Input}}
ASSISTANT:
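
To illustrate how a template like the one above turns chat messages into a prompt, here is a rough Python sketch. It is not LocalAI's implementation (which uses Go templates); `render_prompt` is a hypothetical helper. It assumes each message is prefixed with its role string without an extra space (in line with "do not inject space on role encoding" in the commit message) and that messages are joined with newlines before replacing `{{.Input}}`.

```python
# Template text as in the file above; {{.Input}} is a Go text/template placeholder.
TEMPLATE = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions.\n"
    "{{.Input}}\n"
    "ASSISTANT:"
)

# Role prefixes, mirroring a `roles:` mapping in a model config.
ROLES = {"user": "USER:", "assistant": "ASSISTANT:", "system": "SYSTEM:"}


def render_prompt(messages, template=TEMPLATE, roles=ROLES):
    """Prefix each message with its role string and substitute into the template.

    Assumptions: role and content are concatenated without a space,
    and messages are joined with newlines.
    """
    lines = [roles.get(m["role"], "") + m["content"] for m in messages]
    return template.replace("{{.Input}}", "\n".join(lines))
```

For a single user message, the rendered prompt ends with the user line followed by a bare `ASSISTANT:` line, which cues the model to produce the reply.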

@@ -0,0 +1,20 @@
context_size: 4096
f16: true
threads: 11
gpu_layers: 90
name: llava
mmap: true
backend: llama-cpp
roles:
  user: "USER:"
  assistant: "ASSISTANT:"
  system: "SYSTEM:"
parameters:
  model: ggml-model-q4_k.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.95
template:
  chat: chat-simple
mmproj: mmproj-model-f16.gguf
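
The config refers to its model, projector, and template by name, so those files presumably need to sit in the models directory passed via `--models-path`. Below is a small Python sketch for checking that before starting the server. `missing_model_files` and `LLAVA_CONFIG` are hypothetical names, and the `.tmpl` suffix for the chat template is an assumption about how the template file is named on disk.

```python
from pathlib import Path


def missing_model_files(models_dir, config):
    """Return the files referenced by a model config that are absent from models_dir."""
    wanted = [config["parameters"]["model"], config["mmproj"]]
    # Assumption: the chat template is stored as "<name>.tmpl" in the models directory.
    wanted.append(config["template"]["chat"] + ".tmpl")
    root = Path(models_dir)
    return [name for name in wanted if not (root / name).is_file()]


# The relevant fields of the config above, written as a plain dict
# to avoid depending on a YAML parser.
LLAVA_CONFIG = {
    "parameters": {"model": "ggml-model-q4_k.gguf"},
    "template": {"chat": "chat-simple"},
    "mmproj": "mmproj-model-f16.gguf",
}
```

Calling `missing_model_files("models", LLAVA_CONFIG)` returns an empty list once both `.gguf` files and the template are in place.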