Mirror of https://github.com/mudler/LocalAI.git (synced 2025-05-20 18:45:00 +00:00)

feat: Return OpenAI errors and update docs (#80)

Signed-off-by: mudler <mudler@mocaccino.org>

parent 045412e8dd
commit 12d83a4184

3 changed files with 74 additions and 5 deletions
README.md (52 lines changed)

@@ -70,6 +70,42 @@ curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d
 }'
 ```
 
+### Example: Use GPT4ALL-J model
+
+<details>
+
+```bash
+# Clone LocalAI
+git clone https://github.com/go-skynet/LocalAI
+
+cd LocalAI
+
+# Download gpt4all-j to models/
+wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j
+
+# Use a template from the examples
+cp -rf prompt-templates/ggml-gpt4all-j.tmpl models/
+
+# (optional) Edit the .env file to set things like context size and threads
+# vim .env
+
+# start with docker-compose
+docker-compose up -d --build
+
+# Now API is accessible at localhost:8080
+curl http://localhost:8080/v1/models
+# {"object":"list","data":[{"id":"ggml-gpt4all-j","object":"model"}]}
+
+curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
+     "model": "ggml-gpt4all-j",
+     "messages": [{"role": "user", "content": "How are you?"}],
+     "temperature": 0.9
+   }'
+
+# {"model":"ggml-gpt4all-j","choices":[{"message":{"role":"assistant","content":"I'm doing well, thanks. How about you?"}}]}
+```
+</details>
+
 ## Prompt templates
 
 The API doesn't inject a default prompt for talking to the model. You have to use a prompt similar to what's described in the standford-alpaca docs: https://github.com/tatsu-lab/stanford_alpaca#data-release.
@@ -127,6 +163,7 @@ The API takes takes the following parameters:
 | threads | THREADS | Number of Physical cores | The number of threads to use for text generation. |
 | address | ADDRESS | :8080 | The address and port to listen on. |
 | context-size | CONTEXT_SIZE | 512 | Default token context size. |
+| debug | DEBUG | false | Enable debug mode. |
 
 Once the server is running, you can start making requests to it using HTTP, using the OpenAI API.
 
@@ -138,8 +175,14 @@ You can check out the [OpenAI API reference](https://platform.openai.com/docs/ap
 
 Following the list of endpoints/parameters supported.
 
+Note:
+
+- You can also specify the model a part of the OpenAI token.
+- If only one model is available, the API will use it for all the requests.
+
 #### Chat completions
 
+<details>
 For example, to generate a chat completion, you can send a POST request to the `/v1/chat/completions` endpoint with the instruction as the request body:
 
 ```
@@ -151,10 +194,12 @@ curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/jso
 ```
 
 Available additional parameters: `top_p`, `top_k`, `max_tokens`
+</details>
 
 #### Completions
 
-For example, to generate a comletion, you can send a POST request to the `/v1/completions` endpoint with the instruction as the request body:
+<details>
+For example, to generate a completion, you can send a POST request to the `/v1/completions` endpoint with the instruction as the request body:
 ```
 curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-koala-7b-model-q4_0-r2.bin",
@@ -165,14 +210,19 @@ curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d
 
 Available additional parameters: `top_p`, `top_k`, `max_tokens`
 
+</details>
+
 #### List models
 
+<details>
 You can list all the models available with:
 
 ```
 curl http://localhost:8080/v1/models
 ```
 
+</details>
+
 ## Using other models
 
 gpt4all (https://github.com/nomic-ai/gpt4all) works as well, however the original model needs to be converted (same applies for old alpaca models, too):
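As a side note (not part of the diff above): the new test further down drives the API with the sashabaranov/go-openai client, so the curl chat example added to the README can also be reproduced from Go. A minimal sketch, assuming a LocalAI instance on localhost:8080 serving the ggml-gpt4all-j model from the README example:

```go
package main

import (
	"context"
	"fmt"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	// Point the stock OpenAI client at the local instance from the README
	// example (address and model name are taken from the docs above).
	cfg := openai.DefaultConfig("ggml-gpt4all-j") // per the README note, the token can also carry the model name
	cfg.BaseURL = "http://localhost:8080/v1"
	client := openai.NewClientWithConfig(cfg)

	resp, err := client.CreateChatCompletion(context.Background(), openai.ChatCompletionRequest{
		Model:       "ggml-gpt4all-j", // set explicitly here, mirroring the curl example
		Messages:    []openai.ChatCompletionMessage{{Role: openai.ChatMessageRoleUser, Content: "How are you?"}},
		Temperature: 0.9,
	})
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(resp.Choices[0].Message.Content)
}
```

Here the model is still set in the request body, as in the curl example; relying on the token instead is the behaviour described by the new README note.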
api/api.go (20 lines changed)

@@ -18,6 +18,18 @@ import (
 	"github.com/rs/zerolog/log"
 )
 
+// APIError provides error information returned by the OpenAI API.
+type APIError struct {
+	Code    any     `json:"code,omitempty"`
+	Message string  `json:"message"`
+	Param   *string `json:"param,omitempty"`
+	Type    string  `json:"type"`
+}
+
+type ErrorResponse struct {
+	Error *APIError `json:"error,omitempty"`
+}
+
 type OpenAIResponse struct {
 	Created int    `json:"created,omitempty"`
 	Object  string `json:"chat.completion,omitempty"`
@@ -395,9 +407,11 @@ func App(loader *model.ModelLoader, threads, ctxSize int, f16 bool, debug, disab
 		}
 
 		// Send custom error page
-		return ctx.Status(code).JSON(struct {
-			Error string `json:"error"`
-		}{Error: err.Error()})
+		return ctx.Status(code).JSON(
+			ErrorResponse{
+				Error: &APIError{Message: err.Error(), Code: code},
+			},
+		)
 	},
 })
 
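For reference (not part of the commit), the wire format the new handler produces can be seen by marshalling the structs from the hunk above; the message string mirrors the one asserted in the test below:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Copies of the structs introduced in api/api.go above, for a standalone sketch.
type APIError struct {
	Code    any     `json:"code,omitempty"`
	Message string  `json:"message"`
	Param   *string `json:"param,omitempty"`
	Type    string  `json:"type"`
}

type ErrorResponse struct {
	Error *APIError `json:"error,omitempty"`
}

func main() {
	// What ctx.Status(code).JSON(ErrorResponse{...}) sends for a missing model.
	resp := ErrorResponse{
		Error: &APIError{Message: "llama: model does not exist", Code: 500},
	}
	out, _ := json.Marshal(resp)
	fmt.Println(string(out))
	// {"error":{"code":500,"message":"llama: model does not exist","type":""}}
}
```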

api/api_test.go

@@ -49,5 +49,10 @@ var _ = Describe("API test", func() {
 			Expect(len(resp.Choices)).To(Equal(1))
 			Expect(resp.Choices[0].Text).ToNot(BeEmpty())
 		})
+		It("returns errors", func() {
+			_, err := client.CreateCompletion(context.TODO(), openai.CompletionRequest{Model: "foomodel", Prompt: "abcdedfghikl"})
+			Expect(err).To(HaveOccurred())
+			Expect(err.Error()).To(ContainSubstring("error, status code: 500, message: llama: model does not exist"))
+		})
 	})
 })