chore(docs): extra-Usage and Machine-Tag docs (#4627)
Rename LocalAI-Extra-Usage -> Extra-Usage, add MACHINE_TAG as cli flag option, add docs about extra-usage and machine-tag

Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
parent 895cd7c76a
commit 96306a39a0

5 changed files with 34 additions and 5 deletions
@@ -520,6 +520,7 @@ In the help text below, BASEPATH is the location that local-ai is being executed
| --upload-limit | 15 | Default upload-limit in MB | $LOCALAI_UPLOAD_LIMIT |
| --api-keys | API-KEYS,... | List of API Keys to enable API authentication. When this is set, all the requests must be authenticated with one of these API keys | $LOCALAI_API_KEY |
| --disable-welcome | | Disable welcome pages | $LOCALAI_DISABLE_WELCOME |
| --machine-tag | | If not empty, adds the value as a Machine-Tag header to each response. Useful to track responses from different machines when using multiple P2P federated nodes | $LOCALAI_MACHINE_TAG |
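A minimal sketch of how the tag surfaces in practice, assuming the `run` subcommand, the OpenAI-compatible chat endpoint on the default port, and placeholder tag and model names:

```bash
# tag every response from this node (the flag form is --machine-tag="worker-eu-1")
LOCALAI_MACHINE_TAG="worker-eu-1" local-ai run

# -i prints the response headers; the reply should include "Machine-Tag: worker-eu-1"
curl -i http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model-name", "messages": [{"role": "user", "content": "ping"}]}'
```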
#### Backend Flags
| Parameter | Default | Description | Environment Variable |
@@ -553,6 +554,34 @@ LOCALAI_MODELS_PATH=/mnt/storage/localai/models
LOCALAI_F16=true
```
### Request headers
The presence of the `Extra-Usage` request header (e.g. `Extra-Usage: true`) enables inference timings, in milliseconds, extending the default OpenAI response model in the `usage` field:
```
...
{
  "id": "...",
  "created": ...,
  "model": "...",
  "choices": [
    {
      ...
    },
    ...
  ],
  "object": "...",
  "usage": {
    "prompt_tokens": ...,
    "completion_tokens": ...,
    "total_tokens": ...,
    // with the Extra-Usage header set, these two float fields are also included:
    "timing_prompt_processing": ...,
    "timing_token_generation": ...
  }
}
...
```
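A minimal sketch of a request that opts into these timings, assuming the OpenAI-compatible chat completions endpoint on the default port; the model name is a placeholder:

```bash
# the presence of the Extra-Usage header enables the extra timing fields in "usage"
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Extra-Usage: true" \
  -d '{"model": "your-model-name", "messages": [{"role": "user", "content": "Hello"}]}'
```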
### Extra backends
LocalAI can be extended with extra backends. The backends are implemented as `gRPC` services and can be written in any language. The container images built and published on [quay.io](https://quay.io/repository/go-skynet/local-ai?tab=tags) are split into core and extra variants. By default, images bring all the dependencies and backends supported by LocalAI (we call those `extra` images). The `-core` images instead bring only the strictly necessary dependencies to run LocalAI, with only a core set of backends.
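As a sketch, pulling the full image versus a slimmer `-core` variant could look like the following; the exact tags here are assumptions, so check the tag list on quay.io for the ones matching your release:

```bash
# full image with all extra backends and dependencies (tag is an assumption, verify on quay.io)
docker pull quay.io/go-skynet/local-ai:latest

# core image with only the strictly necessary dependencies (tag is an assumption)
docker pull quay.io/go-skynet/local-ai:latest-core
```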
@@ -616,4 +645,4 @@ Note that, for llama.cpp you need to set accordingly `LLAMACPP_PARALLEL` to the
LocalAI will automatically discover the CPU flagset available on your host and will use the most optimized version of the backends.
If you want to disable this behavior, you can set `DISABLE_AUTODETECT` to `true` in the environment variables.
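For example, a short sketch of starting LocalAI with autodetection turned off (invocation details may differ in your setup):

```bash
# skip CPU flagset autodetection and use the default backend build
DISABLE_AUTODETECT=true local-ai run
```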