From 844c0c422d4cbe2dd7b3f9b4667e6c239c9e33f6 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Sat, 11 Jan 2025 22:10:45 +0100
Subject: [PATCH 001/679] docs: :arrow_up: update docs version mudler/LocalAI
(#4578)
:arrow_up: Update docs version mudler/LocalAI
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
docs/data/version.json | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/data/version.json b/docs/data/version.json
index bf065426..0044f3a2 100644
--- a/docs/data/version.json
+++ b/docs/data/version.json
@@ -1,3 +1,3 @@
{
- "version": "v2.24.2"
+ "version": "v2.25.0"
}
From 80dc23fab9073e4f2446b1ef9023536ef7413b2f Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Sat, 11 Jan 2025 22:23:10 +0100
Subject: [PATCH 002/679] chore(model-gallery): :arrow_up: update checksum
(#4580)
:arrow_up: Checksum updates in gallery/index.yaml
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
gallery/index.yaml | 17 +++++------------
1 file changed, 5 insertions(+), 12 deletions(-)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index f20be17e..4cb6ccbd 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -14,15 +14,15 @@
- https://huggingface.co/microsoft/phi-4
- https://huggingface.co/bartowski/phi-4-GGUF
description: |
- phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.
- phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. Phi-4 is a 14B parameters, dense decoder-only Transformer model.
+ phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.
+ phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. Phi-4 is a 14B parameters, dense decoder-only Transformer model.
overrides:
parameters:
model: phi-4-Q4_K_M.gguf
files:
- filename: phi-4-Q4_K_M.gguf
- sha256: e38bd5fa5f1c03d51ebc34a8d7b284e0da089c8af05e7f409a0079a9c831a10b
uri: huggingface://bartowski/phi-4-GGUF/phi-4-Q4_K_M.gguf
+ sha256: 009aba717c09d4a35890c7d35eb59d54e1dba884c7c526e7197d9c13ab5911d9
- &falcon3
name: "falcon3-1b-instruct"
url: "github:mudler/LocalAI/gallery/falcon3.yaml@master"
@@ -2726,14 +2726,7 @@
urls:
- https://huggingface.co/Krystalan/DRT-o1-7B
- https://huggingface.co/QuantFactory/DRT-o1-7B-GGUF
- description: |
- In this work, we introduce DRT-o1, an attempt to bring the success of long thought reasoning to neural machine translation (MT). To this end,
-
- 🌟 We mine English sentences with similes or metaphors from existing literature books, which are suitable for translation via long thought.
- 🌟 We propose a designed multi-agent framework with three agents (i.e., a translator, an advisor and an evaluator) to synthesize the MT samples with long thought. There are 22,264 synthesized samples in total.
- 🌟 We train DRT-o1-8B, DRT-o1-7B and DRT-o1-14B using Llama-3.1-8B-Instruct, Qwen2.5-7B-Instruct and Qwen2.5-14B-Instruct as backbones.
-
- Our goal is not to achieve competitive performance with OpenAI’s O1 in neural machine translation (MT). Instead, we explore technical routes to bring the success of long thought to MT. To this end, we introduce DRT-o1, a byproduct of our exploration, and we hope it could facilitate the corresponding research in this direction.
+ description: "In this work, we introduce DRT-o1, an attempt to bring the success of long thought reasoning to neural machine translation (MT). To this end,\n\n\U0001F31F We mine English sentences with similes or metaphors from existing literature books, which are suitable for translation via long thought.\n\U0001F31F We propose a designed multi-agent framework with three agents (i.e., a translator, an advisor and an evaluator) to synthesize the MT samples with long thought. There are 22,264 synthesized samples in total.\n\U0001F31F We train DRT-o1-8B, DRT-o1-7B and DRT-o1-14B using Llama-3.1-8B-Instruct, Qwen2.5-7B-Instruct and Qwen2.5-14B-Instruct as backbones.\n\nOur goal is not to achieve competitive performance with OpenAI’s O1 in neural machine translation (MT). Instead, we explore technical routes to bring the success of long thought to MT. To this end, we introduce DRT-o1, a byproduct of our exploration, and we hope it could facilitate the corresponding research in this direction.\n"
overrides:
parameters:
model: DRT-o1-7B.Q4_K_M.gguf
@@ -5874,7 +5867,7 @@
- https://huggingface.co/Nitral-AI/Nera_Noctis-12B
- https://huggingface.co/bartowski/Nera_Noctis-12B-GGUF
description: |
- Sometimes, the brightest gems are found in the darkest places. For it is in the shadows where we learn to really see the light.
+ Sometimes, the brightest gems are found in the darkest places. For it is in the shadows where we learn to really see the light.
overrides:
parameters:
model: Nera_Noctis-12B-Q4_K_M.gguf
From b206eab80f6bad968ec307cef18a1d5b39982be9 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sat, 11 Jan 2025 22:41:30 +0100
Subject: [PATCH 003/679] chore(model gallery): add nightwing3-10b-v0.1 (#4582)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 4cb6ccbd..82cd1dc5 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -132,6 +132,22 @@
- filename: Falcon3-7B-Instruct-abliterated-Q4_K_M.gguf
sha256: 68e10e638668acaa49fb7919224c7d8bcf1798126c7a499c4d9ec3b81313f8c8
uri: huggingface://bartowski/Falcon3-7B-Instruct-abliterated-GGUF/Falcon3-7B-Instruct-abliterated-Q4_K_M.gguf
+- !!merge <<: *falcon3
+ name: "nightwing3-10b-v0.1"
+ url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
+ icon: https://cdn-uploads.huggingface.co/production/uploads/642265bc01c62c1e4102dc36/C6gY9vxCl3_SFzQLpLG0S.png
+ urls:
+ - https://huggingface.co/Nitral-AI/NightWing3-10B-v0.1
+ - https://huggingface.co/bartowski/NightWing3-10B-v0.1-GGUF
+ description: |
+ Base model: (Falcon3-10B)
+ overrides:
+ parameters:
+ model: NightWing3-10B-v0.1-Q4_K_M.gguf
+ files:
+ - filename: NightWing3-10B-v0.1-Q4_K_M.gguf
+ sha256: 2e87671542d22fe1ef9a68e43f2fdab7c2759479ad531946d9f0bdeffa6f5747
+ uri: huggingface://bartowski/NightWing3-10B-v0.1-GGUF/NightWing3-10B-v0.1-Q4_K_M.gguf
- &intellect1
name: "intellect-1-instruct"
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
From cb8bf79adab6cc658b547c79e29ecc3a221beba9 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sat, 11 Jan 2025 22:45:37 +0100
Subject: [PATCH 004/679] chore(model gallery): add qwq-32b-preview-ideawhiz-v1
(#4583)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 82cd1dc5..c6a9b624 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -2952,6 +2952,21 @@
- filename: Chuluun-Qwen2.5-72B-v0.01-Q4_K_M.gguf
sha256: 901d9d10aad42de3188e721accdc4eb0efec96cbca48563f802793dceaf551f5
uri: huggingface://bartowski/Chuluun-Qwen2.5-72B-v0.01-GGUF/Chuluun-Qwen2.5-72B-v0.01-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "qwq-32b-preview-ideawhiz-v1"
+ icon: https://cdn-uploads.huggingface.co/production/uploads/6205fefd3f1dc8a642d70b10/JEZgA_xV6oF8AIsya9dop.jpeg
+ urls:
+ - https://huggingface.co/6cf/QwQ-32B-Preview-IdeaWhiz-v1
+ - https://huggingface.co/bartowski/QwQ-32B-Preview-IdeaWhiz-v1-GGUF
+ description: |
+ IdeaWhiz is a fine-tuned version of QwQ-32B-Preview, specifically optimized for scientific creativity and step-by-step reasoning. The model leverages the LiveIdeaBench dataset to enhance its capabilities in generating novel scientific ideas and hypotheses.
+ overrides:
+ parameters:
+ model: QwQ-32B-Preview-IdeaWhiz-v1-Q4_K_M.gguf
+ files:
+ - filename: QwQ-32B-Preview-IdeaWhiz-v1-Q4_K_M.gguf
+ sha256: 1648e13d9974b10d08ee45f48fd3ebd15cf67745fe20d602f9306fe0253b6a96
+ uri: huggingface://bartowski/QwQ-32B-Preview-IdeaWhiz-v1-GGUF/QwQ-32B-Preview-IdeaWhiz-v1-Q4_K_M.gguf
- &smollm
## SmolLM
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
From cd480dbe5c04bb8e82da2f71586937916eb7a11f Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sat, 11 Jan 2025 23:24:55 +0100
Subject: [PATCH 005/679] chore(model gallery): add rombos-qwen2.5-writer-32b
(#4584)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index c6a9b624..fb4de112 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -2967,6 +2967,21 @@
- filename: QwQ-32B-Preview-IdeaWhiz-v1-Q4_K_M.gguf
sha256: 1648e13d9974b10d08ee45f48fd3ebd15cf67745fe20d602f9306fe0253b6a96
uri: huggingface://bartowski/QwQ-32B-Preview-IdeaWhiz-v1-GGUF/QwQ-32B-Preview-IdeaWhiz-v1-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "rombos-qwen2.5-writer-32b"
+ icon: https://huggingface.co/SubtleOne/Rombos-Qwen2.5-Writer-32b/blob/main/robot-creating-fantasy.jpg
+ urls:
+ - https://huggingface.co/SubtleOne/Rombos-Qwen2.5-Writer-32b
+ - https://huggingface.co/bartowski/Rombos-Qwen2.5-Writer-32b-GGUF
+ description: |
+ This model is a merge using Rombos's top-ranked 32b model, based on Qwen 2.5, and merging three creative writing finetunes. The creative content is a serious upgrade over the base it started with, and I enjoyed it in my DnD RPG campaign.
+ overrides:
+ parameters:
+ model: Rombos-Qwen2.5-Writer-32b-Q4_K_M.gguf
+ files:
+ - filename: Rombos-Qwen2.5-Writer-32b-Q4_K_M.gguf
+ sha256: cf0e48c6cb8b6f41834603900642b5395105980297709c85c4216bd44fac956a
+ uri: huggingface://bartowski/Rombos-Qwen2.5-Writer-32b-GGUF/Rombos-Qwen2.5-Writer-32b-Q4_K_M.gguf
- &smollm
## SmolLM
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
From 7cd33d10c93485e2a01efc298d111c18b4d9fd8e Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Sat, 11 Jan 2025 23:25:09 +0100
Subject: [PATCH 006/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`c05e8c9934f94fde49bc1bc9dc51eed282605150` (#4579)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index e81ec442..261f2833 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=ba8a1f9c5b675459c55a83e3f97f10df3a66c788
+CPPLLAMA_VERSION?=c05e8c9934f94fde49bc1bc9dc51eed282605150
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From b898cd49b58b2f930814fd4703065b9f92f4e3c1 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sun, 12 Jan 2025 10:33:29 +0100
Subject: [PATCH 007/679] chore(model gallery): add sky-t1-32b-preview (#4585)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index fb4de112..15706d4c 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -2982,6 +2982,22 @@
- filename: Rombos-Qwen2.5-Writer-32b-Q4_K_M.gguf
sha256: cf0e48c6cb8b6f41834603900642b5395105980297709c85c4216bd44fac956a
uri: huggingface://bartowski/Rombos-Qwen2.5-Writer-32b-GGUF/Rombos-Qwen2.5-Writer-32b-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "sky-t1-32b-preview"
+ icon: https://raw.githubusercontent.com/NovaSky-AI/novasky-ai.github.io/main/assets/images/blue-bird-wider.jpeg
+ urls:
+ - https://huggingface.co/NovaSky-AI/Sky-T1-32B-Preview
+ - https://huggingface.co/bartowski/Sky-T1-32B-Preview-GGUF
+ - https://novasky-ai.github.io/posts/sky-t1/
+ description: |
+ This is a 32B reasoning model trained from Qwen2.5-32B-Instruct with 17K data. The performance is on par with o1-preview model on both math and coding. Please see our blog post for more details.
+ overrides:
+ parameters:
+ model: Sky-T1-32B-Preview-Q4_K_M.gguf
+ files:
+ - filename: Sky-T1-32B-Preview-Q4_K_M.gguf
+ sha256: c735912a582f10e4769461586a02e5b98ef43c2895ec11923b8c4f157e7909e5
+ uri: huggingface://bartowski/Sky-T1-32B-Preview-GGUF/Sky-T1-32B-Preview-Q4_K_M.gguf
- &smollm
## SmolLM
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
From f8cffd05e5902a8452989e4ba66b4805a329b0ea Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sun, 12 Jan 2025 10:36:01 +0100
Subject: [PATCH 008/679] chore(model gallery): add negative_llama_70b (#4586)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 15706d4c..35217913 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -311,6 +311,26 @@
- filename: 70B-L3.3-Cirrus-x1-Q4_K_M.gguf
sha256: 07dd464dddba959df8eb2f937787c2210b4c51c2375bd7c7ab2abbe198142a19
uri: huggingface://bartowski/70B-L3.3-Cirrus-x1-GGUF/70B-L3.3-Cirrus-x1-Q4_K_M.gguf
+- !!merge <<: *llama33
+ name: "negative_llama_70b"
+ icon: https://huggingface.co/SicariusSicariiStuff/Negative_LLAMA_70B/resolve/main/Images/Negative_LLAMA_70B.png
+ urls:
+ - https://huggingface.co/SicariusSicariiStuff/Negative_LLAMA_70B
+ - https://huggingface.co/bartowski/Negative_LLAMA_70B-GGUF
+ description: |
+ - Strong Roleplay & Creative writing abilities.
+ - Less positivity bias.
+ - Very smart assistant with low refusals.
+ - Exceptionally good at following the character card.
+ - Characters feel more 'alive', and will occasionally initiate stuff on their own (without being prompted to, but fitting to their character).
+ - Strong ability to comprehend and roleplay uncommon physical and mental characteristics.
+ overrides:
+ parameters:
+ model: Negative_LLAMA_70B-Q4_K_M.gguf
+ files:
+ - filename: Negative_LLAMA_70B-Q4_K_M.gguf
+ sha256: 023c6bd38f6a66178529e6bb77b6e76379ae3ee031adc6885531986aa12750d9
+ uri: huggingface://bartowski/Negative_LLAMA_70B-GGUF/Negative_LLAMA_70B-Q4_K_M.gguf
- &rwkv
url: "github:mudler/LocalAI/gallery/rwkv.yaml@master"
name: "rwkv-6-world-7b"
From 1780ccadbccccb79de1c88bd734e3ed38f8fefa6 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sun, 12 Jan 2025 10:40:26 +0100
Subject: [PATCH 009/679] chore(model gallery): add finemath-llama-3b (#4587)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 35217913..7d60167b 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -1211,6 +1211,24 @@
- filename: MiniThinky-v2-1B-Llama-3.2-Q4_K_M.gguf
sha256: 086857b6364afd757a123eea0474bede09f25608783e7a6fcf2f88d8cb322ca1
uri: huggingface://bartowski/MiniThinky-v2-1B-Llama-3.2-GGUF/MiniThinky-v2-1B-Llama-3.2-Q4_K_M.gguf
+- !!merge <<: *llama32
+ icon: https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/HZ6KOc8IVXXOABrdv0dyK.png
+ name: "finemath-llama-3b"
+ urls:
+ - https://huggingface.co/HuggingFaceTB/FineMath-Llama-3B
+ - https://huggingface.co/bartowski/FineMath-Llama-3B-GGUF
+ description: |
+ This is a continual-pre-training of Llama-3.2-3B on a mix of FineMath (our new high quality math dataset) and FineWeb-Edu.
+
+ The model demonstrates superior math performance compared to Llama 3.2 3B, while maintaining similar performance on knowledge, reasoning, and common sense benchmarks.
+ It was trained on 160B tokens using a mix of 40% FineWeb-Edu and 60% from FineMath (30% FineMath-4+ subset and 30% InfiWebMath-4+ subset). We use nanotron for the training, and you can find the training scripts in our SmolLM2 GitHub repo.
+ overrides:
+ parameters:
+ model: FineMath-Llama-3B-Q4_K_M.gguf
+ files:
+ - filename: FineMath-Llama-3B-Q4_K_M.gguf
+ sha256: 16c73b5cf2a417a7e1608bcc9469f1461fc3e759ce04a3a337f48df977dc158c
+ uri: huggingface://bartowski/FineMath-Llama-3B-GGUF/FineMath-Llama-3B-Q4_K_M.gguf
- &qwen25
## Qwen2.5
name: "qwen2.5-14b-instruct"
From e8de7b52da29ec5ac4042f3bed71f1968fe2b973 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sun, 12 Jan 2025 11:26:42 +0100
Subject: [PATCH 010/679] chore(model gallery): add
LocalAI-functioncall-phi-4-v0.1 (#4588)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 16 ++++++++++++++++
gallery/phi-4-chat-fcall.yaml | 27 +++++++++++++++++++++++++++
2 files changed, 43 insertions(+)
create mode 100644 gallery/phi-4-chat-fcall.yaml
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 7d60167b..2b546c0b 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -23,6 +23,22 @@
- filename: phi-4-Q4_K_M.gguf
uri: huggingface://bartowski/phi-4-GGUF/phi-4-Q4_K_M.gguf
sha256: 009aba717c09d4a35890c7d35eb59d54e1dba884c7c526e7197d9c13ab5911d9
+- !!merge <<: *phi4
+ url: "github:mudler/LocalAI/gallery/phi-4-fcall.yaml@master"
+ name: "LocalAI-functioncall-phi-4-v0.1"
+ icon: https://cdn-uploads.huggingface.co/production/uploads/647374aa7ff32a81ac6d35d4/Dzbdzn27KEc3K6zNNi070.png
+ description: |
+ A model tailored to be conversational and execute function calls with LocalAI. This model is based on phi-4.
+ urls:
+ - https://huggingface.co/mudler/LocalAI-functioncall-phi-4-v0.1
+ - https://huggingface.co/mudler/LocalAI-functioncall-phi-4-v0.1-Q4_K_M-GGUF
+ overrides:
+ parameters:
+ model: localai-functioncall-phi-4-v0.1-q4_k_m.gguf
+ files:
+ - filename: localai-functioncall-phi-4-v0.1-q4_k_m.gguf
+ uri: huggingface://mudler/LocalAI-functioncall-phi-4-v0.1-Q4_K_M-GGUF/localai-functioncall-phi-4-v0.1-q4_k_m.gguf
+ sha256: 0ae4e5e4ba89c16c1e810285c5c8b84416fa67f8ed7c175aa0b6fc0a103017aa
- &falcon3
name: "falcon3-1b-instruct"
url: "github:mudler/LocalAI/gallery/falcon3.yaml@master"
diff --git a/gallery/phi-4-chat-fcall.yaml b/gallery/phi-4-chat-fcall.yaml
new file mode 100644
index 00000000..a6fa261e
--- /dev/null
+++ b/gallery/phi-4-chat-fcall.yaml
@@ -0,0 +1,27 @@
+---
+name: "phi-4-chat"
+
+config_file: |
+ mmap: true
+ template:
+ chat_message: |
+ <|im_start|>{{ .RoleName }}<|im_sep|>
+ {{.Content}}<|im_end|>
+ chat: |
+ {{.Input}}
+ <|im_start|>assistant<|im_sep|>
+ completion: |
+ {{.Input}}
+ function: |
+ <|im_start|>system<|im_sep|>
+ You are an AI assistant that executes function calls, and these are the tools at your disposal:
+ {{range .Functions}}
+ {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
+ {{end}}
+ {{.Input}}<|im_end|>
+ context_size: 4096
+ f16: true
+ stopwords:
+ - <|end|>
+ - <|endoftext|>
+ - <|im_end|>
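The template above only prepares the prompt; the function calls themselves are exercised through LocalAI's OpenAI-compatible API once the gallery model is installed. A rough usage sketch follows — the endpoint, port, model name and the get_weather tool are illustrative assumptions, not part of this patch:

from openai import OpenAI

# Hedged sketch: base URL, API key and model name depend on the local deployment.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not defined by this patch
        "description": "Return the current weather for a given city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="LocalAI-functioncall-phi-4-v0.1",  # gallery entry added in this patch
    messages=[{"role": "user", "content": "What's the weather in Rome?"}],
    tools=tools,
)
print(resp.choices[0].message)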
From 9ce71fe427e8ee3e1e3a8b00b7d00d2725270138 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sun, 12 Jan 2025 11:50:40 +0100
Subject: [PATCH 011/679] fix(gallery): correct URL typo
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 2b546c0b..9fc6f077 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -24,7 +24,7 @@
uri: huggingface://bartowski/phi-4-GGUF/phi-4-Q4_K_M.gguf
sha256: 009aba717c09d4a35890c7d35eb59d54e1dba884c7c526e7197d9c13ab5911d9
- !!merge <<: *phi4
- url: "github:mudler/LocalAI/gallery/phi-4-fcall.yaml@master"
+ url: "github:mudler/LocalAI/gallery/phi-4-chat-fcall.yaml@master"
name: "LocalAI-functioncall-phi-4-v0.1"
icon: https://cdn-uploads.huggingface.co/production/uploads/647374aa7ff32a81ac6d35d4/Dzbdzn27KEc3K6zNNi070.png
description: |
From 6a299c04a7e4a4e23188504bcb0e89488819ee1f Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sun, 12 Jan 2025 18:33:51 +0100
Subject: [PATCH 012/679] feat(stablediffusion-ggml): respect build type
(#4581)
* feat(stablediffusion-ggml): respect build type
Signed-off-by: Ettore Di Giacinto
* combine libraries
Signed-off-by: Ettore Di Giacinto
---------
Signed-off-by: Ettore Di Giacinto
---
Makefile | 10 +--
.../go/image/stablediffusion-ggml/Makefile | 71 ++++++++++++++++++-
backend/go/image/stablediffusion-ggml/gosd.go | 2 +-
3 files changed, 71 insertions(+), 12 deletions(-)
diff --git a/Makefile b/Makefile
index 261f2833..0ec85bc3 100644
--- a/Makefile
+++ b/Makefile
@@ -302,14 +302,8 @@ sources/stablediffusion-ggml.cpp:
git checkout $(STABLEDIFFUSION_GGML_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch
-sources/stablediffusion-ggml.cpp/build/libstable-diffusion.a: sources/stablediffusion-ggml.cpp
- cd sources/stablediffusion-ggml.cpp && \
- mkdir -p build && \
- cd build && \
- cmake $(CMAKE_ARGS) .. && \
- cmake --build . --config Release
-
-backend/go/image/stablediffusion-ggml/libsd.a: sources/stablediffusion-ggml.cpp/build/libstable-diffusion.a
+backend/go/image/stablediffusion-ggml/libsd.a: sources/stablediffusion-ggml.cpp
+ $(MAKE) -C backend/go/image/stablediffusion-ggml build/libstable-diffusion.a
$(MAKE) -C backend/go/image/stablediffusion-ggml libsd.a
backend-assets/grpc/stablediffusion-ggml: backend/go/image/stablediffusion-ggml/libsd.a backend-assets/grpc
diff --git a/backend/go/image/stablediffusion-ggml/Makefile b/backend/go/image/stablediffusion-ggml/Makefile
index cca9bf6e..7c6d9a17 100644
--- a/backend/go/image/stablediffusion-ggml/Makefile
+++ b/backend/go/image/stablediffusion-ggml/Makefile
@@ -2,20 +2,85 @@ INCLUDE_PATH := $(abspath ./)
LIBRARY_PATH := $(abspath ./)
AR?=ar
-
+CMAKE_ARGS?=
BUILD_TYPE?=
# keep standard at C11 and C++11
CXXFLAGS = -I. -I$(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp/thirdparty -I$(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp/ggml/include -I$(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp -O3 -DNDEBUG -std=c++17 -fPIC
+# Disable Shared libs as we are linking on static gRPC and we can't mix shared and static
+CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
+
+# If build type is cublas, then we set -DGGML_CUDA=ON to CMAKE_ARGS automatically
+ifeq ($(BUILD_TYPE),cublas)
+ CMAKE_ARGS+=-DGGML_CUDA=ON
+# If build type is openblas then we set -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
+# to CMAKE_ARGS automatically
+else ifeq ($(BUILD_TYPE),openblas)
+ CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
+# If build type is clblas (openCL) we set -DGGML_CLBLAST=ON -DCLBlast_DIR=/some/path
+else ifeq ($(BUILD_TYPE),clblas)
+ CMAKE_ARGS+=-DGGML_CLBLAST=ON -DCLBlast_DIR=/some/path
+# If it's hipblas we do have also to set CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++
+else ifeq ($(BUILD_TYPE),hipblas)
+ CMAKE_ARGS+=-DGGML_HIP=ON
+# If it's OSX, DO NOT embed the metal library - -DGGML_METAL_EMBED_LIBRARY=ON requires further investigation
+# But if it's OSX without metal, disable it here
+else ifeq ($(OS),Darwin)
+ ifneq ($(BUILD_TYPE),metal)
+ CMAKE_ARGS+=-DGGML_METAL=OFF
+ else
+ CMAKE_ARGS+=-DGGML_METAL=ON
+ CMAKE_ARGS+=-DGGML_METAL_EMBED_LIBRARY=ON
+ TARGET+=--target ggml-metal
+ endif
+endif
+
+ifeq ($(BUILD_TYPE),sycl_f16)
+ CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON
+endif
+
+ifeq ($(BUILD_TYPE),sycl_f32)
+ CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
+endif
+
# warnings
CXXFLAGS += -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function
+# Find all .a archives in ARCHIVE_DIR
+# (ggml can have different backends cpu, cuda, etc., each backend generates a .a archive)
+GGML_ARCHIVE_DIR := build/ggml/src/
+ALL_ARCHIVES := $(shell find $(GGML_ARCHIVE_DIR) -type f -name '*.a')
+
+# Name of the single merged library
+COMBINED_LIB := libggmlall.a
+
+# Rule to merge all the .a files into one
+$(COMBINED_LIB): $(ALL_ARCHIVES)
+ @echo "Merging all .a into $(COMBINED_LIB)"
+ rm -f $@
+ mkdir -p merge-tmp
+ for a in $(ALL_ARCHIVES); do \
+ ( cd merge-tmp && ar x ../$$a ); \
+ done
+ ( cd merge-tmp && ar rcs ../$@ *.o )
+ # Ensure we have a proper index
+ ranlib $@
+ # Clean up
+ rm -rf merge-tmp
+
+build/libstable-diffusion.a:
+ mkdir -p build && \
+ cd build && \
+ cmake $(CMAKE_ARGS) ../../../../../sources/stablediffusion-ggml.cpp && \
+ cmake --build . --config Release
+ $(MAKE) $(COMBINED_LIB)
+
gosd.o:
$(CXX) $(CXXFLAGS) gosd.cpp -o gosd.o -c
libsd.a: gosd.o
- cp $(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp/build/libstable-diffusion.a ./libsd.a
+ cp $(INCLUDE_PATH)/build/libstable-diffusion.a ./libsd.a
$(AR) rcs libsd.a gosd.o
clean:
- rm -f gosd.o libsd.a
\ No newline at end of file
+ rm -rf gosd.o libsd.a build $(COMBINED_LIB)
\ No newline at end of file
diff --git a/backend/go/image/stablediffusion-ggml/gosd.go b/backend/go/image/stablediffusion-ggml/gosd.go
index 29d0033d..8c3bdb90 100644
--- a/backend/go/image/stablediffusion-ggml/gosd.go
+++ b/backend/go/image/stablediffusion-ggml/gosd.go
@@ -1,7 +1,7 @@
package main
// #cgo CXXFLAGS: -I${SRCDIR}/../../../../sources/stablediffusion-ggml.cpp/thirdparty -I${SRCDIR}/../../../../sources/stablediffusion-ggml.cpp -I${SRCDIR}/../../../../sources/stablediffusion-ggml.cpp/ggml/include
-// #cgo LDFLAGS: -L${SRCDIR}/ -L${SRCDIR}/../../../../sources/stablediffusion-ggml.cpp/build/ggml/src/ggml-cpu -L${SRCDIR}/../../../../sources/stablediffusion-ggml.cpp/build/ggml/src -lsd -lstdc++ -lm -lggml -lggml-base -lggml-cpu -lgomp
+// #cgo LDFLAGS: -L${SRCDIR}/ -lsd -lstdc++ -lm -lggmlall -lgomp
// #include
// #include
import "C"
From 9fdb44323dd2d345886f15932586c4d178c2ba95 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sun, 12 Jan 2025 18:50:41 +0100
Subject: [PATCH 013/679] chore(model gallery): add
LocalAI-functioncall-phi-4-v0.2 (#4589)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 9fc6f077..7eb9d479 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -23,6 +23,23 @@
- filename: phi-4-Q4_K_M.gguf
uri: huggingface://bartowski/phi-4-GGUF/phi-4-Q4_K_M.gguf
sha256: 009aba717c09d4a35890c7d35eb59d54e1dba884c7c526e7197d9c13ab5911d9
+- !!merge <<: *phi4
+ url: "github:mudler/LocalAI/gallery/phi-4-chat-fcall.yaml@master"
+ name: "LocalAI-functioncall-phi-4-v0.2"
+ icon: https://cdn-uploads.huggingface.co/production/uploads/647374aa7ff32a81ac6d35d4/Dzbdzn27KEc3K6zNNi070.png
+ description: |
+ A model tailored to be conversational and execute function calls with LocalAI. This model is based on phi-4.
+ This is the second iteration of https://huggingface.co/mudler/LocalAI-functioncall-phi-4-v0.1 with added CoT (o1) capabilities from the marco-o1 dataset.
+ urls:
+ - https://huggingface.co/mudler/LocalAI-functioncall-phi-4-v0.2
+ - https://huggingface.co/mudler/localai-functioncall-phi-4-v0.2-Q4_K_M-GGUF
+ overrides:
+ parameters:
+ model: localai-functioncall-phi-4-v0.2-q4_k_m.gguf
+ files:
+ - filename: localai-functioncall-phi-4-v0.2-q4_k_m.gguf
+ uri: huggingface://mudler/localai-functioncall-phi-4-v0.2-Q4_K_M-GGUF/localai-functioncall-phi-4-v0.2-q4_k_m.gguf
+ sha256: 681b5fb5070f23323a9cc8cbd1306b1c348c2f292041d3ba2335b26b071757b7
- !!merge <<: *phi4
url: "github:mudler/LocalAI/gallery/phi-4-chat-fcall.yaml@master"
name: "LocalAI-functioncall-phi-4-v0.1"
From aea71dd2c6e6cd7ddd4f9ccd3bb3ae0b714b6176 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sun, 12 Jan 2025 22:07:01 +0100
Subject: [PATCH 014/679] fix(stablediffusion-ggml): correctly enable sycl
(#4591)
Signed-off-by: Ettore Di Giacinto
---
backend/go/image/stablediffusion-ggml/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/backend/go/image/stablediffusion-ggml/Makefile b/backend/go/image/stablediffusion-ggml/Makefile
index 7c6d9a17..9d6b6597 100644
--- a/backend/go/image/stablediffusion-ggml/Makefile
+++ b/backend/go/image/stablediffusion-ggml/Makefile
@@ -36,11 +36,11 @@ else ifeq ($(OS),Darwin)
endif
ifeq ($(BUILD_TYPE),sycl_f16)
- CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON
+ CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON -DSD_SYCL=ON -DGGML_SYCL_F16=ON
endif
ifeq ($(BUILD_TYPE),sycl_f32)
- CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
+ CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DSD_SYCL=ON
endif
# warnings
From 8d82afb5958b590310b4edb8aeb1a9f72e202b2d Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Mon, 13 Jan 2025 10:11:48 +0100
Subject: [PATCH 015/679] fix(stablediffusion-ggml): enable oneapi before build
(#4593)
Signed-off-by: Ettore Di Giacinto
---
backend/go/image/stablediffusion-ggml/Makefile | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/backend/go/image/stablediffusion-ggml/Makefile b/backend/go/image/stablediffusion-ggml/Makefile
index 9d6b6597..259d4d38 100644
--- a/backend/go/image/stablediffusion-ggml/Makefile
+++ b/backend/go/image/stablediffusion-ggml/Makefile
@@ -4,6 +4,7 @@ LIBRARY_PATH := $(abspath ./)
AR?=ar
CMAKE_ARGS?=
BUILD_TYPE?=
+ONEAPI_VARS?=/opt/intel/oneapi/setvars.sh
# keep standard at C11 and C++11
CXXFLAGS = -I. -I$(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp/thirdparty -I$(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp/ggml/include -I$(INCLUDE_PATH)/../../../../sources/stablediffusion-ggml.cpp -O3 -DNDEBUG -std=c++17 -fPIC
@@ -69,10 +70,19 @@ $(COMBINED_LIB): $(ALL_ARCHIVES)
rm -rf merge-tmp
build/libstable-diffusion.a:
+ @echo "Building SD with $(BUILD_TYPE) build type and $(CMAKE_ARGS)"
+ifneq (,$(findstring sycl,$(BUILD_TYPE)))
+ +bash -c "source $(ONEAPI_VARS); \
+ mkdir -p build && \
+ cd build && \
+ cmake $(CMAKE_ARGS) ../../../../../sources/stablediffusion-ggml.cpp && \
+ cmake --build . --config Release"
+else
mkdir -p build && \
cd build && \
cmake $(CMAKE_ARGS) ../../../../../sources/stablediffusion-ggml.cpp && \
cmake --build . --config Release
+endif
$(MAKE) $(COMBINED_LIB)
gosd.o:
From ab5adf40af1994ffe5bbae735252c7ea88755d0f Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Mon, 13 Jan 2025 17:33:06 +0100
Subject: [PATCH 016/679] =?UTF-8?q?chore(deps):=20bump=20llama.cpp=20to=20?=
=?UTF-8?q?'924518e2e5726e81f3aeb2518fb85963a500e=E2=80=A6=20(#4592)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
chore(deps): bump llama.cpp to '924518e2e5726e81f3aeb2518fb85963a500e93a'
Signed-off-by: Ettore Di Giacinto
---
Makefile | 2 +-
backend/cpp/llama/grpc-server.cpp | 42 +++++++++++++------------------
2 files changed, 19 insertions(+), 25 deletions(-)
diff --git a/Makefile b/Makefile
index 0ec85bc3..4392980b 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=c05e8c9934f94fde49bc1bc9dc51eed282605150
+CPPLLAMA_VERSION?=924518e2e5726e81f3aeb2518fb85963a500e93a
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
diff --git a/backend/cpp/llama/grpc-server.cpp b/backend/cpp/llama/grpc-server.cpp
index 7632aebc..f0a16ffa 100644
--- a/backend/cpp/llama/grpc-server.cpp
+++ b/backend/cpp/llama/grpc-server.cpp
@@ -428,6 +428,7 @@ struct llama_server_context
{
llama_model *model = nullptr;
llama_context *ctx = nullptr;
+ const llama_vocab * vocab = nullptr;
clip_ctx *clp_ctx = nullptr;
@@ -439,6 +440,7 @@ struct llama_server_context
bool clean_kv_cache = true;
bool all_slots_are_idle = false;
bool add_bos_token = true;
+ bool has_eos_token = true;
int32_t n_ctx; // total context for all clients / slots
@@ -502,7 +504,7 @@ struct llama_server_context
if (multimodal) {
const int n_embd_clip = clip_n_mmproj_embd(clp_ctx);
- const int n_embd_llm = llama_n_embd(model);
+ const int n_embd_llm = llama_model_n_embd(model);
if (n_embd_clip != n_embd_llm) {
LOG("%s: embedding dim of the multimodal projector (%d) is not equal to that of LLaMA (%d). Make sure that you use the correct mmproj file.\n", __func__, n_embd_clip, n_embd_llm);
llama_free(ctx);
@@ -511,23 +513,15 @@ struct llama_server_context
}
}
+ vocab = llama_model_get_vocab(model);
n_ctx = llama_n_ctx(ctx);
- add_bos_token = llama_add_bos_token(model);
+ add_bos_token = llama_vocab_get_add_bos(vocab);
+ has_eos_token = llama_vocab_eos(vocab) != LLAMA_TOKEN_NULL;
return true;
}
- void validate_model_chat_template(server_params & sparams) {
- llama_chat_message chat[] = {{"user", "test"}};
- std::vector buf(1);
- int res = llama_chat_apply_template(model, nullptr, chat, 1, true, buf.data(), buf.size());
- if (res < 0) {
- LOG_ERR("The chat template comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses", __func__);
- sparams.chat_template = "<|im_start|>"; // llama_chat_apply_template only checks if <|im_start|> exist in the template
- }
- }
-
llama_client_slot* get_active_slot() {
for (llama_client_slot& slot : slots) {
// Check if the slot is currently processing
@@ -725,8 +719,8 @@ struct llama_server_context
slot->prompt = "";
}
- if (json_value(data, "ignore_eos", false)) {
- slot->sparams.logit_bias.push_back({llama_token_eos(model), -INFINITY});
+ if (json_value(data, "ignore_eos", false) && has_eos_token) {
+ slot->sparams.logit_bias.push_back({llama_vocab_eos(vocab), -INFINITY});
}
/*
slot->sparams.penalty_prompt_tokens.clear();
@@ -765,13 +759,13 @@ struct llama_server_context
}
}
*/
-
slot->sparams.logit_bias.clear();
const auto &logit_bias = data.find("logit_bias");
if (logit_bias != data.end() && logit_bias->is_array())
{
- const int n_vocab = llama_n_vocab(model);
+ const llama_vocab * vocab = llama_model_get_vocab(model);
+ const int n_vocab = llama_vocab_n_tokens(vocab);
for (const auto &el : *logit_bias)
{
if (el.is_array() && el.size() == 2)
@@ -800,7 +794,7 @@ struct llama_server_context
}
else if (el[0].is_string())
{
- auto toks = common_tokenize(model, el[0].get(), false);
+ auto toks = common_tokenize(vocab, el[0].get(), false);
for (auto tok : toks)
{
slot->sparams.logit_bias.push_back({tok, bias});
@@ -1130,7 +1124,7 @@ struct llama_server_context
slot.has_next_token = false;
}
- if (result.tok == llama_token_eos(model))
+ if (result.tok == llama_vocab_eos(vocab) || llama_vocab_is_eog(vocab, result.tok))
{
slot.stopped_eos = true;
slot.has_next_token = false;
@@ -1325,7 +1319,7 @@ struct llama_server_context
res.error = false;
res.stop = true;
- const int n_embd = llama_n_embd(model);
+ const int n_embd = llama_model_n_embd(model);
if (!params.embedding)
{
LOG_WARNING("embedding disabled", {
@@ -1424,7 +1418,7 @@ struct llama_server_context
n_eval = n_batch;
}
- const int n_embd = llama_n_embd(model);
+ const int n_embd = llama_model_n_embd(model);
float * embd = img.image_embedding + i * n_embd;
llava_embd_batch llava_batch = llava_embd_batch(embd, n_eval, slot.n_past, 0);
if (llama_decode(ctx, llava_batch.batch))
@@ -1705,11 +1699,11 @@ struct llama_server_context
suffix_tokens.erase(suffix_tokens.begin());
}
- prefix_tokens.insert(prefix_tokens.begin(), llama_token_prefix(model));
- prefix_tokens.insert(prefix_tokens.begin(), llama_token_bos(model)); // always add BOS
- prefix_tokens.insert(prefix_tokens.end(), llama_token_suffix(model));
+ prefix_tokens.insert(prefix_tokens.begin(), llama_vocab_fim_pre(vocab));
+ prefix_tokens.insert(prefix_tokens.begin(), llama_vocab_bos(vocab)); // always add BOS
+ prefix_tokens.insert(prefix_tokens.end(), llama_vocab_fim_suf(vocab));
prefix_tokens.insert(prefix_tokens.end(), suffix_tokens.begin(), suffix_tokens.end());
- prefix_tokens.push_back(llama_token_middle(model));
+ prefix_tokens.push_back(llama_vocab_fim_mid(vocab));
prompt_tokens = prefix_tokens;
}
else
From b0ead0bf12e8f08ea69065c0682d2a634795e932 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon, 13 Jan 2025 21:17:11 +0000
Subject: [PATCH 017/679] chore(deps): Bump securego/gosec from 2.21.4 to
2.22.0 (#4594)
Bumps [securego/gosec](https://github.com/securego/gosec) from 2.21.4 to 2.22.0.
- [Release notes](https://github.com/securego/gosec/releases)
- [Changelog](https://github.com/securego/gosec/blob/master/.goreleaser.yml)
- [Commits](https://github.com/securego/gosec/compare/v2.21.4...v2.22.0)
---
updated-dependencies:
- dependency-name: securego/gosec
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
.github/workflows/secscan.yaml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/.github/workflows/secscan.yaml b/.github/workflows/secscan.yaml
index 3fd808e1..228ac1d9 100644
--- a/.github/workflows/secscan.yaml
+++ b/.github/workflows/secscan.yaml
@@ -18,7 +18,7 @@ jobs:
if: ${{ github.actor != 'dependabot[bot]' }}
- name: Run Gosec Security Scanner
if: ${{ github.actor != 'dependabot[bot]' }}
- uses: securego/gosec@v2.21.4
+ uses: securego/gosec@v2.22.0
with:
# we let the report trigger content trigger a failure using the GitHub Security features.
args: '-no-fail -fmt sarif -out results.sarif ./...'
From 0c02512f159bfac0e04e5e7bfebfe1170e3bb505 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Tue, 14 Jan 2025 09:07:20 +0100
Subject: [PATCH 018/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`504af20ee4eae72080a56d59d744f6774f7901ce` (#4597)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 4392980b..fd05703e 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=924518e2e5726e81f3aeb2518fb85963a500e93a
+CPPLLAMA_VERSION?=504af20ee4eae72080a56d59d744f6774f7901ce
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From 69c6e5b1924e9e6d7cbb13edb8dfab45ef729f12 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Tue, 14 Jan 2025 09:17:55 +0100
Subject: [PATCH 019/679] chore(stablediffusion-ggml): disable sycl
optimizations (#4598)
Signed-off-by: Ettore Di Giacinto
---
backend/go/image/stablediffusion-ggml/Makefile | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/backend/go/image/stablediffusion-ggml/Makefile b/backend/go/image/stablediffusion-ggml/Makefile
index 259d4d38..f92c3a77 100644
--- a/backend/go/image/stablediffusion-ggml/Makefile
+++ b/backend/go/image/stablediffusion-ggml/Makefile
@@ -36,13 +36,13 @@ else ifeq ($(OS),Darwin)
endif
endif
-ifeq ($(BUILD_TYPE),sycl_f16)
- CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON -DSD_SYCL=ON -DGGML_SYCL_F16=ON
-endif
+# ifeq ($(BUILD_TYPE),sycl_f16)
+# CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON -DSD_SYCL=ON -DGGML_SYCL_F16=ON
+# endif
-ifeq ($(BUILD_TYPE),sycl_f32)
- CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DSD_SYCL=ON
-endif
+# ifeq ($(BUILD_TYPE),sycl_f32)
+# CMAKE_ARGS+=-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DSD_SYCL=ON
+# endif
# warnings
CXXFLAGS += -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function
From 1b3e89c89c1e82b98cdfd231d4c44ae491f3cd83 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Tue, 14 Jan 2025 09:27:18 +0100
Subject: [PATCH 020/679] chore(model gallery): add
LocalAI-functioncall-phi-4-v0.3 (#4599)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 16 ++++++++++++++++
gallery/phi-4-chat-fcall.yaml | 10 ++++++++++
2 files changed, 26 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 7eb9d479..bb0339bb 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -23,6 +23,22 @@
- filename: phi-4-Q4_K_M.gguf
uri: huggingface://bartowski/phi-4-GGUF/phi-4-Q4_K_M.gguf
sha256: 009aba717c09d4a35890c7d35eb59d54e1dba884c7c526e7197d9c13ab5911d9
+- !!merge <<: *phi4
+ url: "github:mudler/LocalAI/gallery/phi-4-chat-fcall.yaml@master"
+ name: "LocalAI-functioncall-phi-4-v0.3"
+ icon: https://cdn-uploads.huggingface.co/production/uploads/647374aa7ff32a81ac6d35d4/Dzbdzn27KEc3K6zNNi070.png
+ urls:
+ - https://huggingface.co/mudler/LocalAI-functioncall-phi-4-v0.3
+ - https://huggingface.co/mudler/LocalAI-functioncall-phi-4-v0.3-Q4_K_M-GGUF
+ description: |
+ A model tailored to be conversational and execute function calls with LocalAI. This model is based on phi-4.
+ overrides:
+ parameters:
+ model: localai-functioncall-phi-4-v0.3-q4_k_m.gguf
+ files:
+ - filename: localai-functioncall-phi-4-v0.3-q4_k_m.gguf
+ sha256: 23fee048ded2a6e2e1a7b6bbefa6cbf83068f194caa9552aecbaa00fec8a16d5
+ uri: huggingface://mudler/LocalAI-functioncall-phi-4-v0.3-Q4_K_M-GGUF/localai-functioncall-phi-4-v0.3-q4_k_m.gguf
- !!merge <<: *phi4
url: "github:mudler/LocalAI/gallery/phi-4-chat-fcall.yaml@master"
name: "LocalAI-functioncall-phi-4-v0.2"
diff --git a/gallery/phi-4-chat-fcall.yaml b/gallery/phi-4-chat-fcall.yaml
index a6fa261e..23c2e53d 100644
--- a/gallery/phi-4-chat-fcall.yaml
+++ b/gallery/phi-4-chat-fcall.yaml
@@ -3,6 +3,16 @@ name: "phi-4-chat"
config_file: |
mmap: true
+ function:
+ json_regex_match:
+ - "(?s)"
+ capture_llm_results:
+ - (?s)(.*?)
+ replace_llm_results:
+ - key: (?s)(.*?)
+ value: ""
+ grammar:
+ properties_order: "name,arguments"
template:
chat_message: |
<|im_start|>{{ .RoleName }}<|im_sep|>
From 5414c294c4d2e57f1f0e09da14e341a5cd846e2b Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Tue, 14 Jan 2025 09:29:25 +0100
Subject: [PATCH 021/679] chore(model gallery): add negative-anubis-70b-v1
(#4600)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index bb0339bb..31468321 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -380,6 +380,27 @@
- filename: Negative_LLAMA_70B-Q4_K_M.gguf
sha256: 023c6bd38f6a66178529e6bb77b6e76379ae3ee031adc6885531986aa12750d9
uri: huggingface://bartowski/Negative_LLAMA_70B-GGUF/Negative_LLAMA_70B-Q4_K_M.gguf
+- !!merge <<: *llama33
+ name: "negative-anubis-70b-v1"
+ icon: https://huggingface.co/knifeayumu/Negative-Anubis-70B-v1/resolve/main/Negative-Anubis.png
+ urls:
+ - https://huggingface.co/knifeayumu/Negative-Anubis-70B-v1
+ - https://huggingface.co/bartowski/Negative-Anubis-70B-v1-GGUF
+ description: |
+ Enjoyed SicariusSicariiStuff/Negative_LLAMA_70B but the prose was too dry for my tastes. So I merged it with TheDrummer/Anubis-70B-v1 for verbosity. Anubis has positivity bias so Negative could balance things out.
+
+ This is a merge of pre-trained language models created using mergekit.
+
+ The following models were included in the merge:
+ SicariusSicariiStuff/Negative_LLAMA_70B
+ TheDrummer/Anubis-70B-v1
+ overrides:
+ parameters:
+ model: Negative-Anubis-70B-v1-Q4_K_M.gguf
+ files:
+ - filename: Negative-Anubis-70B-v1-Q4_K_M.gguf
+ sha256: ac088da9ca70fffaa70c876fbada9fc5a02e7d6049ef68f16b11a9c3256f2510
+ uri: huggingface://bartowski/Negative-Anubis-70B-v1-GGUF/Negative-Anubis-70B-v1-Q4_K_M.gguf
- &rwkv
url: "github:mudler/LocalAI/gallery/rwkv.yaml@master"
name: "rwkv-6-world-7b"
From 62abe0d2c9c6492213039a7ccbbecaa40808791d Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Tue, 14 Jan 2025 09:33:19 +0100
Subject: [PATCH 022/679] chore(model gallery): add qwen2.5-72b-rp-ink (#4601)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 31468321..a46d47d6 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3106,6 +3106,22 @@
- filename: Sky-T1-32B-Preview-Q4_K_M.gguf
sha256: c735912a582f10e4769461586a02e5b98ef43c2895ec11923b8c4f157e7909e5
uri: huggingface://bartowski/Sky-T1-32B-Preview-GGUF/Sky-T1-32B-Preview-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "qwen2.5-72b-rp-ink"
+ icon: https://cdn-uploads.huggingface.co/production/uploads/634262af8d8089ebaefd410e/M9KSL64gppBVatmTdoQnG.png
+ urls:
+ - https://huggingface.co/allura-org/Qwen2.5-72b-RP-Ink
+ - https://huggingface.co/bartowski/Qwen2.5-72b-RP-Ink-GGUF
+ description: |
+ A roleplay-focused LoRA finetune of Qwen 2.5 72b Instruct. Methodology and hyperparams inspired by SorcererLM and Slush.
+ Yet another model in the Ink series, following in the footsteps of the 32b one and the Nemo one
+ overrides:
+ parameters:
+ model: Qwen2.5-72b-RP-Ink-Q4_K_M.gguf
+ files:
+ - filename: Qwen2.5-72b-RP-Ink-Q4_K_M.gguf
+ sha256: 2c2bf785dc5798403e0ccf6c4f5f9d7d53fcfb0c0b28855c584e09be88f91517
+ uri: huggingface://bartowski/Qwen2.5-72b-RP-Ink-GGUF/Qwen2.5-72b-RP-Ink-Q4_K_M.gguf
- &smollm
## SmolLM
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
From b8d74e52b1e400a52a747a3a89ac3f6338c6ad4b Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Tue, 14 Jan 2025 09:41:30 +0100
Subject: [PATCH 023/679] chore(model gallery): add steiner-32b-preview (#4602)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index a46d47d6..258994e9 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3122,6 +3122,22 @@
- filename: Qwen2.5-72b-RP-Ink-Q4_K_M.gguf
sha256: 2c2bf785dc5798403e0ccf6c4f5f9d7d53fcfb0c0b28855c584e09be88f91517
uri: huggingface://bartowski/Qwen2.5-72b-RP-Ink-GGUF/Qwen2.5-72b-RP-Ink-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "steiner-32b-preview"
+ urls:
+ - https://huggingface.co/peakji/steiner-32b-preview
+ - https://huggingface.co/bartowski/steiner-32b-preview-GGUF
+ description: |
+ Steiner is a series of reasoning models trained on synthetic data using reinforcement learning. These models can explore multiple reasoning paths in an autoregressive manner during inference and autonomously verify or backtrack when necessary, enabling a linear traversal of the implicit search tree.
+
+ Steiner is a personal interest project by Yichao 'Peak' Ji, inspired by OpenAI o1. The ultimate goal is to reproduce o1 and validate the inference-time scaling curves. The Steiner-preview model is currently a work-in-progress. The reason for open-sourcing it is that I’ve found automated evaluation methods, primarily based on multiple-choice questions, struggle to fully reflect the progress of reasoning models. In fact, the assumption that "the correct answer is always among the options" doesn’t align well with real-world reasoning scenarios, as it encourages models to perform substitution-based validation rather than open-ended exploration. For this reason, I’ve chosen to open-source these intermediate results and, when time permits, to build in public. This approach allows me to share knowledge while also gathering more evaluations and feedback from real human users.
+ overrides:
+ parameters:
+ model: steiner-32b-preview-Q4_K_M.gguf
+ files:
+ - filename: steiner-32b-preview-Q4_K_M.gguf
+ sha256: 1d7bf6d6dc8db8c81b3e71dc89756cd23417bb0a645b7dcdd1f9457781a88652
+ uri: huggingface://bartowski/steiner-32b-preview-GGUF/steiner-32b-preview-Q4_K_M.gguf
- &smollm
## SmolLM
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
From d7dee3a5ecd7d3e60ba699ed6f12bc8d75213ffd Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Tue, 14 Jan 2025 11:13:16 +0100
Subject: [PATCH 024/679] feat(diffusers): add support for Sana pipelines
(#4603)
Signed-off-by: Ettore Di Giacinto
---
backend/python/diffusers/backend.py | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/backend/python/diffusers/backend.py b/backend/python/diffusers/backend.py
index f1b447b4..c9aa02bc 100755
--- a/backend/python/diffusers/backend.py
+++ b/backend/python/diffusers/backend.py
@@ -17,7 +17,7 @@ import backend_pb2_grpc
import grpc
-from diffusers import StableDiffusion3Pipeline, StableDiffusionXLPipeline, StableDiffusionDepth2ImgPipeline, DPMSolverMultistepScheduler, StableDiffusionPipeline, DiffusionPipeline, \
+from diffusers import SanaPipeline, StableDiffusion3Pipeline, StableDiffusionXLPipeline, StableDiffusionDepth2ImgPipeline, DPMSolverMultistepScheduler, StableDiffusionPipeline, DiffusionPipeline, \
EulerAncestralDiscreteScheduler, FluxPipeline, FluxTransformer2DModel
from diffusers import StableDiffusionImg2ImgPipeline, AutoPipelineForText2Image, ControlNetModel, StableVideoDiffusionPipeline
from diffusers.pipelines.stable_diffusion import safety_checker
@@ -275,6 +275,13 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
if request.LowVRAM:
self.pipe.enable_model_cpu_offload()
+ elif request.PipelineType == "SanaPipeline":
+ self.pipe = SanaPipeline.from_pretrained(
+ request.Model,
+ variant="bf16",
+ torch_dtype=torch.bfloat16)
+ self.pipe.vae.to(torch.bfloat16)
+ self.pipe.text_encoder.to(torch.bfloat16)
if CLIPSKIP and request.CLIPSkip != 0:
self.clip_skip = request.CLIPSkip
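Outside of the gRPC backend, the SanaPipeline branch added above corresponds roughly to the following standalone diffusers usage; the model id and prompt are placeholders, and only the from_pretrained arguments and the bfloat16 casts mirror this patch:

import torch
from diffusers import SanaPipeline

# Hedged sketch: the model id below is a placeholder, not taken from this patch.
pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
)
# As in the backend change above, keep the VAE and text encoder in bfloat16.
pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)
pipe.to("cuda")  # assumes a CUDA-capable GPU

image = pipe(prompt="a watercolor painting of a lighthouse at dawn").images[0]
image.save("sana_sample.png")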
From f053f7bde224b0b64d6d6daf7a3ffa7e2036d6db Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Tue, 14 Jan 2025 23:16:33 +0100
Subject: [PATCH 025/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`b4d92a59a20eea400d8dd30844a339b76210daa0` (#4606)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index fd05703e..4c01621d 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=504af20ee4eae72080a56d59d744f6774f7901ce
+CPPLLAMA_VERSION?=b4d92a59a20eea400d8dd30844a339b76210daa0
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From 792b866727454520c47bd04bc75975cd0caab876 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 15 Jan 2025 15:46:27 +0100
Subject: [PATCH 026/679] Update README.md
Signed-off-by: Ettore Di Giacinto
---
README.md | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/README.md b/README.md
index caf36bcf..ec4db188 100644
--- a/README.md
+++ b/README.md
@@ -92,19 +92,15 @@ local-ai run oci://localai/phi-2:latest
## 📰 Latest project news
+- January 2025: SANA support in diffusers: https://github.com/mudler/LocalAI/pull/4603
- Dec 2024: stablediffusion.cpp backend (ggml) added ( https://github.com/mudler/LocalAI/pull/4289 )
- Nov 2024: Bark.cpp backend added ( https://github.com/mudler/LocalAI/pull/4287 )
- Nov 2024: Voice activity detection models (**VAD**) added to the API: https://github.com/mudler/LocalAI/pull/4204
- Oct 2024: examples moved to [LocalAI-examples](https://github.com/mudler/LocalAI-examples)
- Aug 2024: FLUX-1, [P2P Explorer](https://explorer.localai.io)
-- July 2024: 🔥🔥 P2P Dashboard, LocalAI Federated mode and AI Swarms: https://github.com/mudler/LocalAI/pull/2723
-- June 2024: You can browse now the model gallery without LocalAI! Check out https://models.localai.io
-- June 2024: Support for models from OCI registries: https://github.com/mudler/LocalAI/pull/2628
+- July 2024: 🔥🔥 P2P Dashboard, LocalAI Federated mode and AI Swarms: https://github.com/mudler/LocalAI/pull/2723. P2P Global community pools: https://github.com/mudler/LocalAI/issues/3113
- May 2024: 🔥🔥 Decentralized P2P llama.cpp: https://github.com/mudler/LocalAI/pull/2343 (peer2peer llama.cpp!) Docs https://localai.io/features/distribute/
-- May 2024: 🔥🔥 Openvoice: https://github.com/mudler/LocalAI/pull/2334
-- May 2024: Function calls without grammars and mixed mode: https://github.com/mudler/LocalAI/pull/2328
- May 2024: 🔥🔥 Distributed inferencing: https://github.com/mudler/LocalAI/pull/2324
-- May 2024: Chat, TTS, and Image generation in the WebUI: https://github.com/mudler/LocalAI/pull/2222
- April 2024: Reranker API: https://github.com/mudler/LocalAI/pull/2121
Roadmap items: [List of issues](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
@@ -113,12 +109,10 @@ Roadmap items: [List of issues](https://github.com/mudler/LocalAI/issues?q=is%3A
- Multimodal with vLLM and Video understanding: https://github.com/mudler/LocalAI/pull/3729
- Realtime API https://github.com/mudler/LocalAI/issues/3714
-- 🔥🔥 Distributed, P2P Global community pools: https://github.com/mudler/LocalAI/issues/3113
- WebUI improvements: https://github.com/mudler/LocalAI/issues/2156
- Backends v2: https://github.com/mudler/LocalAI/issues/1126
- Improving UX v2: https://github.com/mudler/LocalAI/issues/1373
- Assistant API: https://github.com/mudler/LocalAI/issues/1273
-- Moderation endpoint: https://github.com/mudler/LocalAI/issues/999
- Vulkan: https://github.com/mudler/LocalAI/issues/1647
- Anthropic API: https://github.com/mudler/LocalAI/issues/1808
From 5bba5edf451407e7c969940d4df4d9ce89c081b2 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 15 Jan 2025 15:46:45 +0100
Subject: [PATCH 027/679] chore(model gallery): add qwerus-7b (#4609)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 258994e9..5ef8d2ce 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3138,6 +3138,22 @@
- filename: steiner-32b-preview-Q4_K_M.gguf
sha256: 1d7bf6d6dc8db8c81b3e71dc89756cd23417bb0a645b7dcdd1f9457781a88652
uri: huggingface://bartowski/steiner-32b-preview-GGUF/steiner-32b-preview-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "qwerus-7b"
+ urls:
+ - https://huggingface.co/mlabonne/Qwerus-7B
+ - https://huggingface.co/bartowski/Qwerus-7B-GGUF
+ description: |
+ Qwerus-7B is a merge of the following models using LazyMergekit:
+ PRIME-RL/Eurus-2-7B-PRIME
+ Qwen/Qwen2.5-7B-Instruct
+ overrides:
+ parameters:
+ model: Qwerus-7B-Q4_K_M.gguf
+ files:
+ - filename: Qwerus-7B-Q4_K_M.gguf
+ sha256: 3676629e8092a59f523393e6eb5072727f5213a9e03b7b81141f05a33743e20c
+ uri: huggingface://bartowski/Qwerus-7B-GGUF/Qwerus-7B-Q4_K_M.gguf
- &smollm
## SmolLM
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
From 482c6b8be4382d0b91af8bc576b9ca5bd35eff8f Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 15 Jan 2025 15:51:50 +0100
Subject: [PATCH 028/679] chore(model gallery): add l3.3-ms-nevoria-70b (#4610)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 5ef8d2ce..bed32d34 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -401,6 +401,23 @@
- filename: Negative-Anubis-70B-v1-Q4_K_M.gguf
sha256: ac088da9ca70fffaa70c876fbada9fc5a02e7d6049ef68f16b11a9c3256f2510
uri: huggingface://bartowski/Negative-Anubis-70B-v1-GGUF/Negative-Anubis-70B-v1-Q4_K_M.gguf
+- !!merge <<: *llama33
+ name: "l3.3-ms-nevoria-70b"
+ icon: https://cdn-uploads.huggingface.co/production/uploads/64545af5ec40bbbd01242ca6/dtlCF4LbekmDD2y3LNpdH.jpeg
+ urls:
+ - https://huggingface.co/Steelskull/L3.3-MS-Nevoria-70b
+ - https://huggingface.co/bartowski/L3.3-MS-Nevoria-70b-GGUF
+ description: |
+ This model was created as I liked the storytelling of EVA, the prose and details of scenes from EURYALE and Anubis, enhanced with Negative_LLAMA to kill off the positive bias with a touch of nemotron sprinkled in.
+
+ The choice to use the lorablated model as a base was intentional - while it might seem counterintuitive, this approach creates unique interactions between the weights, similar to what was achieved in the original Astoria model and Astoria V2 model. Rather than simply removing refusals, this "weight twisting" effect that occurs when subtracting the lorablated base model from the other models during the merge process creates an interesting balance in the final model's behavior. While this approach differs from traditional sequential application of components, it was chosen for its unique characteristics in the model's responses.
+ overrides:
+ parameters:
+ model: L3.3-MS-Nevoria-70b-Q4_K_M.gguf
+ files:
+ - filename: L3.3-MS-Nevoria-70b-Q4_K_M.gguf
+ sha256: e8b0763f263089a19d4b112b7ed5085cc5f1ed9ca49c5085baa8d51f4ded1f94
+ uri: huggingface://bartowski/L3.3-MS-Nevoria-70b-GGUF/L3.3-MS-Nevoria-70b-Q4_K_M.gguf
- &rwkv
url: "github:mudler/LocalAI/gallery/rwkv.yaml@master"
name: "rwkv-6-world-7b"
From 6d20497d45301d4ed7ecace3ecf81012cd0e5e4b Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 15 Jan 2025 15:54:12 +0100
Subject: [PATCH 029/679] chore(model gallery): add lb-reranker-0.5b-v1.0
(#4611)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index bed32d34..40dc85a4 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3171,6 +3171,36 @@
- filename: Qwerus-7B-Q4_K_M.gguf
sha256: 3676629e8092a59f523393e6eb5072727f5213a9e03b7b81141f05a33743e20c
uri: huggingface://bartowski/Qwerus-7B-GGUF/Qwerus-7B-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "lb-reranker-0.5b-v1.0"
+ urls:
+ - https://huggingface.co/lightblue/lb-reranker-0.5B-v1.0
+ - https://huggingface.co/bartowski/lb-reranker-0.5B-v1.0-GGUF
+ description: |
+ The LB Reranker has been trained to determine the relatedness of a given query to a piece of text, therefore allowing it to be used as a ranker or reranker in various retrieval-based tasks.
+
+ This model is fine-tuned from a Qwen/Qwen2.5-0.5B-Instruct checkpoint and was trained for roughly 5.5 hours on an 8 x L20 instance (ecs.gn8is-8x.32xlarge) on Alibaba Cloud.
+
+ The training data for this model can be found at lightblue/reranker_continuous_filt_max7_train and the code for generating this data as well as running the training of the model can be found on our Github repo.
+
+ Trained on data in over 95 languages, this model is applicable to a broad range of use cases.
+
+ This model has three main benefits over comparable rerankers.
+
+ It has shown slightly higher performance on evaluation benchmarks.
+ It has been trained on more languages than any previous model.
+ It is a simple Causal LM model trained to output a string between "1" and "7".
+
+ This last point means that this model can be used natively with many widely available inference packages, including vLLM and LMDeploy. This in turn allows our reranker to benefit from improvements to inference as and when these packages release them.
+
+ Update: We have also found that this model works pretty well as a code snippet reranker too (P@1 of 96%)! See our Colab for more details.
+ overrides:
+ parameters:
+ model: lb-reranker-0.5B-v1.0-Q4_K_M.gguf
+ files:
+ - filename: lb-reranker-0.5B-v1.0-Q4_K_M.gguf
+ sha256: 43568150de5136da15c996bbf4d1a78cc6580515c40f0ef9a8c90b0542228ab3
+ uri: huggingface://bartowski/lb-reranker-0.5B-v1.0-GGUF/lb-reranker-0.5B-v1.0-Q4_K_M.gguf
- &smollm
## SmolLM
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
From 26c3deb6739f9f933c9825228f4878c7cdfa1f64 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Thu, 16 Jan 2025 01:08:52 +0100
Subject: [PATCH 030/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`adc5dd92e8aea98f5e7ac84f6e1bc15de35130b5` (#4612)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 4c01621d..143b109b 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=b4d92a59a20eea400d8dd30844a339b76210daa0
+CPPLLAMA_VERSION?=adc5dd92e8aea98f5e7ac84f6e1bc15de35130b5
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From 8131ddd87835362834432c3cd1b9500b072d83ed Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 16 Jan 2025 09:58:14 +0100
Subject: [PATCH 031/679] chore(model gallery): add uwu-7b-instruct (#4613)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 40dc85a4..7c4e86b4 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3201,6 +3201,20 @@
- filename: lb-reranker-0.5B-v1.0-Q4_K_M.gguf
sha256: 43568150de5136da15c996bbf4d1a78cc6580515c40f0ef9a8c90b0542228ab3
uri: huggingface://bartowski/lb-reranker-0.5B-v1.0-GGUF/lb-reranker-0.5B-v1.0-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "uwu-7b-instruct"
+ urls:
+ - https://huggingface.co/qingy2024/UwU-7B-Instruct
+ - https://huggingface.co/bartowski/UwU-7B-Instruct-GGUF
+ description: |
+ Small QwQ, full-finetuned on FineQwQ-142K. Unlike my previous models, this one is a general-purpose reasoning machine!
+ overrides:
+ parameters:
+ model: UwU-7B-Instruct-Q4_K_M.gguf
+ files:
+ - filename: UwU-7B-Instruct-Q4_K_M.gguf
+ sha256: 279b2ba20d51bb155c8dd497cf49e0c28407b1822c75de88cfd83d13fd14a59f
+ uri: huggingface://bartowski/UwU-7B-Instruct-GGUF/UwU-7B-Instruct-Q4_K_M.gguf
- &smollm
## SmolLM
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
From 560ba6f25e19cabdb5defbeda2d57d14ed3700df Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 16 Jan 2025 10:04:44 +0100
Subject: [PATCH 032/679] chore(model gallery): add drt-o1-14b (#4614)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 7c4e86b4..647bc942 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3215,6 +3215,28 @@
- filename: UwU-7B-Instruct-Q4_K_M.gguf
sha256: 279b2ba20d51bb155c8dd497cf49e0c28407b1822c75de88cfd83d13fd14a59f
uri: huggingface://bartowski/UwU-7B-Instruct-GGUF/UwU-7B-Instruct-Q4_K_M.gguf
+
+- !!merge <<: *qwen25
+ name: "drt-o1-14b"
+ urls:
+ - https://huggingface.co/Krystalan/DRT-o1-14B
+ - https://huggingface.co/bartowski/DRT-o1-14B-GGUF
+ description: |
+ This repository contains the resources for our paper "DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought"
+ In this work, we introduce DRT-o1, an attempt to bring the success of long thought reasoning to neural machine translation (MT). To this end,
+
+ - We mine English sentences with similes or metaphors from existing literature books, which are suitable for translation via long thought.
+ - We propose a designed multi-agent framework with three agents (i.e., a translator, an advisor and an evaluator) to synthesize the MT samples with long thought. There are 22,264 synthesized samples in total.
+ - We train DRT-o1-8B, DRT-o1-7B and DRT-o1-14B using Llama-3.1-8B-Instruct, Qwen2.5-7B-Instruct and Qwen2.5-14B-Instruct as backbones.
+
+ Our goal is not to achieve competitive performance with OpenAI's O1 in neural machine translation (MT). Instead, we explore technical routes to bring the success of long thought to MT. To this end, we introduce DRT-o1, a byproduct of our exploration, and we hope it could facilitate the corresponding research in this direction.
+ overrides:
+ parameters:
+ model: DRT-o1-14B-Q4_K_M.gguf
+ files:
+ - filename: DRT-o1-14B-Q4_K_M.gguf
+ sha256: 9619ca984cf4ce8e4f69bcde831de17b2ce05dd89536e3130608877521e3d328
+ uri: huggingface://bartowski/DRT-o1-14B-GGUF/DRT-o1-14B-Q4_K_M.gguf
- &smollm
## SmolLM
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
From de4aa9fb1d48abc45577a96f7a4a4541c96226d4 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 16 Jan 2025 10:09:25 +0100
Subject: [PATCH 033/679] chore(model gallery): add
vikhr-qwen-2.5-1.5b-instruct (#4615)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 647bc942..22d748d8 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3215,7 +3215,6 @@
- filename: UwU-7B-Instruct-Q4_K_M.gguf
sha256: 279b2ba20d51bb155c8dd497cf49e0c28407b1822c75de88cfd83d13fd14a59f
uri: huggingface://bartowski/UwU-7B-Instruct-GGUF/UwU-7B-Instruct-Q4_K_M.gguf
-
- !!merge <<: *qwen25
name: "drt-o1-14b"
urls:
@@ -3282,6 +3281,20 @@
- filename: smollm2-1.7b-instruct-q4_k_m.gguf
sha256: decd2598bc2c8ed08c19adc3c8fdd461ee19ed5708679d1c54ef54a5a30d4f33
uri: huggingface://HuggingFaceTB/SmolLM2-1.7B-Instruct-GGUF/smollm2-1.7b-instruct-q4_k_m.gguf
+- !!merge <<: *qwen25
+ name: "vikhr-qwen-2.5-1.5b-instruct"
+ urls:
+ - https://huggingface.co/Vikhrmodels/Vikhr-Qwen-2.5-1.5B-Instruct
+ - https://huggingface.co/QuantFactory/Vikhr-Qwen-2.5-1.5B-Instruct-GGUF
+ description: |
+ Instruction-tuned model based on Qwen-2.5-1.5B-Instruct, trained on the Russian-language dataset GrandMaster-PRO-MAX. Designed for high-efficiency text processing in Russian and English, delivering precise responses and fast task execution.
+ overrides:
+ parameters:
+ model: Vikhr-Qwen-2.5-1.5B-Instruct.Q4_K_M.gguf
+ files:
+ - filename: Vikhr-Qwen-2.5-1.5B-Instruct.Q4_K_M.gguf
+ sha256: eaeac314e30b461413bc1cc819cdc0cd6a79265711fd0b8268702960a082c7bd
+ uri: huggingface://QuantFactory/Vikhr-Qwen-2.5-1.5B-Instruct-GGUF/Vikhr-Qwen-2.5-1.5B-Instruct.Q4_K_M.gguf
- &llama31
## LLama3.1
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
From acb2eb23c8376f853fc109f59e93b318f5fb08c1 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 16 Jan 2025 22:23:09 +0100
Subject: [PATCH 034/679] feat(tts): Add Kokoro backend (#4616)
* feat(kokoro): Add new TTS backend
Signed-off-by: Ettore Di Giacinto
* Add kokoro to images
Signed-off-by: Ettore Di Giacinto
* Support combined voices
Signed-off-by: Ettore Di Giacinto
* Ignore pt and onnx
Signed-off-by: Ettore Di Giacinto
* Add plbert and istfnet
Signed-off-by: Ettore Di Giacinto
---------
Signed-off-by: Ettore Di Giacinto
---
Dockerfile | 9 +-
Makefile | 13 +-
backend/python/kokoro/Makefile | 20 +
backend/python/kokoro/backend.py | 131 +++++
backend/python/kokoro/install.sh | 14 +
backend/python/kokoro/istftnet.py | 524 ++++++++++++++++++
backend/python/kokoro/kokoro.py | 166 ++++++
backend/python/kokoro/models.py | 373 +++++++++++++
backend/python/kokoro/plbert.py | 16 +
backend/python/kokoro/protogen.sh | 6 +
backend/python/kokoro/requirements-cpu.txt | 2 +
.../python/kokoro/requirements-cublas11.txt | 3 +
.../python/kokoro/requirements-cublas12.txt | 2 +
.../python/kokoro/requirements-hipblas.txt | 3 +
backend/python/kokoro/requirements-intel.txt | 5 +
backend/python/kokoro/requirements.txt | 7 +
backend/python/kokoro/run.sh | 4 +
backend/python/kokoro/test.sh | 6 +
pkg/model/loader.go | 2 +
19 files changed, 1303 insertions(+), 3 deletions(-)
create mode 100644 backend/python/kokoro/Makefile
create mode 100755 backend/python/kokoro/backend.py
create mode 100755 backend/python/kokoro/install.sh
create mode 100644 backend/python/kokoro/istftnet.py
create mode 100644 backend/python/kokoro/kokoro.py
create mode 100644 backend/python/kokoro/models.py
create mode 100644 backend/python/kokoro/plbert.py
create mode 100644 backend/python/kokoro/protogen.sh
create mode 100644 backend/python/kokoro/requirements-cpu.txt
create mode 100644 backend/python/kokoro/requirements-cublas11.txt
create mode 100644 backend/python/kokoro/requirements-cublas12.txt
create mode 100644 backend/python/kokoro/requirements-hipblas.txt
create mode 100644 backend/python/kokoro/requirements-intel.txt
create mode 100644 backend/python/kokoro/requirements.txt
create mode 100755 backend/python/kokoro/run.sh
create mode 100755 backend/python/kokoro/test.sh
diff --git a/Dockerfile b/Dockerfile
index 42c1c1fc..481edf90 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -15,7 +15,7 @@ ARG TARGETARCH
ARG TARGETVARIANT
ENV DEBIAN_FRONTEND=noninteractive
-ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh,transformers:/build/backend/python/transformers/run.sh,sentencetransformers:/build/backend/python/sentencetransformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,openvoice:/build/backend/python/openvoice/run.sh,vall-e-x:/build/backend/python/vall-e-x/run.sh,vllm:/build/backend/python/vllm/run.sh,mamba:/build/backend/python/mamba/run.sh,exllama2:/build/backend/python/exllama2/run.sh,transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"
+ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh,transformers:/build/backend/python/transformers/run.sh,sentencetransformers:/build/backend/python/sentencetransformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,openvoice:/build/backend/python/openvoice/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vall-e-x:/build/backend/python/vall-e-x/run.sh,vllm:/build/backend/python/vllm/run.sh,mamba:/build/backend/python/mamba/run.sh,exllama2:/build/backend/python/exllama2/run.sh,transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"
RUN apt-get update && \
@@ -436,6 +436,10 @@ SHELL ["/bin/bash", "-c"]
# Splitting the backends into more groups with fewer items results in a larger image, but a smaller size for the largest layer
# Splitting the backends into fewer groups with more items results in a smaller image, but a larger size for the largest layer
+RUN if [[ "${IMAGE_TYPE}" == "extras" ]]; then \
+ apt-get -qq -y install espeak-ng \
+ ; fi
+
RUN if [[ ( "${EXTRA_BACKENDS}" =~ "coqui" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/coqui \
; fi && \
@@ -452,6 +456,9 @@ RUN if [[ ( "${EXTRA_BACKENDS}" =~ "coqui" || -z "${EXTRA_BACKENDS}" ) && "$IMAG
RUN if [[ ( "${EXTRA_BACKENDS}" =~ "vall-e-x" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/vall-e-x \
; fi && \
+ if [[ ( "${EXTRA_BACKENDS}" =~ "kokoro" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
+ make -C backend/python/kokoro \
+ ; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "openvoice" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/openvoice \
; fi && \
diff --git a/Makefile b/Makefile
index 143b109b..49c81950 100644
--- a/Makefile
+++ b/Makefile
@@ -583,10 +583,10 @@ protogen-go-clean:
$(RM) bin/*
.PHONY: protogen-python
-protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen mamba-protogen rerankers-protogen sentencetransformers-protogen transformers-protogen parler-tts-protogen transformers-musicgen-protogen vall-e-x-protogen vllm-protogen openvoice-protogen
+protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen mamba-protogen rerankers-protogen sentencetransformers-protogen transformers-protogen parler-tts-protogen transformers-musicgen-protogen vall-e-x-protogen kokoro-protogen vllm-protogen openvoice-protogen
.PHONY: protogen-python-clean
-protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean mamba-protogen-clean sentencetransformers-protogen-clean rerankers-protogen-clean transformers-protogen-clean transformers-musicgen-protogen-clean parler-tts-protogen-clean vall-e-x-protogen-clean vllm-protogen-clean openvoice-protogen-clean
+protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean mamba-protogen-clean sentencetransformers-protogen-clean rerankers-protogen-clean transformers-protogen-clean transformers-musicgen-protogen-clean parler-tts-protogen-clean vall-e-x-protogen-clean kokoro-protogen-clean vllm-protogen-clean openvoice-protogen-clean
.PHONY: autogptq-protogen
autogptq-protogen:
@@ -684,6 +684,14 @@ vall-e-x-protogen:
vall-e-x-protogen-clean:
$(MAKE) -C backend/python/vall-e-x protogen-clean
+.PHONY: kokoro-protogen
+kokoro-protogen:
+ $(MAKE) -C backend/python/kokoro protogen
+
+.PHONY: kokoro-protogen-clean
+kokoro-protogen-clean:
+ $(MAKE) -C backend/python/kokoro protogen-clean
+
.PHONY: openvoice-protogen
openvoice-protogen:
$(MAKE) -C backend/python/openvoice protogen
@@ -715,6 +723,7 @@ prepare-extra-conda-environments: protogen-python
$(MAKE) -C backend/python/transformers-musicgen
$(MAKE) -C backend/python/parler-tts
$(MAKE) -C backend/python/vall-e-x
+ $(MAKE) -C backend/python/kokoro
$(MAKE) -C backend/python/openvoice
$(MAKE) -C backend/python/exllama2
diff --git a/backend/python/kokoro/Makefile b/backend/python/kokoro/Makefile
new file mode 100644
index 00000000..c0e5169f
--- /dev/null
+++ b/backend/python/kokoro/Makefile
@@ -0,0 +1,20 @@
+.DEFAULT_GOAL := install
+
+.PHONY: install
+install:
+ bash install.sh
+ $(MAKE) protogen
+
+.PHONY: protogen
+protogen: backend_pb2_grpc.py backend_pb2.py
+
+.PHONY: protogen-clean
+protogen-clean:
+ $(RM) backend_pb2_grpc.py backend_pb2.py
+
+backend_pb2_grpc.py backend_pb2.py:
+ bash protogen.sh
+
+.PHONY: clean
+clean: protogen-clean
+ rm -rf venv __pycache__
\ No newline at end of file
diff --git a/backend/python/kokoro/backend.py b/backend/python/kokoro/backend.py
new file mode 100755
index 00000000..1fd1feb9
--- /dev/null
+++ b/backend/python/kokoro/backend.py
@@ -0,0 +1,131 @@
+#!/usr/bin/env python3
+"""
+Extra gRPC server for Kokoro models.
+"""
+from concurrent import futures
+
+import argparse
+import signal
+import sys
+import os
+import time
+import backend_pb2
+import backend_pb2_grpc
+import soundfile as sf
+import grpc
+
+from models import build_model
+from kokoro import generate
+import torch
+
+SAMPLE_RATE = 22050
+_ONE_DAY_IN_SECONDS = 60 * 60 * 24
+
+# If MAX_WORKERS is specified in the environment, use it; otherwise default to 1
+MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
+
+# Implement the BackendServicer class with the service methods
+class BackendServicer(backend_pb2_grpc.BackendServicer):
+ """
+ A gRPC servicer for the backend service.
+
+ This class implements the gRPC methods for the backend service, including Health, LoadModel, and Embedding.
+ """
+ def Health(self, request, context):
+ """
+ A gRPC method that returns the health status of the backend service.
+
+ Args:
+ request: A HealthRequest object that contains the request parameters.
+ context: A grpc.ServicerContext object that provides information about the RPC.
+
+ Returns:
+ A Reply object that contains the health status of the backend service.
+ """
+ return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
+
+ def LoadModel(self, request, context):
+ """
+ A gRPC method that loads a model into memory.
+
+ Args:
+ request: A LoadModelRequest object that contains the request parameters.
+ context: A grpc.ServicerContext object that provides information about the RPC.
+
+ Returns:
+ A Result object that contains the result of the LoadModel operation.
+ """
+ model_name = request.Model
+ try:
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+ self.MODEL = build_model(request.ModelFile, device)
+ options = request.Options
+ # Find the voice in the options; options are a list of strings of the form optname:optvalue
+ VOICE_NAME = None
+ for opt in options:
+ if opt.startswith("voice:"):
+ VOICE_NAME = opt.split(":")[1]
+ break
+ if VOICE_NAME is None:
+ return backend_pb2.Result(success=False, message=f"No voice specified in options")
+ MODELPATH = request.ModelPath
+ # If voice name contains a plus, split it and load the two models and combine them
+ if "+" in VOICE_NAME:
+ voice1, voice2 = VOICE_NAME.split("+")
+ voice1 = torch.load(f'{MODELPATH}/{voice1}.pt', weights_only=True).to(device)
+ voice2 = torch.load(f'{MODELPATH}/{voice2}.pt', weights_only=True).to(device)
+ self.VOICEPACK = torch.mean(torch.stack([voice1, voice2]), dim=0)
+ else:
+ self.VOICEPACK = torch.load(f'{MODELPATH}/{VOICE_NAME}.pt', weights_only=True).to(device)
+
+ self.VOICE_NAME = VOICE_NAME
+
+ print(f'Loaded voice: {VOICE_NAME}')
+ except Exception as err:
+ return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
+
+ return backend_pb2.Result(message="Model loaded successfully", success=True)
+
+ def TTS(self, request, context):
+ model_name = request.model
+ if model_name == "":
+ return backend_pb2.Result(success=False, message="request.model is required")
+ try:
+ audio, out_ps = generate(self.MODEL, request.text, self.VOICEPACK, lang=self.VOICE_NAME)
+ print(out_ps)
+ sf.write(request.dst, audio, SAMPLE_RATE)
+ except Exception as err:
+ return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
+ return backend_pb2.Result(success=True)
+
+def serve(address):
+ server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
+ backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
+ server.add_insecure_port(address)
+ server.start()
+ print("[Kokoro] Server started. Listening on: " + address, file=sys.stderr)
+
+ # Define the signal handler function
+ def signal_handler(sig, frame):
+ print("[Kokoro] Received termination signal. Shutting down...")
+ server.stop(0)
+ sys.exit(0)
+
+ # Set the signal handlers for SIGINT and SIGTERM
+ signal.signal(signal.SIGINT, signal_handler)
+ signal.signal(signal.SIGTERM, signal_handler)
+
+ try:
+ while True:
+ time.sleep(_ONE_DAY_IN_SECONDS)
+ except KeyboardInterrupt:
+ server.stop(0)
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(description="Run the gRPC server.")
+ parser.add_argument(
+ "--addr", default="localhost:50051", help="The address to bind the server to."
+ )
+ args = parser.parse_args()
+ print(f"[Kokoro] startup: {args}", file=sys.stderr)
+ serve(args.addr)
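The LoadModel handler above selects the speaker through an option of the form voice:<name> and, when the name contains a plus sign, builds a combined voice by averaging the two voicepack tensors. A standalone sketch of that blending step (file names are placeholders; real voicepacks ship with Kokoro-82M):

# Sketch of the "combined voices" logic used in LoadModel above: a blended
# voicepack is the element-wise mean of two voicepack tensors.
import torch

def combine_voices(path_a, path_b, device="cpu"):
    voice_a = torch.load(path_a, weights_only=True).to(device)
    voice_b = torch.load(path_b, weights_only=True).to(device)
    return torch.mean(torch.stack([voice_a, voice_b]), dim=0)

# The backend does the equivalent of this for an option like "voice:af+af_nicole":
# blended = combine_voices("voices/af.pt", "voices/af_nicole.pt")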
diff --git a/backend/python/kokoro/install.sh b/backend/python/kokoro/install.sh
new file mode 100755
index 00000000..36443ef1
--- /dev/null
+++ b/backend/python/kokoro/install.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+set -e
+
+source $(dirname $0)/../common/libbackend.sh
+
+# This is here because the Intel pip index is broken and returns 200 status codes for every package name, it just doesn't return any package links.
+# This makes uv think that the package exists in the Intel pip index, and by default it stops looking at other pip indexes once it finds a match.
+# We need uv to continue falling through to the pypi default index to find optimum[openvino] in the pypi index
+# the --upgrade actually allows us to *downgrade* torch to the version provided in the Intel pip index
+if [ "x${BUILD_PROFILE}" == "xintel" ]; then
+ EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
+fi
+
+installRequirements
diff --git a/backend/python/kokoro/istftnet.py b/backend/python/kokoro/istftnet.py
new file mode 100644
index 00000000..818fb912
--- /dev/null
+++ b/backend/python/kokoro/istftnet.py
@@ -0,0 +1,524 @@
+# https://huggingface.co/hexgrad/Kokoro-82M/blob/main/istftnet.py
+# https://github.com/yl4579/StyleTTS2/blob/main/Modules/istftnet.py
+from scipy.signal import get_window
+from torch.nn import Conv1d, ConvTranspose1d
+from torch.nn.utils import weight_norm, remove_weight_norm
+import numpy as np
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+# https://github.com/yl4579/StyleTTS2/blob/main/Modules/utils.py
+def init_weights(m, mean=0.0, std=0.01):
+ classname = m.__class__.__name__
+ if classname.find("Conv") != -1:
+ m.weight.data.normal_(mean, std)
+
+def get_padding(kernel_size, dilation=1):
+ return int((kernel_size*dilation - dilation)/2)
+
+LRELU_SLOPE = 0.1
+
+class AdaIN1d(nn.Module):
+ def __init__(self, style_dim, num_features):
+ super().__init__()
+ self.norm = nn.InstanceNorm1d(num_features, affine=False)
+ self.fc = nn.Linear(style_dim, num_features*2)
+
+ def forward(self, x, s):
+ h = self.fc(s)
+ h = h.view(h.size(0), h.size(1), 1)
+ gamma, beta = torch.chunk(h, chunks=2, dim=1)
+ return (1 + gamma) * self.norm(x) + beta
+
+class AdaINResBlock1(torch.nn.Module):
+ def __init__(self, channels, kernel_size=3, dilation=(1, 3, 5), style_dim=64):
+ super(AdaINResBlock1, self).__init__()
+ self.convs1 = nn.ModuleList([
+ weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=dilation[0],
+ padding=get_padding(kernel_size, dilation[0]))),
+ weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=dilation[1],
+ padding=get_padding(kernel_size, dilation[1]))),
+ weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=dilation[2],
+ padding=get_padding(kernel_size, dilation[2])))
+ ])
+ self.convs1.apply(init_weights)
+
+ self.convs2 = nn.ModuleList([
+ weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=1,
+ padding=get_padding(kernel_size, 1))),
+ weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=1,
+ padding=get_padding(kernel_size, 1))),
+ weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=1,
+ padding=get_padding(kernel_size, 1)))
+ ])
+ self.convs2.apply(init_weights)
+
+ self.adain1 = nn.ModuleList([
+ AdaIN1d(style_dim, channels),
+ AdaIN1d(style_dim, channels),
+ AdaIN1d(style_dim, channels),
+ ])
+
+ self.adain2 = nn.ModuleList([
+ AdaIN1d(style_dim, channels),
+ AdaIN1d(style_dim, channels),
+ AdaIN1d(style_dim, channels),
+ ])
+
+ self.alpha1 = nn.ParameterList([nn.Parameter(torch.ones(1, channels, 1)) for i in range(len(self.convs1))])
+ self.alpha2 = nn.ParameterList([nn.Parameter(torch.ones(1, channels, 1)) for i in range(len(self.convs2))])
+
+
+ def forward(self, x, s):
+ for c1, c2, n1, n2, a1, a2 in zip(self.convs1, self.convs2, self.adain1, self.adain2, self.alpha1, self.alpha2):
+ xt = n1(x, s)
+ xt = xt + (1 / a1) * (torch.sin(a1 * xt) ** 2) # Snake1D
+ xt = c1(xt)
+ xt = n2(xt, s)
+ xt = xt + (1 / a2) * (torch.sin(a2 * xt) ** 2) # Snake1D
+ xt = c2(xt)
+ x = xt + x
+ return x
+
+ def remove_weight_norm(self):
+ for l in self.convs1:
+ remove_weight_norm(l)
+ for l in self.convs2:
+ remove_weight_norm(l)
+
+class TorchSTFT(torch.nn.Module):
+ def __init__(self, filter_length=800, hop_length=200, win_length=800, window='hann'):
+ super().__init__()
+ self.filter_length = filter_length
+ self.hop_length = hop_length
+ self.win_length = win_length
+ self.window = torch.from_numpy(get_window(window, win_length, fftbins=True).astype(np.float32))
+
+ def transform(self, input_data):
+ forward_transform = torch.stft(
+ input_data,
+ self.filter_length, self.hop_length, self.win_length, window=self.window.to(input_data.device),
+ return_complex=True)
+
+ return torch.abs(forward_transform), torch.angle(forward_transform)
+
+ def inverse(self, magnitude, phase):
+ inverse_transform = torch.istft(
+ magnitude * torch.exp(phase * 1j),
+ self.filter_length, self.hop_length, self.win_length, window=self.window.to(magnitude.device))
+
+ return inverse_transform.unsqueeze(-2) # unsqueeze to stay consistent with conv_transpose1d implementation
+
+ def forward(self, input_data):
+ self.magnitude, self.phase = self.transform(input_data)
+ reconstruction = self.inverse(self.magnitude, self.phase)
+ return reconstruction
+
+class SineGen(torch.nn.Module):
+ """ Definition of sine generator
+ SineGen(samp_rate, harmonic_num = 0,
+ sine_amp = 0.1, noise_std = 0.003,
+ voiced_threshold = 0,
+ flag_for_pulse=False)
+ samp_rate: sampling rate in Hz
+ harmonic_num: number of harmonic overtones (default 0)
+ sine_amp: amplitude of sine waveform (default 0.1)
+ noise_std: std of Gaussian noise (default 0.003)
+ voiced_threshold: F0 threshold for U/V classification (default 0)
+ flag_for_pulse: this SineGen is used inside PulseGen (default False)
+ Note: when flag_for_pulse is True, the first time step of a voiced
+ segment is always sin(np.pi) or cos(0)
+ """
+
+ def __init__(self, samp_rate, upsample_scale, harmonic_num=0,
+ sine_amp=0.1, noise_std=0.003,
+ voiced_threshold=0,
+ flag_for_pulse=False):
+ super(SineGen, self).__init__()
+ self.sine_amp = sine_amp
+ self.noise_std = noise_std
+ self.harmonic_num = harmonic_num
+ self.dim = self.harmonic_num + 1
+ self.sampling_rate = samp_rate
+ self.voiced_threshold = voiced_threshold
+ self.flag_for_pulse = flag_for_pulse
+ self.upsample_scale = upsample_scale
+
+ def _f02uv(self, f0):
+ # generate uv signal
+ uv = (f0 > self.voiced_threshold).type(torch.float32)
+ return uv
+
+ def _f02sine(self, f0_values):
+ """ f0_values: (batchsize, length, dim)
+ where dim indicates fundamental tone and overtones
+ """
+ # convert to F0 in rad. The integer part n can be ignored
+ # because 2 * np.pi * n doesn't affect phase
+ rad_values = (f0_values / self.sampling_rate) % 1
+
+ # initial phase noise (no noise for fundamental component)
+ rand_ini = torch.rand(f0_values.shape[0], f0_values.shape[2], \
+ device=f0_values.device)
+ rand_ini[:, 0] = 0
+ rad_values[:, 0, :] = rad_values[:, 0, :] + rand_ini
+
+ # instantaneous phase sine[t] = sin(2*pi \sum_i=1 ^{t} rad)
+ if not self.flag_for_pulse:
+# # for normal case
+
+# # To prevent torch.cumsum numerical overflow,
+# # it is necessary to add -1 whenever \sum_k=1^n rad_value_k > 1.
+# # Buffer tmp_over_one_idx indicates the time step to add -1.
+# # This will not change F0 of sine because (x-1) * 2*pi = x * 2*pi
+# tmp_over_one = torch.cumsum(rad_values, 1) % 1
+# tmp_over_one_idx = (padDiff(tmp_over_one)) < 0
+# cumsum_shift = torch.zeros_like(rad_values)
+# cumsum_shift[:, 1:, :] = tmp_over_one_idx * -1.0
+
+# phase = torch.cumsum(rad_values, dim=1) * 2 * np.pi
+ rad_values = torch.nn.functional.interpolate(rad_values.transpose(1, 2),
+ scale_factor=1/self.upsample_scale,
+ mode="linear").transpose(1, 2)
+
+# tmp_over_one = torch.cumsum(rad_values, 1) % 1
+# tmp_over_one_idx = (padDiff(tmp_over_one)) < 0
+# cumsum_shift = torch.zeros_like(rad_values)
+# cumsum_shift[:, 1:, :] = tmp_over_one_idx * -1.0
+
+ phase = torch.cumsum(rad_values, dim=1) * 2 * np.pi
+ phase = torch.nn.functional.interpolate(phase.transpose(1, 2) * self.upsample_scale,
+ scale_factor=self.upsample_scale, mode="linear").transpose(1, 2)
+ sines = torch.sin(phase)
+
+ else:
+ # If necessary, make sure that the first time step of every
+ # voiced segments is sin(pi) or cos(0)
+ # This is used for pulse-train generation
+
+ # identify the last time step in unvoiced segments
+ uv = self._f02uv(f0_values)
+ uv_1 = torch.roll(uv, shifts=-1, dims=1)
+ uv_1[:, -1, :] = 1
+ u_loc = (uv < 1) * (uv_1 > 0)
+
+ # get the instantaneous phase
+ tmp_cumsum = torch.cumsum(rad_values, dim=1)
+ # different batch needs to be processed differently
+ for idx in range(f0_values.shape[0]):
+ temp_sum = tmp_cumsum[idx, u_loc[idx, :, 0], :]
+ temp_sum[1:, :] = temp_sum[1:, :] - temp_sum[0:-1, :]
+ # stores the accumulation of i.phase within
+ # each voiced segments
+ tmp_cumsum[idx, :, :] = 0
+ tmp_cumsum[idx, u_loc[idx, :, 0], :] = temp_sum
+
+ # rad_values - tmp_cumsum: remove the accumulation of i.phase
+ # within the previous voiced segment.
+ i_phase = torch.cumsum(rad_values - tmp_cumsum, dim=1)
+
+ # get the sines
+ sines = torch.cos(i_phase * 2 * np.pi)
+ return sines
+
+ def forward(self, f0):
+ """ sine_tensor, uv = forward(f0)
+ input F0: tensor(batchsize=1, length, dim=1)
+ f0 for unvoiced steps should be 0
+ output sine_tensor: tensor(batchsize=1, length, dim)
+ output uv: tensor(batchsize=1, length, 1)
+ """
+ f0_buf = torch.zeros(f0.shape[0], f0.shape[1], self.dim,
+ device=f0.device)
+ # fundamental component
+ fn = torch.multiply(f0, torch.FloatTensor([[range(1, self.harmonic_num + 2)]]).to(f0.device))
+
+ # generate sine waveforms
+ sine_waves = self._f02sine(fn) * self.sine_amp
+
+ # generate uv signal
+ # uv = torch.ones(f0.shape)
+ # uv = uv * (f0 > self.voiced_threshold)
+ uv = self._f02uv(f0)
+
+ # noise: for unvoiced should be similar to sine_amp
+ # std = self.sine_amp/3 -> max value ~ self.sine_amp
+ # . for voiced regions is self.noise_std
+ noise_amp = uv * self.noise_std + (1 - uv) * self.sine_amp / 3
+ noise = noise_amp * torch.randn_like(sine_waves)
+
+ # first: set the unvoiced part to 0 by uv
+ # then: additive noise
+ sine_waves = sine_waves * uv + noise
+ return sine_waves, uv, noise
+
+
+class SourceModuleHnNSF(torch.nn.Module):
+ """ SourceModule for hn-nsf
+ SourceModule(sampling_rate, harmonic_num=0, sine_amp=0.1,
+ add_noise_std=0.003, voiced_threshod=0)
+ sampling_rate: sampling_rate in Hz
+ harmonic_num: number of harmonic above F0 (default: 0)
+ sine_amp: amplitude of sine source signal (default: 0.1)
+ add_noise_std: std of additive Gaussian noise (default: 0.003)
+ note that amplitude of noise in unvoiced is decided
+ by sine_amp
+ voiced_threshold: threshold to set U/V given F0 (default: 0)
+ Sine_source, noise_source = SourceModuleHnNSF(F0_sampled)
+ F0_sampled (batchsize, length, 1)
+ Sine_source (batchsize, length, 1)
+ noise_source (batchsize, length 1)
+ uv (batchsize, length, 1)
+ """
+
+ def __init__(self, sampling_rate, upsample_scale, harmonic_num=0, sine_amp=0.1,
+ add_noise_std=0.003, voiced_threshod=0):
+ super(SourceModuleHnNSF, self).__init__()
+
+ self.sine_amp = sine_amp
+ self.noise_std = add_noise_std
+
+ # to produce sine waveforms
+ self.l_sin_gen = SineGen(sampling_rate, upsample_scale, harmonic_num,
+ sine_amp, add_noise_std, voiced_threshod)
+
+ # to merge source harmonics into a single excitation
+ self.l_linear = torch.nn.Linear(harmonic_num + 1, 1)
+ self.l_tanh = torch.nn.Tanh()
+
+ def forward(self, x):
+ """
+ Sine_source, noise_source = SourceModuleHnNSF(F0_sampled)
+ F0_sampled (batchsize, length, 1)
+ Sine_source (batchsize, length, 1)
+ noise_source (batchsize, length 1)
+ """
+ # source for harmonic branch
+ with torch.no_grad():
+ sine_wavs, uv, _ = self.l_sin_gen(x)
+ sine_merge = self.l_tanh(self.l_linear(sine_wavs))
+
+ # source for noise branch, in the same shape as uv
+ noise = torch.randn_like(uv) * self.sine_amp / 3
+ return sine_merge, noise, uv
+def padDiff(x):
+ return F.pad(F.pad(x, (0,0,-1,1), 'constant', 0) - x, (0,0,0,-1), 'constant', 0)
+
+
+class Generator(torch.nn.Module):
+ def __init__(self, style_dim, resblock_kernel_sizes, upsample_rates, upsample_initial_channel, resblock_dilation_sizes, upsample_kernel_sizes, gen_istft_n_fft, gen_istft_hop_size):
+ super(Generator, self).__init__()
+
+ self.num_kernels = len(resblock_kernel_sizes)
+ self.num_upsamples = len(upsample_rates)
+ resblock = AdaINResBlock1
+
+ self.m_source = SourceModuleHnNSF(
+ sampling_rate=24000,
+ upsample_scale=np.prod(upsample_rates) * gen_istft_hop_size,
+ harmonic_num=8, voiced_threshod=10)
+ self.f0_upsamp = torch.nn.Upsample(scale_factor=np.prod(upsample_rates) * gen_istft_hop_size)
+ self.noise_convs = nn.ModuleList()
+ self.noise_res = nn.ModuleList()
+
+ self.ups = nn.ModuleList()
+ for i, (u, k) in enumerate(zip(upsample_rates, upsample_kernel_sizes)):
+ self.ups.append(weight_norm(
+ ConvTranspose1d(upsample_initial_channel//(2**i), upsample_initial_channel//(2**(i+1)),
+ k, u, padding=(k-u)//2)))
+
+ self.resblocks = nn.ModuleList()
+ for i in range(len(self.ups)):
+ ch = upsample_initial_channel//(2**(i+1))
+ for j, (k, d) in enumerate(zip(resblock_kernel_sizes,resblock_dilation_sizes)):
+ self.resblocks.append(resblock(ch, k, d, style_dim))
+
+ c_cur = upsample_initial_channel // (2 ** (i + 1))
+
+ if i + 1 < len(upsample_rates): #
+ stride_f0 = np.prod(upsample_rates[i + 1:])
+ self.noise_convs.append(Conv1d(
+ gen_istft_n_fft + 2, c_cur, kernel_size=stride_f0 * 2, stride=stride_f0, padding=(stride_f0+1) // 2))
+ self.noise_res.append(resblock(c_cur, 7, [1,3,5], style_dim))
+ else:
+ self.noise_convs.append(Conv1d(gen_istft_n_fft + 2, c_cur, kernel_size=1))
+ self.noise_res.append(resblock(c_cur, 11, [1,3,5], style_dim))
+
+
+ self.post_n_fft = gen_istft_n_fft
+ self.conv_post = weight_norm(Conv1d(ch, self.post_n_fft + 2, 7, 1, padding=3))
+ self.ups.apply(init_weights)
+ self.conv_post.apply(init_weights)
+ self.reflection_pad = torch.nn.ReflectionPad1d((1, 0))
+ self.stft = TorchSTFT(filter_length=gen_istft_n_fft, hop_length=gen_istft_hop_size, win_length=gen_istft_n_fft)
+
+
+ def forward(self, x, s, f0):
+ with torch.no_grad():
+ f0 = self.f0_upsamp(f0[:, None]).transpose(1, 2) # bs,n,t
+
+ har_source, noi_source, uv = self.m_source(f0)
+ har_source = har_source.transpose(1, 2).squeeze(1)
+ har_spec, har_phase = self.stft.transform(har_source)
+ har = torch.cat([har_spec, har_phase], dim=1)
+
+ for i in range(self.num_upsamples):
+ x = F.leaky_relu(x, LRELU_SLOPE)
+ x_source = self.noise_convs[i](har)
+ x_source = self.noise_res[i](x_source, s)
+
+ x = self.ups[i](x)
+ if i == self.num_upsamples - 1:
+ x = self.reflection_pad(x)
+
+ x = x + x_source
+ xs = None
+ for j in range(self.num_kernels):
+ if xs is None:
+ xs = self.resblocks[i*self.num_kernels+j](x, s)
+ else:
+ xs += self.resblocks[i*self.num_kernels+j](x, s)
+ x = xs / self.num_kernels
+ x = F.leaky_relu(x)
+ x = self.conv_post(x)
+ spec = torch.exp(x[:,:self.post_n_fft // 2 + 1, :])
+ phase = torch.sin(x[:, self.post_n_fft // 2 + 1:, :])
+ return self.stft.inverse(spec, phase)
+
+ def fw_phase(self, x, s):
+ for i in range(self.num_upsamples):
+ x = F.leaky_relu(x, LRELU_SLOPE)
+ x = self.ups[i](x)
+ xs = None
+ for j in range(self.num_kernels):
+ if xs is None:
+ xs = self.resblocks[i*self.num_kernels+j](x, s)
+ else:
+ xs += self.resblocks[i*self.num_kernels+j](x, s)
+ x = xs / self.num_kernels
+ x = F.leaky_relu(x)
+ x = self.reflection_pad(x)
+ x = self.conv_post(x)
+ spec = torch.exp(x[:,:self.post_n_fft // 2 + 1, :])
+ phase = torch.sin(x[:, self.post_n_fft // 2 + 1:, :])
+ return spec, phase
+
+ def remove_weight_norm(self):
+ print('Removing weight norm...')
+ for l in self.ups:
+ remove_weight_norm(l)
+ for l in self.resblocks:
+ l.remove_weight_norm()
+ remove_weight_norm(self.conv_pre)
+ remove_weight_norm(self.conv_post)
+
+
+class AdainResBlk1d(nn.Module):
+ def __init__(self, dim_in, dim_out, style_dim=64, actv=nn.LeakyReLU(0.2),
+ upsample='none', dropout_p=0.0):
+ super().__init__()
+ self.actv = actv
+ self.upsample_type = upsample
+ self.upsample = UpSample1d(upsample)
+ self.learned_sc = dim_in != dim_out
+ self._build_weights(dim_in, dim_out, style_dim)
+ self.dropout = nn.Dropout(dropout_p)
+
+ if upsample == 'none':
+ self.pool = nn.Identity()
+ else:
+ self.pool = weight_norm(nn.ConvTranspose1d(dim_in, dim_in, kernel_size=3, stride=2, groups=dim_in, padding=1, output_padding=1))
+
+
+ def _build_weights(self, dim_in, dim_out, style_dim):
+ self.conv1 = weight_norm(nn.Conv1d(dim_in, dim_out, 3, 1, 1))
+ self.conv2 = weight_norm(nn.Conv1d(dim_out, dim_out, 3, 1, 1))
+ self.norm1 = AdaIN1d(style_dim, dim_in)
+ self.norm2 = AdaIN1d(style_dim, dim_out)
+ if self.learned_sc:
+ self.conv1x1 = weight_norm(nn.Conv1d(dim_in, dim_out, 1, 1, 0, bias=False))
+
+ def _shortcut(self, x):
+ x = self.upsample(x)
+ if self.learned_sc:
+ x = self.conv1x1(x)
+ return x
+
+ def _residual(self, x, s):
+ x = self.norm1(x, s)
+ x = self.actv(x)
+ x = self.pool(x)
+ x = self.conv1(self.dropout(x))
+ x = self.norm2(x, s)
+ x = self.actv(x)
+ x = self.conv2(self.dropout(x))
+ return x
+
+ def forward(self, x, s):
+ out = self._residual(x, s)
+ out = (out + self._shortcut(x)) / np.sqrt(2)
+ return out
+
+class UpSample1d(nn.Module):
+ def __init__(self, layer_type):
+ super().__init__()
+ self.layer_type = layer_type
+
+ def forward(self, x):
+ if self.layer_type == 'none':
+ return x
+ else:
+ return F.interpolate(x, scale_factor=2, mode='nearest')
+
+class Decoder(nn.Module):
+ def __init__(self, dim_in=512, F0_channel=512, style_dim=64, dim_out=80,
+ resblock_kernel_sizes = [3,7,11],
+ upsample_rates = [10, 6],
+ upsample_initial_channel=512,
+ resblock_dilation_sizes=[[1,3,5], [1,3,5], [1,3,5]],
+ upsample_kernel_sizes=[20, 12],
+ gen_istft_n_fft=20, gen_istft_hop_size=5):
+ super().__init__()
+
+ self.decode = nn.ModuleList()
+
+ self.encode = AdainResBlk1d(dim_in + 2, 1024, style_dim)
+
+ self.decode.append(AdainResBlk1d(1024 + 2 + 64, 1024, style_dim))
+ self.decode.append(AdainResBlk1d(1024 + 2 + 64, 1024, style_dim))
+ self.decode.append(AdainResBlk1d(1024 + 2 + 64, 1024, style_dim))
+ self.decode.append(AdainResBlk1d(1024 + 2 + 64, 512, style_dim, upsample=True))
+
+ self.F0_conv = weight_norm(nn.Conv1d(1, 1, kernel_size=3, stride=2, groups=1, padding=1))
+
+ self.N_conv = weight_norm(nn.Conv1d(1, 1, kernel_size=3, stride=2, groups=1, padding=1))
+
+ self.asr_res = nn.Sequential(
+ weight_norm(nn.Conv1d(512, 64, kernel_size=1)),
+ )
+
+
+ self.generator = Generator(style_dim, resblock_kernel_sizes, upsample_rates,
+ upsample_initial_channel, resblock_dilation_sizes,
+ upsample_kernel_sizes, gen_istft_n_fft, gen_istft_hop_size)
+
+ def forward(self, asr, F0_curve, N, s):
+ F0 = self.F0_conv(F0_curve.unsqueeze(1))
+ N = self.N_conv(N.unsqueeze(1))
+
+ x = torch.cat([asr, F0, N], axis=1)
+ x = self.encode(x, s)
+
+ asr_res = self.asr_res(asr)
+
+ res = True
+ for block in self.decode:
+ if res:
+ x = torch.cat([x, asr_res, F0, N], axis=1)
+ x = block(x, s)
+ if block.upsample_type != "none":
+ res = False
+
+ x = self.generator(x, s, F0_curve)
+ return x
diff --git a/backend/python/kokoro/kokoro.py b/backend/python/kokoro/kokoro.py
new file mode 100644
index 00000000..3a0df7f5
--- /dev/null
+++ b/backend/python/kokoro/kokoro.py
@@ -0,0 +1,166 @@
+# https://huggingface.co/hexgrad/Kokoro-82M/blob/main/kokoro.py
+import phonemizer
+import re
+import torch
+import numpy as np
+
+def split_num(num):
+ num = num.group()
+ if '.' in num:
+ return num
+ elif ':' in num:
+ h, m = [int(n) for n in num.split(':')]
+ if m == 0:
+ return f"{h} o'clock"
+ elif m < 10:
+ return f'{h} oh {m}'
+ return f'{h} {m}'
+ year = int(num[:4])
+ if year < 1100 or year % 1000 < 10:
+ return num
+ left, right = num[:2], int(num[2:4])
+ s = 's' if num.endswith('s') else ''
+ if 100 <= year % 1000 <= 999:
+ if right == 0:
+ return f'{left} hundred{s}'
+ elif right < 10:
+ return f'{left} oh {right}{s}'
+ return f'{left} {right}{s}'
+
+def flip_money(m):
+ m = m.group()
+ bill = 'dollar' if m[0] == '$' else 'pound'
+ if m[-1].isalpha():
+ return f'{m[1:]} {bill}s'
+ elif '.' not in m:
+ s = '' if m[1:] == '1' else 's'
+ return f'{m[1:]} {bill}{s}'
+ b, c = m[1:].split('.')
+ s = '' if b == '1' else 's'
+ c = int(c.ljust(2, '0'))
+ coins = f"cent{'' if c == 1 else 's'}" if m[0] == '$' else ('penny' if c == 1 else 'pence')
+ return f'{b} {bill}{s} and {c} {coins}'
+
+def point_num(num):
+ a, b = num.group().split('.')
+ return ' point '.join([a, ' '.join(b)])
+
+def normalize_text(text):
+ text = text.replace(chr(8216), "'").replace(chr(8217), "'")
+ text = text.replace('Ā«', chr(8220)).replace('Ā»', chr(8221))
+ text = text.replace(chr(8220), '"').replace(chr(8221), '"')
+ text = text.replace('(', 'Ā«').replace(')', 'Ā»')
+ for a, b in zip('ććļ¼ļ¼ļ¼ļ¼ļ¼', ',.!,:;?'):
+ text = text.replace(a, b+' ')
+ text = re.sub(r'[^\S \n]', ' ', text)
+ text = re.sub(r' +', ' ', text)
+ text = re.sub(r'(?<=\n) +(?=\n)', '', text)
+ text = re.sub(r'\bD[Rr]\.(?= [A-Z])', 'Doctor', text)
+ text = re.sub(r'\b(?:Mr\.|MR\.(?= [A-Z]))', 'Mister', text)
+ text = re.sub(r'\b(?:Ms\.|MS\.(?= [A-Z]))', 'Miss', text)
+ text = re.sub(r'\b(?:Mrs\.|MRS\.(?= [A-Z]))', 'Mrs', text)
+ text = re.sub(r'\betc\.(?! [A-Z])', 'etc', text)
+ text = re.sub(r'(?i)\b(y)eah?\b', r"\1e'a", text)
+ text = re.sub(r'\d*\.\d+|\b\d{4}s?\b|(? 510:
+ tokens = tokens[:510]
+ print('Truncated to 510 tokens')
+ ref_s = voicepack[len(tokens)]
+ out = forward(model, tokens, ref_s, speed)
+ ps = ''.join(next(k for k, v in VOCAB.items() if i == v) for i in tokens)
+ return out, ps
+
+def generate_full(model, text, voicepack, lang='a', speed=1, ps=None):
+ ps = ps or phonemize(text, lang)
+ tokens = tokenize(ps)
+ if not tokens:
+ return None
+ outs = []
+ loop_count = len(tokens)//510 + (1 if len(tokens) % 510 != 0 else 0)
+ for i in range(loop_count):
+ ref_s = voicepack[len(tokens[i*510:(i+1)*510])]
+ out = forward(model, tokens[i*510:(i+1)*510], ref_s, speed)
+ outs.append(out)
+ outs = np.concatenate(outs)
+ ps = ''.join(next(k for k, v in VOCAB.items() if i == v) for i in tokens)
+ return outs, ps
\ No newline at end of file
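The normalize_text helpers above (split_num, flip_money, point_num) rewrite times, years and currency amounts into speakable words; the regular expressions that dispatch to them were garbled in this excerpt of kokoro.py. The sketch below uses simplified stand-in patterns (not the originals) purely to illustrate the intended effect, and assumes the helpers defined above are in scope:

# Illustrative only: simplified stand-in regexes, not the originals from kokoro.py.
import re

def demo(split_num, flip_money):
    text = "The meeting is at 10:30 and the book came out in 1984."
    # times (h:mm) and four-digit years are routed through split_num
    text = re.sub(r"\b\d{1,2}:\d{2}\b|\b\d{4}s?\b", split_num, text)
    # simple dollar amounts are routed through flip_money
    price = re.sub(r"\$\d+(?:\.\d+)?", flip_money, "It costs $2.50.")
    # expected: "The meeting is at 10 30 and the book came out in 19 84."
    #           "It costs 2 dollars and 50 cents."
    return text, price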
diff --git a/backend/python/kokoro/models.py b/backend/python/kokoro/models.py
new file mode 100644
index 00000000..cf358d9e
--- /dev/null
+++ b/backend/python/kokoro/models.py
@@ -0,0 +1,373 @@
+# https://github.com/yl4579/StyleTTS2/blob/main/models.py
+# https://huggingface.co/hexgrad/Kokoro-82M/blob/main/models.py
+from istftnet import AdaIN1d, Decoder
+from munch import Munch
+from pathlib import Path
+from plbert import load_plbert
+from torch.nn.utils import weight_norm, spectral_norm
+import json
+import numpy as np
+import os
+import os.path as osp
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+class LinearNorm(torch.nn.Module):
+ def __init__(self, in_dim, out_dim, bias=True, w_init_gain='linear'):
+ super(LinearNorm, self).__init__()
+ self.linear_layer = torch.nn.Linear(in_dim, out_dim, bias=bias)
+
+ torch.nn.init.xavier_uniform_(
+ self.linear_layer.weight,
+ gain=torch.nn.init.calculate_gain(w_init_gain))
+
+ def forward(self, x):
+ return self.linear_layer(x)
+
+class LayerNorm(nn.Module):
+ def __init__(self, channels, eps=1e-5):
+ super().__init__()
+ self.channels = channels
+ self.eps = eps
+
+ self.gamma = nn.Parameter(torch.ones(channels))
+ self.beta = nn.Parameter(torch.zeros(channels))
+
+ def forward(self, x):
+ x = x.transpose(1, -1)
+ x = F.layer_norm(x, (self.channels,), self.gamma, self.beta, self.eps)
+ return x.transpose(1, -1)
+
+class TextEncoder(nn.Module):
+ def __init__(self, channels, kernel_size, depth, n_symbols, actv=nn.LeakyReLU(0.2)):
+ super().__init__()
+ self.embedding = nn.Embedding(n_symbols, channels)
+
+ padding = (kernel_size - 1) // 2
+ self.cnn = nn.ModuleList()
+ for _ in range(depth):
+ self.cnn.append(nn.Sequential(
+ weight_norm(nn.Conv1d(channels, channels, kernel_size=kernel_size, padding=padding)),
+ LayerNorm(channels),
+ actv,
+ nn.Dropout(0.2),
+ ))
+ # self.cnn = nn.Sequential(*self.cnn)
+
+ self.lstm = nn.LSTM(channels, channels//2, 1, batch_first=True, bidirectional=True)
+
+ def forward(self, x, input_lengths, m):
+ x = self.embedding(x) # [B, T, emb]
+ x = x.transpose(1, 2) # [B, emb, T]
+ m = m.to(input_lengths.device).unsqueeze(1)
+ x.masked_fill_(m, 0.0)
+
+ for c in self.cnn:
+ x = c(x)
+ x.masked_fill_(m, 0.0)
+
+ x = x.transpose(1, 2) # [B, T, chn]
+
+ input_lengths = input_lengths.cpu().numpy()
+ x = nn.utils.rnn.pack_padded_sequence(
+ x, input_lengths, batch_first=True, enforce_sorted=False)
+
+ self.lstm.flatten_parameters()
+ x, _ = self.lstm(x)
+ x, _ = nn.utils.rnn.pad_packed_sequence(
+ x, batch_first=True)
+
+ x = x.transpose(-1, -2)
+ x_pad = torch.zeros([x.shape[0], x.shape[1], m.shape[-1]])
+
+ x_pad[:, :, :x.shape[-1]] = x
+ x = x_pad.to(x.device)
+
+ x.masked_fill_(m, 0.0)
+
+ return x
+
+ def inference(self, x):
+ x = self.embedding(x)
+ x = x.transpose(1, 2)
+ x = self.cnn(x)
+ x = x.transpose(1, 2)
+ self.lstm.flatten_parameters()
+ x, _ = self.lstm(x)
+ return x
+
+ def length_to_mask(self, lengths):
+ mask = torch.arange(lengths.max()).unsqueeze(0).expand(lengths.shape[0], -1).type_as(lengths)
+ mask = torch.gt(mask+1, lengths.unsqueeze(1))
+ return mask
+
+
+class UpSample1d(nn.Module):
+ def __init__(self, layer_type):
+ super().__init__()
+ self.layer_type = layer_type
+
+ def forward(self, x):
+ if self.layer_type == 'none':
+ return x
+ else:
+ return F.interpolate(x, scale_factor=2, mode='nearest')
+
+class AdainResBlk1d(nn.Module):
+ def __init__(self, dim_in, dim_out, style_dim=64, actv=nn.LeakyReLU(0.2),
+ upsample='none', dropout_p=0.0):
+ super().__init__()
+ self.actv = actv
+ self.upsample_type = upsample
+ self.upsample = UpSample1d(upsample)
+ self.learned_sc = dim_in != dim_out
+ self._build_weights(dim_in, dim_out, style_dim)
+ self.dropout = nn.Dropout(dropout_p)
+
+ if upsample == 'none':
+ self.pool = nn.Identity()
+ else:
+ self.pool = weight_norm(nn.ConvTranspose1d(dim_in, dim_in, kernel_size=3, stride=2, groups=dim_in, padding=1, output_padding=1))
+
+
+ def _build_weights(self, dim_in, dim_out, style_dim):
+ self.conv1 = weight_norm(nn.Conv1d(dim_in, dim_out, 3, 1, 1))
+ self.conv2 = weight_norm(nn.Conv1d(dim_out, dim_out, 3, 1, 1))
+ self.norm1 = AdaIN1d(style_dim, dim_in)
+ self.norm2 = AdaIN1d(style_dim, dim_out)
+ if self.learned_sc:
+ self.conv1x1 = weight_norm(nn.Conv1d(dim_in, dim_out, 1, 1, 0, bias=False))
+
+ def _shortcut(self, x):
+ x = self.upsample(x)
+ if self.learned_sc:
+ x = self.conv1x1(x)
+ return x
+
+ def _residual(self, x, s):
+ x = self.norm1(x, s)
+ x = self.actv(x)
+ x = self.pool(x)
+ x = self.conv1(self.dropout(x))
+ x = self.norm2(x, s)
+ x = self.actv(x)
+ x = self.conv2(self.dropout(x))
+ return x
+
+ def forward(self, x, s):
+ out = self._residual(x, s)
+ out = (out + self._shortcut(x)) / np.sqrt(2)
+ return out
+
+class AdaLayerNorm(nn.Module):
+ def __init__(self, style_dim, channels, eps=1e-5):
+ super().__init__()
+ self.channels = channels
+ self.eps = eps
+
+ self.fc = nn.Linear(style_dim, channels*2)
+
+ def forward(self, x, s):
+ x = x.transpose(-1, -2)
+ x = x.transpose(1, -1)
+
+ h = self.fc(s)
+ h = h.view(h.size(0), h.size(1), 1)
+ gamma, beta = torch.chunk(h, chunks=2, dim=1)
+ gamma, beta = gamma.transpose(1, -1), beta.transpose(1, -1)
+
+
+ x = F.layer_norm(x, (self.channels,), eps=self.eps)
+ x = (1 + gamma) * x + beta
+ return x.transpose(1, -1).transpose(-1, -2)
+
+class ProsodyPredictor(nn.Module):
+
+ def __init__(self, style_dim, d_hid, nlayers, max_dur=50, dropout=0.1):
+ super().__init__()
+
+ self.text_encoder = DurationEncoder(sty_dim=style_dim,
+ d_model=d_hid,
+ nlayers=nlayers,
+ dropout=dropout)
+
+ self.lstm = nn.LSTM(d_hid + style_dim, d_hid // 2, 1, batch_first=True, bidirectional=True)
+ self.duration_proj = LinearNorm(d_hid, max_dur)
+
+ self.shared = nn.LSTM(d_hid + style_dim, d_hid // 2, 1, batch_first=True, bidirectional=True)
+ self.F0 = nn.ModuleList()
+ self.F0.append(AdainResBlk1d(d_hid, d_hid, style_dim, dropout_p=dropout))
+ self.F0.append(AdainResBlk1d(d_hid, d_hid // 2, style_dim, upsample=True, dropout_p=dropout))
+ self.F0.append(AdainResBlk1d(d_hid // 2, d_hid // 2, style_dim, dropout_p=dropout))
+
+ self.N = nn.ModuleList()
+ self.N.append(AdainResBlk1d(d_hid, d_hid, style_dim, dropout_p=dropout))
+ self.N.append(AdainResBlk1d(d_hid, d_hid // 2, style_dim, upsample=True, dropout_p=dropout))
+ self.N.append(AdainResBlk1d(d_hid // 2, d_hid // 2, style_dim, dropout_p=dropout))
+
+ self.F0_proj = nn.Conv1d(d_hid // 2, 1, 1, 1, 0)
+ self.N_proj = nn.Conv1d(d_hid // 2, 1, 1, 1, 0)
+
+
+ def forward(self, texts, style, text_lengths, alignment, m):
+ d = self.text_encoder(texts, style, text_lengths, m)
+
+ batch_size = d.shape[0]
+ text_size = d.shape[1]
+
+ # predict duration
+ input_lengths = text_lengths.cpu().numpy()
+ x = nn.utils.rnn.pack_padded_sequence(
+ d, input_lengths, batch_first=True, enforce_sorted=False)
+
+ m = m.to(text_lengths.device).unsqueeze(1)
+
+ self.lstm.flatten_parameters()
+ x, _ = self.lstm(x)
+ x, _ = nn.utils.rnn.pad_packed_sequence(
+ x, batch_first=True)
+
+ x_pad = torch.zeros([x.shape[0], m.shape[-1], x.shape[-1]])
+
+ x_pad[:, :x.shape[1], :] = x
+ x = x_pad.to(x.device)
+
+ duration = self.duration_proj(nn.functional.dropout(x, 0.5, training=self.training))
+
+ en = (d.transpose(-1, -2) @ alignment)
+
+ return duration.squeeze(-1), en
+
+ def F0Ntrain(self, x, s):
+ x, _ = self.shared(x.transpose(-1, -2))
+
+ F0 = x.transpose(-1, -2)
+ for block in self.F0:
+ F0 = block(F0, s)
+ F0 = self.F0_proj(F0)
+
+ N = x.transpose(-1, -2)
+ for block in self.N:
+ N = block(N, s)
+ N = self.N_proj(N)
+
+ return F0.squeeze(1), N.squeeze(1)
+
+ def length_to_mask(self, lengths):
+ mask = torch.arange(lengths.max()).unsqueeze(0).expand(lengths.shape[0], -1).type_as(lengths)
+ mask = torch.gt(mask+1, lengths.unsqueeze(1))
+ return mask
+
+class DurationEncoder(nn.Module):
+
+ def __init__(self, sty_dim, d_model, nlayers, dropout=0.1):
+ super().__init__()
+ self.lstms = nn.ModuleList()
+ for _ in range(nlayers):
+ self.lstms.append(nn.LSTM(d_model + sty_dim,
+ d_model // 2,
+ num_layers=1,
+ batch_first=True,
+ bidirectional=True,
+ dropout=dropout))
+ self.lstms.append(AdaLayerNorm(sty_dim, d_model))
+
+
+ self.dropout = dropout
+ self.d_model = d_model
+ self.sty_dim = sty_dim
+
+ def forward(self, x, style, text_lengths, m):
+ masks = m.to(text_lengths.device)
+
+ x = x.permute(2, 0, 1)
+ s = style.expand(x.shape[0], x.shape[1], -1)
+ x = torch.cat([x, s], axis=-1)
+ x.masked_fill_(masks.unsqueeze(-1).transpose(0, 1), 0.0)
+
+ x = x.transpose(0, 1)
+ input_lengths = text_lengths.cpu().numpy()
+ x = x.transpose(-1, -2)
+
+ for block in self.lstms:
+ if isinstance(block, AdaLayerNorm):
+ x = block(x.transpose(-1, -2), style).transpose(-1, -2)
+ x = torch.cat([x, s.permute(1, -1, 0)], axis=1)
+ x.masked_fill_(masks.unsqueeze(-1).transpose(-1, -2), 0.0)
+ else:
+ x = x.transpose(-1, -2)
+ x = nn.utils.rnn.pack_padded_sequence(
+ x, input_lengths, batch_first=True, enforce_sorted=False)
+ block.flatten_parameters()
+ x, _ = block(x)
+ x, _ = nn.utils.rnn.pad_packed_sequence(
+ x, batch_first=True)
+ x = F.dropout(x, p=self.dropout, training=self.training)
+ x = x.transpose(-1, -2)
+
+ x_pad = torch.zeros([x.shape[0], x.shape[1], m.shape[-1]])
+
+ x_pad[:, :, :x.shape[-1]] = x
+ x = x_pad.to(x.device)
+
+ return x.transpose(-1, -2)
+
+ def inference(self, x, style):
+ x = self.embedding(x.transpose(-1, -2)) * np.sqrt(self.d_model)
+ style = style.expand(x.shape[0], x.shape[1], -1)
+ x = torch.cat([x, style], axis=-1)
+ src = self.pos_encoder(x)
+ output = self.transformer_encoder(src).transpose(0, 1)
+ return output
+
+ def length_to_mask(self, lengths):
+ mask = torch.arange(lengths.max()).unsqueeze(0).expand(lengths.shape[0], -1).type_as(lengths)
+ mask = torch.gt(mask+1, lengths.unsqueeze(1))
+ return mask
+
+# https://github.com/yl4579/StyleTTS2/blob/main/utils.py
+def recursive_munch(d):
+ if isinstance(d, dict):
+ return Munch((k, recursive_munch(v)) for k, v in d.items())
+ elif isinstance(d, list):
+ return [recursive_munch(v) for v in d]
+ else:
+ return d
+
+def build_model(path, device):
+ config = Path(__file__).parent / 'config.json'
+ assert config.exists(), f'Config path incorrect: config.json not found at {config}'
+ with open(config, 'r') as r:
+ args = recursive_munch(json.load(r))
+ assert args.decoder.type == 'istftnet', f'Unknown decoder type: {args.decoder.type}'
+ decoder = Decoder(dim_in=args.hidden_dim, style_dim=args.style_dim, dim_out=args.n_mels,
+ resblock_kernel_sizes = args.decoder.resblock_kernel_sizes,
+ upsample_rates = args.decoder.upsample_rates,
+ upsample_initial_channel=args.decoder.upsample_initial_channel,
+ resblock_dilation_sizes=args.decoder.resblock_dilation_sizes,
+ upsample_kernel_sizes=args.decoder.upsample_kernel_sizes,
+ gen_istft_n_fft=args.decoder.gen_istft_n_fft, gen_istft_hop_size=args.decoder.gen_istft_hop_size)
+ text_encoder = TextEncoder(channels=args.hidden_dim, kernel_size=5, depth=args.n_layer, n_symbols=args.n_token)
+ predictor = ProsodyPredictor(style_dim=args.style_dim, d_hid=args.hidden_dim, nlayers=args.n_layer, max_dur=args.max_dur, dropout=args.dropout)
+ bert = load_plbert()
+ bert_encoder = nn.Linear(bert.config.hidden_size, args.hidden_dim)
+ for parent in [bert, bert_encoder, predictor, decoder, text_encoder]:
+ for child in parent.children():
+ if isinstance(child, nn.RNNBase):
+ child.flatten_parameters()
+ model = Munch(
+ bert=bert.to(device).eval(),
+ bert_encoder=bert_encoder.to(device).eval(),
+ predictor=predictor.to(device).eval(),
+ decoder=decoder.to(device).eval(),
+ text_encoder=text_encoder.to(device).eval(),
+ )
+ for key, state_dict in torch.load(path, map_location='cpu', weights_only=True)['net'].items():
+ assert key in model, key
+ try:
+ model[key].load_state_dict(state_dict)
+ except:
+ state_dict = {k[7:]: v for k, v in state_dict.items()}
+ model[key].load_state_dict(state_dict, strict=False)
+ return model
diff --git a/backend/python/kokoro/plbert.py b/backend/python/kokoro/plbert.py
new file mode 100644
index 00000000..bf1dba5a
--- /dev/null
+++ b/backend/python/kokoro/plbert.py
@@ -0,0 +1,16 @@
+# https://huggingface.co/hexgrad/Kokoro-82M/blob/main/plbert.py
+# https://github.com/yl4579/StyleTTS2/blob/main/Utils/PLBERT/util.py
+from transformers import AlbertConfig, AlbertModel
+
+class CustomAlbert(AlbertModel):
+ def forward(self, *args, **kwargs):
+ # Call the original forward method
+ outputs = super().forward(*args, **kwargs)
+ # Only return the last_hidden_state
+ return outputs.last_hidden_state
+
+def load_plbert():
+ plbert_config = {'vocab_size': 178, 'hidden_size': 768, 'num_attention_heads': 12, 'intermediate_size': 2048, 'max_position_embeddings': 512, 'num_hidden_layers': 12, 'dropout': 0.1}
+ albert_base_configuration = AlbertConfig(**plbert_config)
+ bert = CustomAlbert(albert_base_configuration)
+ return bert
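CustomAlbert only changes what forward returns: the last hidden state tensor rather than the full ModelOutput. A hedged smoke test of the randomly initialised wrapper (token ids are arbitrary but stay inside the 178-symbol vocabulary; real weights are loaded later by build_model):

    # hypothetical check of the PL-BERT wrapper defined above
    import torch
    from plbert import load_plbert

    plbert = load_plbert()
    ids = torch.randint(0, 178, (1, 16))
    hidden = plbert(input_ids=ids, attention_mask=torch.ones_like(ids))
    print(hidden.shape)  # torch.Size([1, 16, 768]) given hidden_size=768
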
diff --git a/backend/python/kokoro/protogen.sh b/backend/python/kokoro/protogen.sh
new file mode 100644
index 00000000..32f39fbb
--- /dev/null
+++ b/backend/python/kokoro/protogen.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+set -e
+
+source $(dirname $0)/../common/libbackend.sh
+
+python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
\ No newline at end of file
diff --git a/backend/python/kokoro/requirements-cpu.txt b/backend/python/kokoro/requirements-cpu.txt
new file mode 100644
index 00000000..b4f1261f
--- /dev/null
+++ b/backend/python/kokoro/requirements-cpu.txt
@@ -0,0 +1,2 @@
+torch==2.4.1
+transformers
\ No newline at end of file
diff --git a/backend/python/kokoro/requirements-cublas11.txt b/backend/python/kokoro/requirements-cublas11.txt
new file mode 100644
index 00000000..ed0d4df5
--- /dev/null
+++ b/backend/python/kokoro/requirements-cublas11.txt
@@ -0,0 +1,3 @@
+--extra-index-url https://download.pytorch.org/whl/cu118
+torch==2.4.1+cu118
+transformers
\ No newline at end of file
diff --git a/backend/python/kokoro/requirements-cublas12.txt b/backend/python/kokoro/requirements-cublas12.txt
new file mode 100644
index 00000000..b4f1261f
--- /dev/null
+++ b/backend/python/kokoro/requirements-cublas12.txt
@@ -0,0 +1,2 @@
+torch==2.4.1
+transformers
\ No newline at end of file
diff --git a/backend/python/kokoro/requirements-hipblas.txt b/backend/python/kokoro/requirements-hipblas.txt
new file mode 100644
index 00000000..ec8d0306
--- /dev/null
+++ b/backend/python/kokoro/requirements-hipblas.txt
@@ -0,0 +1,3 @@
+--extra-index-url https://download.pytorch.org/whl/rocm6.0
+torch==2.4.1+rocm6.0
+transformers
\ No newline at end of file
diff --git a/backend/python/kokoro/requirements-intel.txt b/backend/python/kokoro/requirements-intel.txt
new file mode 100644
index 00000000..b16448d3
--- /dev/null
+++ b/backend/python/kokoro/requirements-intel.txt
@@ -0,0 +1,5 @@
+--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+intel-extension-for-pytorch==2.3.110+xpu
+torch==2.3.1+cxx11.abi
+oneccl_bind_pt==2.3.100+xpu
+transformers
\ No newline at end of file
diff --git a/backend/python/kokoro/requirements.txt b/backend/python/kokoro/requirements.txt
new file mode 100644
index 00000000..75d65ba1
--- /dev/null
+++ b/backend/python/kokoro/requirements.txt
@@ -0,0 +1,7 @@
+grpcio==1.69.0
+protobuf
+phonemizer
+scipy
+munch
+setuptools
+soundfile
\ No newline at end of file
diff --git a/backend/python/kokoro/run.sh b/backend/python/kokoro/run.sh
new file mode 100755
index 00000000..375c07e5
--- /dev/null
+++ b/backend/python/kokoro/run.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+source $(dirname $0)/../common/libbackend.sh
+
+startBackend $@
\ No newline at end of file
diff --git a/backend/python/kokoro/test.sh b/backend/python/kokoro/test.sh
new file mode 100755
index 00000000..6940b066
--- /dev/null
+++ b/backend/python/kokoro/test.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+set -e
+
+source $(dirname $0)/../common/libbackend.sh
+
+runUnittests
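With the run and test scripts in place, the kokoro backend can be smoke-tested like the other Python backends; below is a hedged example against a locally started server, mirroring the test.py pattern used elsewhere in this tree (the address and message names follow that pattern and are assumptions here):

    # hypothetical gRPC health check, following the structure of the other backends' test.py files
    import grpc
    import backend_pb2
    import backend_pb2_grpc

    with grpc.insecure_channel("localhost:50051") as channel:
        stub = backend_pb2_grpc.BackendStub(channel)
        reply = stub.Health(backend_pb2.HealthMessage())
        print(reply.message)  # expected: b'OK'
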
diff --git a/pkg/model/loader.go b/pkg/model/loader.go
index d62f52b2..bb9bdd8a 100644
--- a/pkg/model/loader.go
+++ b/pkg/model/loader.go
@@ -54,6 +54,8 @@ var knownModelsNameSuffixToSkip []string = []string{
".yml",
".json",
".txt",
+ ".pt",
+ ".onnx",
".md",
".MD",
".DS_Store",
From d08d97bebf9fd44010f5a38b3f7002edb29f2793 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 16 Jan 2025 22:26:55 +0100
Subject: [PATCH 035/679] chore(model gallery): fix typo
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 22d748d8..349cd419 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3281,7 +3281,7 @@
- filename: smollm2-1.7b-instruct-q4_k_m.gguf
sha256: decd2598bc2c8ed08c19adc3c8fdd461ee19ed5708679d1c54ef54a5a30d4f33
uri: huggingface://HuggingFaceTB/SmolLM2-1.7B-Instruct-GGUF/smollm2-1.7b-instruct-q4_k_m.gguf
-- !!merge <<: qwen25
+- !!merge <<: *qwen25
name: "vikhr-qwen-2.5-1.5b-instruct"
urls:
- https://huggingface.co/Vikhrmodels/Vikhr-Qwen-2.5-1.5B-Instruct
From 7d0ac1ea3f5faf8047623f5cb92df23bdbd1f393 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 17 Jan 2025 09:35:10 +0100
Subject: [PATCH 036/679] chore(vall-e-x): Drop backend (#4619)
There are many new architectures that are SOTA and replace vall-e-x
nowadays.
Signed-off-by: Ettore Di Giacinto
---
.github/dependabot.yml | 4 -
.github/workflows/test-extra.yml | 20 ---
Dockerfile | 7 +-
Makefile | 13 +-
backend/python/vall-e-x/.gitignore | 1 -
backend/python/vall-e-x/Makefile | 33 ----
backend/python/vall-e-x/README.md | 5 -
backend/python/vall-e-x/backend.py | 141 ------------------
backend/python/vall-e-x/install.sh | 22 ---
backend/python/vall-e-x/requirements-cpu.txt | 3 -
.../python/vall-e-x/requirements-cublas11.txt | 4 -
.../python/vall-e-x/requirements-cublas12.txt | 3 -
.../python/vall-e-x/requirements-hipblas.txt | 4 -
.../python/vall-e-x/requirements-intel.txt | 7 -
backend/python/vall-e-x/requirements.txt | 4 -
backend/python/vall-e-x/run.sh | 6 -
backend/python/vall-e-x/test.py | 81 ----------
backend/python/vall-e-x/test.sh | 7 -
core/backend/options.go | 2 +-
core/config/backend_config.go | 7 +-
20 files changed, 6 insertions(+), 368 deletions(-)
delete mode 100644 backend/python/vall-e-x/.gitignore
delete mode 100644 backend/python/vall-e-x/Makefile
delete mode 100644 backend/python/vall-e-x/README.md
delete mode 100644 backend/python/vall-e-x/backend.py
delete mode 100755 backend/python/vall-e-x/install.sh
delete mode 100644 backend/python/vall-e-x/requirements-cpu.txt
delete mode 100644 backend/python/vall-e-x/requirements-cublas11.txt
delete mode 100644 backend/python/vall-e-x/requirements-cublas12.txt
delete mode 100644 backend/python/vall-e-x/requirements-hipblas.txt
delete mode 100644 backend/python/vall-e-x/requirements-intel.txt
delete mode 100644 backend/python/vall-e-x/requirements.txt
delete mode 100755 backend/python/vall-e-x/run.sh
delete mode 100644 backend/python/vall-e-x/test.py
delete mode 100755 backend/python/vall-e-x/test.sh
diff --git a/.github/dependabot.yml b/.github/dependabot.yml
index fcd6c88c..8fa0cca5 100644
--- a/.github/dependabot.yml
+++ b/.github/dependabot.yml
@@ -85,10 +85,6 @@ updates:
directory: "/backend/python/transformers-musicgen"
schedule:
interval: "weekly"
- - package-ecosystem: "pip"
- directory: "/backend/python/vall-e-x"
- schedule:
- interval: "weekly"
- package-ecosystem: "pip"
directory: "/backend/python/vllm"
schedule:
diff --git a/.github/workflows/test-extra.yml b/.github/workflows/test-extra.yml
index a2c34872..3c2fee37 100644
--- a/.github/workflows/test-extra.yml
+++ b/.github/workflows/test-extra.yml
@@ -260,26 +260,6 @@ jobs:
# run: |
# make --jobs=5 --output-sync=target -C backend/python/vllm
# make --jobs=5 --output-sync=target -C backend/python/vllm test
- tests-vallex:
- runs-on: ubuntu-latest
- steps:
- - name: Clone
- uses: actions/checkout@v4
- with:
- submodules: true
- - name: Dependencies
- run: |
- sudo apt-get update
- sudo apt-get install build-essential ffmpeg
- # Install UV
- curl -LsSf https://astral.sh/uv/install.sh | sh
- sudo apt-get install -y ca-certificates cmake curl patch python3-pip
- sudo apt-get install -y libopencv-dev
- pip install --user --no-cache-dir grpcio-tools==1.64.1
- - name: Test vall-e-x
- run: |
- make --jobs=5 --output-sync=target -C backend/python/vall-e-x
- make --jobs=5 --output-sync=target -C backend/python/vall-e-x test
tests-coqui:
runs-on: ubuntu-latest
diff --git a/Dockerfile b/Dockerfile
index 481edf90..354ef298 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -15,7 +15,7 @@ ARG TARGETARCH
ARG TARGETVARIANT
ENV DEBIAN_FRONTEND=noninteractive
-ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh,transformers:/build/backend/python/transformers/run.sh,sentencetransformers:/build/backend/python/sentencetransformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,openvoice:/build/backend/python/openvoice/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vall-e-x:/build/backend/python/vall-e-x/run.sh,vllm:/build/backend/python/vllm/run.sh,mamba:/build/backend/python/mamba/run.sh,exllama2:/build/backend/python/exllama2/run.sh,transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"
+ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh,transformers:/build/backend/python/transformers/run.sh,sentencetransformers:/build/backend/python/sentencetransformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,openvoice:/build/backend/python/openvoice/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,mamba:/build/backend/python/mamba/run.sh,exllama2:/build/backend/python/exllama2/run.sh,transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"
RUN apt-get update && \
@@ -453,10 +453,7 @@ RUN if [[ ( "${EXTRA_BACKENDS}" =~ "coqui" || -z "${EXTRA_BACKENDS}" ) && "$IMAG
make -C backend/python/transformers-musicgen \
; fi
-RUN if [[ ( "${EXTRA_BACKENDS}" =~ "vall-e-x" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
- make -C backend/python/vall-e-x \
- ; fi && \
- if [[ ( "${EXTRA_BACKENDS}" =~ "kokoro" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
+RUN if [[ ( "${EXTRA_BACKENDS}" =~ "kokoro" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/kokoro \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "openvoice" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
diff --git a/Makefile b/Makefile
index 49c81950..1983f568 100644
--- a/Makefile
+++ b/Makefile
@@ -583,10 +583,10 @@ protogen-go-clean:
$(RM) bin/*
.PHONY: protogen-python
-protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen mamba-protogen rerankers-protogen sentencetransformers-protogen transformers-protogen parler-tts-protogen transformers-musicgen-protogen vall-e-x-protogen kokoro-protogen vllm-protogen openvoice-protogen
+protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen mamba-protogen rerankers-protogen sentencetransformers-protogen transformers-protogen parler-tts-protogen transformers-musicgen-protogen kokoro-protogen vllm-protogen openvoice-protogen
.PHONY: protogen-python-clean
-protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean mamba-protogen-clean sentencetransformers-protogen-clean rerankers-protogen-clean transformers-protogen-clean transformers-musicgen-protogen-clean parler-tts-protogen-clean vall-e-x-protogen-clean kokoro-protogen-clean vllm-protogen-clean openvoice-protogen-clean
+protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean mamba-protogen-clean sentencetransformers-protogen-clean rerankers-protogen-clean transformers-protogen-clean transformers-musicgen-protogen-clean parler-tts-protogen-clean kokoro-protogen-clean vllm-protogen-clean openvoice-protogen-clean
.PHONY: autogptq-protogen
autogptq-protogen:
@@ -676,14 +676,6 @@ transformers-musicgen-protogen:
transformers-musicgen-protogen-clean:
$(MAKE) -C backend/python/transformers-musicgen protogen-clean
-.PHONY: vall-e-x-protogen
-vall-e-x-protogen:
- $(MAKE) -C backend/python/vall-e-x protogen
-
-.PHONY: vall-e-x-protogen-clean
-vall-e-x-protogen-clean:
- $(MAKE) -C backend/python/vall-e-x protogen-clean
-
.PHONY: kokoro-protogen
kokoro-protogen:
$(MAKE) -C backend/python/kokoro protogen
@@ -722,7 +714,6 @@ prepare-extra-conda-environments: protogen-python
$(MAKE) -C backend/python/transformers
$(MAKE) -C backend/python/transformers-musicgen
$(MAKE) -C backend/python/parler-tts
- $(MAKE) -C backend/python/vall-e-x
$(MAKE) -C backend/python/kokoro
$(MAKE) -C backend/python/openvoice
$(MAKE) -C backend/python/exllama2
diff --git a/backend/python/vall-e-x/.gitignore b/backend/python/vall-e-x/.gitignore
deleted file mode 100644
index 1d3a0654..00000000
--- a/backend/python/vall-e-x/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-source
\ No newline at end of file
diff --git a/backend/python/vall-e-x/Makefile b/backend/python/vall-e-x/Makefile
deleted file mode 100644
index a3ca32a3..00000000
--- a/backend/python/vall-e-x/Makefile
+++ /dev/null
@@ -1,33 +0,0 @@
-ifneq (,$(findstring sycl,$(BUILD_TYPE)))
-export SKIP_CONDA=1
-endif
-
-.PHONY: ttsvalle
-ttsvalle: protogen
- bash install.sh
-
-.PHONY: run
-run: protogen
- @echo "Running ttsvalle..."
- bash run.sh
- @echo "ttsvalle run."
-
-.PHONY: test
-test: protogen
- @echo "Testing valle..."
- bash test.sh
- @echo "valle tested."
-
-.PHONY: protogen
-protogen: backend_pb2_grpc.py backend_pb2.py
-
-.PHONY: protogen-clean
-protogen-clean:
- $(RM) backend_pb2_grpc.py backend_pb2.py
-
-backend_pb2_grpc.py backend_pb2.py:
- python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
-
-.PHONY: clean
-clean: protogen-clean
- rm -rf source venv __pycache__
\ No newline at end of file
diff --git a/backend/python/vall-e-x/README.md b/backend/python/vall-e-x/README.md
deleted file mode 100644
index a3a93361..00000000
--- a/backend/python/vall-e-x/README.md
+++ /dev/null
@@ -1,5 +0,0 @@
-# Creating a separate environment for the ttsvalle project
-
-```
-make ttsvalle
-```
\ No newline at end of file
diff --git a/backend/python/vall-e-x/backend.py b/backend/python/vall-e-x/backend.py
deleted file mode 100644
index fc9d93bd..00000000
--- a/backend/python/vall-e-x/backend.py
+++ /dev/null
@@ -1,141 +0,0 @@
-#!/usr/bin/env python3
-
-from concurrent import futures
-import argparse
-import signal
-import sys
-import os
-import time
-import backend_pb2
-import backend_pb2_grpc
-
-import grpc
-
-from utils.generation import SAMPLE_RATE, generate_audio, preload_models
-from scipy.io.wavfile import write as write_wav
-from utils.prompt_making import make_prompt
-
-_ONE_DAY_IN_SECONDS = 60 * 60 * 24
-
-# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
-MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
-
-# Implement the BackendServicer class with the service methods
-class BackendServicer(backend_pb2_grpc.BackendServicer):
- """
- gRPC servicer for backend services.
- """
- def Health(self, request, context):
- """
- Health check service.
-
- Args:
- request: A backend_pb2.HealthRequest instance.
- context: A grpc.ServicerContext instance.
-
- Returns:
- A backend_pb2.Reply instance with message "OK".
- """
- return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
-
- def LoadModel(self, request, context):
- """
- Load model service.
-
- Args:
- request: A backend_pb2.LoadModelRequest instance.
- context: A grpc.ServicerContext instance.
-
- Returns:
- A backend_pb2.Result instance with message "Model loaded successfully" and success=True if successful.
- A backend_pb2.Result instance with success=False and error message if unsuccessful.
- """
- model_name = request.Model
- try:
- print("Preparing models, please wait", file=sys.stderr)
- # download and load all models
- preload_models()
- self.clonedVoice = False
- # Assume directory from request.ModelFile.
- # Only if request.LoraAdapter it's not an absolute path
- if request.AudioPath and request.ModelFile != "" and not os.path.isabs(request.AudioPath):
- # get base path of modelFile
- modelFileBase = os.path.dirname(request.ModelFile)
- # modify LoraAdapter to be relative to modelFileBase
- request.AudioPath = os.path.join(modelFileBase, request.AudioPath)
- if request.AudioPath != "":
- print("Generating model", file=sys.stderr)
- make_prompt(name=model_name, audio_prompt_path=request.AudioPath)
- self.clonedVoice = True
- ### Use given transcript
- ##make_prompt(name=model_name, audio_prompt_path="paimon_prompt.wav",
- ## transcript="Just, what was that? Paimon thought we were gonna get eaten.")
- except Exception as err:
- return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
- # Implement your logic here for the LoadModel service
- # Replace this with your desired response
- return backend_pb2.Result(message="Model loaded successfully", success=True)
-
- def TTS(self, request, context):
- """
- Text-to-speech service.
-
- Args:
- request: A backend_pb2.TTSRequest instance.
- context: A grpc.ServicerContext instance.
-
- Returns:
- A backend_pb2.Result instance with success=True if successful.
- A backend_pb2.Result instance with success=False and error message if unsuccessful.
- """
- model = request.model
- print(request, file=sys.stderr)
- try:
- audio_array = None
- if model != "":
- if self.clonedVoice:
- model = os.path.basename(request.model)
- audio_array = generate_audio(request.text, prompt=model)
- else:
- audio_array = generate_audio(request.text)
- print("saving to", request.dst, file=sys.stderr)
- # save audio to disk
- write_wav(request.dst, SAMPLE_RATE, audio_array)
- print("saved to", request.dst, file=sys.stderr)
- print("tts for", file=sys.stderr)
- print(request, file=sys.stderr)
- except Exception as err:
- return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
- return backend_pb2.Result(success=True)
-
-def serve(address):
- server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
- backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
- server.add_insecure_port(address)
- server.start()
- print("Server started. Listening on: " + address, file=sys.stderr)
-
- # Define the signal handler function
- def signal_handler(sig, frame):
- print("Received termination signal. Shutting down...")
- server.stop(0)
- sys.exit(0)
-
- # Set the signal handlers for SIGINT and SIGTERM
- signal.signal(signal.SIGINT, signal_handler)
- signal.signal(signal.SIGTERM, signal_handler)
-
- try:
- while True:
- time.sleep(_ONE_DAY_IN_SECONDS)
- except KeyboardInterrupt:
- server.stop(0)
-
-if __name__ == "__main__":
- parser = argparse.ArgumentParser(description="Run the gRPC server.")
- parser.add_argument(
- "--addr", default="localhost:50051", help="The address to bind the server to."
- )
- args = parser.parse_args()
-
- serve(args.addr)
diff --git a/backend/python/vall-e-x/install.sh b/backend/python/vall-e-x/install.sh
deleted file mode 100755
index c0cce96a..00000000
--- a/backend/python/vall-e-x/install.sh
+++ /dev/null
@@ -1,22 +0,0 @@
-#!/bin/bash
-set -e
-
-VALL_E_X_VERSION=3faaf8ccadb154d63b38070caf518ce9309ea0f4
-
-source $(dirname $0)/../common/libbackend.sh
-
-# This is here because the Intel pip index is broken and returns 200 status codes for every package name, it just doesn't return any package links.
-# This makes uv think that the package exists in the Intel pip index, and by default it stops looking at other pip indexes once it finds a match.
-# We need uv to continue falling through to the pypi default index to find optimum[openvino] in the pypi index
-# the --upgrade actually allows us to *downgrade* torch to the version provided in the Intel pip index
-if [ "x${BUILD_PROFILE}" == "xintel" ]; then
- EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
-fi
-
-installRequirements
-
-git clone https://github.com/Plachtaa/VALL-E-X.git ${MY_DIR}/source
-pushd ${MY_DIR}/source && git checkout -b build ${VALL_E_X_VERSION} && popd
-uv pip install ${BUILD_ISOLATION_FLAG} --requirement ${MY_DIR}/source/requirements.txt
-
-cp -v ./*py $MY_DIR/source/
diff --git a/backend/python/vall-e-x/requirements-cpu.txt b/backend/python/vall-e-x/requirements-cpu.txt
deleted file mode 100644
index 0aad8812..00000000
--- a/backend/python/vall-e-x/requirements-cpu.txt
+++ /dev/null
@@ -1,3 +0,0 @@
-accelerate
-torch==2.4.1
-torchaudio==2.4.1
\ No newline at end of file
diff --git a/backend/python/vall-e-x/requirements-cublas11.txt b/backend/python/vall-e-x/requirements-cublas11.txt
deleted file mode 100644
index c45de5b7..00000000
--- a/backend/python/vall-e-x/requirements-cublas11.txt
+++ /dev/null
@@ -1,4 +0,0 @@
---extra-index-url https://download.pytorch.org/whl/cu118
-accelerate
-torch==2.4.1+cu118
-torchaudio==2.4.1+cu118
\ No newline at end of file
diff --git a/backend/python/vall-e-x/requirements-cublas12.txt b/backend/python/vall-e-x/requirements-cublas12.txt
deleted file mode 100644
index 0aad8812..00000000
--- a/backend/python/vall-e-x/requirements-cublas12.txt
+++ /dev/null
@@ -1,3 +0,0 @@
-accelerate
-torch==2.4.1
-torchaudio==2.4.1
\ No newline at end of file
diff --git a/backend/python/vall-e-x/requirements-hipblas.txt b/backend/python/vall-e-x/requirements-hipblas.txt
deleted file mode 100644
index fc43790a..00000000
--- a/backend/python/vall-e-x/requirements-hipblas.txt
+++ /dev/null
@@ -1,4 +0,0 @@
---extra-index-url https://download.pytorch.org/whl/rocm6.0
-accelerate
-torch==2.3.0+rocm6.0
-torchaudio==2.3.0+rocm6.0
\ No newline at end of file
diff --git a/backend/python/vall-e-x/requirements-intel.txt b/backend/python/vall-e-x/requirements-intel.txt
deleted file mode 100644
index efcf885a..00000000
--- a/backend/python/vall-e-x/requirements-intel.txt
+++ /dev/null
@@ -1,7 +0,0 @@
---extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
-intel-extension-for-pytorch==2.3.110+xpu
-accelerate
-torch==2.3.1+cxx11.abi
-torchaudio==2.3.1+cxx11.abi
-optimum[openvino]
-oneccl_bind_pt==2.3.100+xpu
\ No newline at end of file
diff --git a/backend/python/vall-e-x/requirements.txt b/backend/python/vall-e-x/requirements.txt
deleted file mode 100644
index a1eea776..00000000
--- a/backend/python/vall-e-x/requirements.txt
+++ /dev/null
@@ -1,4 +0,0 @@
-grpcio==1.69.0
-protobuf
-certifi
-setuptools
\ No newline at end of file
diff --git a/backend/python/vall-e-x/run.sh b/backend/python/vall-e-x/run.sh
deleted file mode 100755
index 4b0682ad..00000000
--- a/backend/python/vall-e-x/run.sh
+++ /dev/null
@@ -1,6 +0,0 @@
-#!/bin/bash
-BACKEND_FILE="${MY_DIR}/source/backend.py"
-
-source $(dirname $0)/../common/libbackend.sh
-
-startBackend $@
\ No newline at end of file
diff --git a/backend/python/vall-e-x/test.py b/backend/python/vall-e-x/test.py
deleted file mode 100644
index f31a148c..00000000
--- a/backend/python/vall-e-x/test.py
+++ /dev/null
@@ -1,81 +0,0 @@
-"""
-A test script to test the gRPC service
-"""
-import unittest
-import subprocess
-import time
-import backend_pb2
-import backend_pb2_grpc
-
-import grpc
-
-
-class TestBackendServicer(unittest.TestCase):
- """
- TestBackendServicer is the class that tests the gRPC service
- """
- def setUp(self):
- """
- This method sets up the gRPC service by starting the server
- """
- self.service = subprocess.Popen(["python3", "backend.py", "--addr", "localhost:50051"])
- time.sleep(10)
-
- def tearDown(self) -> None:
- """
- This method tears down the gRPC service by terminating the server
- """
- self.service.terminate()
- self.service.wait()
-
- def test_server_startup(self):
- """
- This method tests if the server starts up successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.Health(backend_pb2.HealthMessage())
- self.assertEqual(response.message, b'OK')
- except Exception as err:
- print(err)
- self.fail("Server failed to start")
- finally:
- self.tearDown()
-
- def test_load_model(self):
- """
- This method tests if the model is loaded successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.LoadModel(backend_pb2.ModelOptions(Model="dingzhen"))
- self.assertTrue(response.success)
- self.assertEqual(response.message, "Model loaded successfully")
- except Exception as err:
- print(err)
- self.fail("LoadModel service failed")
- finally:
- self.tearDown()
-
- def test_tts(self):
- """
- This method tests if the embeddings are generated successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.LoadModel(backend_pb2.ModelOptions(Model="dingzhen"))
- self.assertTrue(response.success)
- tts_request = backend_pb2.TTSRequest(text="80s TV news production music hit for tonight's biggest story")
- tts_response = stub.TTS(tts_request)
- self.assertIsNotNone(tts_response)
- except Exception as err:
- print(err)
- self.fail("TTS service failed")
- finally:
- self.tearDown()
\ No newline at end of file
diff --git a/backend/python/vall-e-x/test.sh b/backend/python/vall-e-x/test.sh
deleted file mode 100755
index 57336b39..00000000
--- a/backend/python/vall-e-x/test.sh
+++ /dev/null
@@ -1,7 +0,0 @@
-#!/bin/bash
-set -e
-TEST_FILE="./source/test.py"
-
-source $(dirname $0)/../common/libbackend.sh
-
-runUnittests
diff --git a/core/backend/options.go b/core/backend/options.go
index f6247c60..92a42893 100644
--- a/core/backend/options.go
+++ b/core/backend/options.go
@@ -140,7 +140,7 @@ func grpcModelOpts(c config.BackendConfig) *pb.ModelOptions {
NBatch: int32(b),
NoMulMatQ: c.NoMulMatQ,
DraftModel: c.DraftModel,
- AudioPath: c.VallE.AudioPath,
+ AudioPath: c.AudioPath,
Quantization: c.Quantization,
LoadFormat: c.LoadFormat,
GPUMemoryUtilization: c.GPUMemoryUtilization,
diff --git a/core/config/backend_config.go b/core/config/backend_config.go
index f07ec3d3..bb2fa643 100644
--- a/core/config/backend_config.go
+++ b/core/config/backend_config.go
@@ -21,8 +21,7 @@ type TTSConfig struct {
// Voice wav path or id
Voice string `yaml:"voice"`
- // Vall-e-x
- VallE VallE `yaml:"vall-e"`
+ AudioPath string `yaml:"audio_path"`
}
type BackendConfig struct {
@@ -82,10 +81,6 @@ type File struct {
URI downloader.URI `yaml:"uri" json:"uri"`
}
-type VallE struct {
- AudioPath string `yaml:"audio_path"`
-}
-
type FeatureFlag map[string]*bool
func (ff FeatureFlag) Enabled(s string) bool {
From b147ad059611f109e3e2a33494a6e8438b2939e8 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 17 Jan 2025 10:14:23 +0100
Subject: [PATCH 037/679] ci: try to build for arm64
Try to use the free arm64 runners from GitHub:
https://github.blog/changelog/2025-01-16-linux-arm64-hosted-runners-now-available-for-free-in-public-repositories-public-preview/
Signed-off-by: Ettore Di Giacinto
---
.github/workflows/image.yml | 54 ++++++++++---------------------------
1 file changed, 14 insertions(+), 40 deletions(-)
diff --git a/.github/workflows/image.yml b/.github/workflows/image.yml
index 68727ebe..47bc507a 100644
--- a/.github/workflows/image.yml
+++ b/.github/workflows/image.yml
@@ -362,43 +362,17 @@ jobs:
base-image: "ubuntu:22.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
-# parallel-builds:
-# uses: ./.github/workflows/image_build.yml
-# with:
-# tag-latest: ${{ matrix.tag-latest }}
-# tag-suffix: ${{ matrix.tag-suffix }}
-# ffmpeg: ${{ matrix.ffmpeg }}
-# image-type: ${{ matrix.image-type }}
-# build-type: ${{ matrix.build-type }}
-# cuda-major-version: ${{ matrix.cuda-major-version }}
-# cuda-minor-version: ${{ matrix.cuda-minor-version }}
-# platforms: ${{ matrix.platforms }}
-# runs-on: ${{ matrix.runs-on }}
-# aio: ${{ matrix.aio }}
-# base-image: ${{ matrix.base-image }}
-# grpc-base-image: ${{ matrix.grpc-base-image }}
-# makeflags: ${{ matrix.makeflags }}
-# latest-image: ${{ matrix.latest-image }}
-# latest-image-aio: ${{ matrix.latest-image-aio }}
-# skip-drivers: ${{ matrix.skip-drivers }}
-# secrets:
-# dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
-# dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
-# quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
-# quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
-# strategy:
-# matrix:
-# include:
-# - build-type: 'cublas'
-# cuda-major-version: "12"
-# cuda-minor-version: "0"
-# platforms: 'linux/arm64'
-# tag-latest: 'false'
-# tag-suffix: '-nvidia-l4t-arm64-core'
-# latest-image: 'latest-nvidia-l4t-arm64-core'
-# ffmpeg: 'true'
-# image-type: 'core'
-# base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
-# runs-on: 'self-hosted'
-# makeflags: "--jobs=4 --output-sync=target"
-# skip-drivers: 'true'
+ # ARM64
+ - build-type: 'cublas'
+ cuda-major-version: "12"
+ cuda-minor-version: "0"
+ platforms: 'linux/arm64'
+ tag-latest: 'false'
+ tag-suffix: '-nvidia-l4t-arm64-core'
+ latest-image: 'latest-nvidia-l4t-arm64-core'
+ ffmpeg: 'true'
+ image-type: 'core'
+ base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
+ runs-on: 'ubuntu-24.04-arm'
+ makeflags: "--jobs=4 --output-sync=target"
+ skip-drivers: 'true'
From b5eeb5c5ab96721afc2daf600b32b49d70e9c2a2 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 17 Jan 2025 10:24:15 +0100
Subject: [PATCH 038/679] ci(arm64): run in parallel
Signed-off-by: Ettore Di Giacinto
---
.github/workflows/image.yml | 30 ++++++++++++++++++++++++++++--
1 file changed, 28 insertions(+), 2 deletions(-)
diff --git a/.github/workflows/image.yml b/.github/workflows/image.yml
index 47bc507a..722d0f41 100644
--- a/.github/workflows/image.yml
+++ b/.github/workflows/image.yml
@@ -362,7 +362,33 @@ jobs:
base-image: "ubuntu:22.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
- # ARM64
+ gh-runner:
+ uses: ./.github/workflows/image_build.yml
+ with:
+ tag-latest: ${{ matrix.tag-latest }}
+ tag-suffix: ${{ matrix.tag-suffix }}
+ ffmpeg: ${{ matrix.ffmpeg }}
+ image-type: ${{ matrix.image-type }}
+ build-type: ${{ matrix.build-type }}
+ cuda-major-version: ${{ matrix.cuda-major-version }}
+ cuda-minor-version: ${{ matrix.cuda-minor-version }}
+ platforms: ${{ matrix.platforms }}
+ runs-on: ${{ matrix.runs-on }}
+ aio: ${{ matrix.aio }}
+ base-image: ${{ matrix.base-image }}
+ grpc-base-image: ${{ matrix.grpc-base-image }}
+ makeflags: ${{ matrix.makeflags }}
+ latest-image: ${{ matrix.latest-image }}
+ latest-image-aio: ${{ matrix.latest-image-aio }}
+ skip-drivers: ${{ matrix.skip-drivers }}
+ secrets:
+ dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
+ dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
+ quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+ quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
+ strategy:
+ matrix:
+ include:
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
@@ -375,4 +401,4 @@ jobs:
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
makeflags: "--jobs=4 --output-sync=target"
- skip-drivers: 'true'
+ skip-drivers: 'true'
\ No newline at end of file
From 78533d7230bdb5e352e325c15d0d53f38428b08e Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Fri, 17 Jan 2025 10:25:04 +0100
Subject: [PATCH 039/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`4dbc8b9cb71876e005724f4e8f73a3544646bcf5` (#4618)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 1983f568..f08d1a9c 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=adc5dd92e8aea98f5e7ac84f6e1bc15de35130b5
+CPPLLAMA_VERSION?=4dbc8b9cb71876e005724f4e8f73a3544646bcf5
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From 212c8e1a6da1503a7f45a2aeb4efc8f4b9faad7a Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 17 Jan 2025 15:11:10 +0100
Subject: [PATCH 040/679] Update README.md
Signed-off-by: Ettore Di Giacinto
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index ec4db188..4d415d16 100644
--- a/README.md
+++ b/README.md
@@ -92,7 +92,7 @@ local-ai run oci://localai/phi-2:latest
## š° Latest project news
-- January 2025: SANA support in diffusers: https://github.com/mudler/LocalAI/pull/4603
+- Jan 2025: LocalAI model release: https://huggingface.co/mudler/LocalAI-functioncall-phi-4-v0.3, SANA support in diffusers: https://github.com/mudler/LocalAI/pull/4603
- Dec 2024: stablediffusion.cpp backend (ggml) added ( https://github.com/mudler/LocalAI/pull/4289 )
- Nov 2024: Bark.cpp backend added ( https://github.com/mudler/LocalAI/pull/4287 )
- Nov 2024: Voice activity detection models (**VAD**) added to the API: https://github.com/mudler/LocalAI/pull/4204
From 8027fdf1c781696b3196f6f71fee8bfb63472cbf Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 17 Jan 2025 17:01:16 +0100
Subject: [PATCH 041/679] feat(transformers): merge musicgen functionalities to
a single backend (#4620)
* feat(transformers): merge musicgen functionalities to a single backend
So we optimize space
Signed-off-by: Ettore Di Giacinto
* specify type in tests
Signed-off-by: Ettore Di Giacinto
* Some adaptations for the MusicgenForConditionalGeneration type
Signed-off-by: Ettore Di Giacinto
---------
Signed-off-by: Ettore Di Giacinto
---
.bruno/LocalAI Test Requests/tts/musicgen.bru | 2 +-
.github/dependabot.yml | 4 -
.github/workflows/test-extra.yml | 40 ++--
Dockerfile | 5 +-
Makefile | 13 +-
backend/python/transformers-musicgen/Makefile | 29 ---
.../python/transformers-musicgen/README.md | 5 -
.../python/transformers-musicgen/backend.py | 176 ------------------
.../python/transformers-musicgen/install.sh | 14 --
.../requirements-cpu.txt | 3 -
.../requirements-cublas11.txt | 4 -
.../requirements-cublas12.txt | 3 -
.../requirements-hipblas.txt | 4 -
.../requirements-intel.txt | 8 -
.../transformers-musicgen/requirements.txt | 4 -
backend/python/transformers-musicgen/run.sh | 4 -
backend/python/transformers-musicgen/test.py | 100 ----------
backend/python/transformers-musicgen/test.sh | 6 -
backend/python/transformers/backend.py | 115 +++++++++++-
backend/python/transformers/requirements.txt | 3 +-
backend/python/transformers/test.py | 59 +++++-
21 files changed, 187 insertions(+), 414 deletions(-)
delete mode 100644 backend/python/transformers-musicgen/Makefile
delete mode 100644 backend/python/transformers-musicgen/README.md
delete mode 100644 backend/python/transformers-musicgen/backend.py
delete mode 100755 backend/python/transformers-musicgen/install.sh
delete mode 100644 backend/python/transformers-musicgen/requirements-cpu.txt
delete mode 100644 backend/python/transformers-musicgen/requirements-cublas11.txt
delete mode 100644 backend/python/transformers-musicgen/requirements-cublas12.txt
delete mode 100644 backend/python/transformers-musicgen/requirements-hipblas.txt
delete mode 100644 backend/python/transformers-musicgen/requirements-intel.txt
delete mode 100644 backend/python/transformers-musicgen/requirements.txt
delete mode 100755 backend/python/transformers-musicgen/run.sh
delete mode 100644 backend/python/transformers-musicgen/test.py
delete mode 100755 backend/python/transformers-musicgen/test.sh
diff --git a/.bruno/LocalAI Test Requests/tts/musicgen.bru b/.bruno/LocalAI Test Requests/tts/musicgen.bru
index a720b8b1..900173eb 100644
--- a/.bruno/LocalAI Test Requests/tts/musicgen.bru
+++ b/.bruno/LocalAI Test Requests/tts/musicgen.bru
@@ -16,7 +16,7 @@ headers {
body:json {
{
- "backend": "transformers-musicgen",
+ "backend": "transformers",
"model": "facebook/musicgen-small",
"input": "80s Synths playing Jazz"
}
diff --git a/.github/dependabot.yml b/.github/dependabot.yml
index 8fa0cca5..570ac569 100644
--- a/.github/dependabot.yml
+++ b/.github/dependabot.yml
@@ -81,10 +81,6 @@ updates:
directory: "/backend/python/transformers"
schedule:
interval: "weekly"
- - package-ecosystem: "pip"
- directory: "/backend/python/transformers-musicgen"
- schedule:
- interval: "weekly"
- package-ecosystem: "pip"
directory: "/backend/python/vllm"
schedule:
diff --git a/.github/workflows/test-extra.yml b/.github/workflows/test-extra.yml
index 3c2fee37..eacd3ab0 100644
--- a/.github/workflows/test-extra.yml
+++ b/.github/workflows/test-extra.yml
@@ -153,27 +153,27 @@ jobs:
make --jobs=5 --output-sync=target -C backend/python/openvoice
make --jobs=5 --output-sync=target -C backend/python/openvoice test
- tests-transformers-musicgen:
- runs-on: ubuntu-latest
- steps:
- - name: Clone
- uses: actions/checkout@v4
- with:
- submodules: true
- - name: Dependencies
- run: |
- sudo apt-get update
- sudo apt-get install build-essential ffmpeg
- # Install UV
- curl -LsSf https://astral.sh/uv/install.sh | sh
- sudo apt-get install -y ca-certificates cmake curl patch python3-pip
- sudo apt-get install -y libopencv-dev
- pip install --user --no-cache-dir grpcio-tools==1.64.1
+ # tests-transformers-musicgen:
+ # runs-on: ubuntu-latest
+ # steps:
+ # - name: Clone
+ # uses: actions/checkout@v4
+ # with:
+ # submodules: true
+ # - name: Dependencies
+ # run: |
+ # sudo apt-get update
+ # sudo apt-get install build-essential ffmpeg
+ # # Install UV
+ # curl -LsSf https://astral.sh/uv/install.sh | sh
+ # sudo apt-get install -y ca-certificates cmake curl patch python3-pip
+ # sudo apt-get install -y libopencv-dev
+ # pip install --user --no-cache-dir grpcio-tools==1.64.1
- - name: Test transformers-musicgen
- run: |
- make --jobs=5 --output-sync=target -C backend/python/transformers-musicgen
- make --jobs=5 --output-sync=target -C backend/python/transformers-musicgen test
+ # - name: Test transformers-musicgen
+ # run: |
+ # make --jobs=5 --output-sync=target -C backend/python/transformers-musicgen
+ # make --jobs=5 --output-sync=target -C backend/python/transformers-musicgen test
# tests-bark:
# runs-on: ubuntu-latest
diff --git a/Dockerfile b/Dockerfile
index 354ef298..9fb07516 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -15,7 +15,7 @@ ARG TARGETARCH
ARG TARGETVARIANT
ENV DEBIAN_FRONTEND=noninteractive
-ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh,transformers:/build/backend/python/transformers/run.sh,sentencetransformers:/build/backend/python/sentencetransformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,openvoice:/build/backend/python/openvoice/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,mamba:/build/backend/python/mamba/run.sh,exllama2:/build/backend/python/exllama2/run.sh,transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"
+ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh,transformers:/build/backend/python/transformers/run.sh,sentencetransformers:/build/backend/python/sentencetransformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,openvoice:/build/backend/python/openvoice/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,mamba:/build/backend/python/mamba/run.sh,exllama2:/build/backend/python/exllama2/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"
RUN apt-get update && \
@@ -448,9 +448,6 @@ RUN if [[ ( "${EXTRA_BACKENDS}" =~ "coqui" || -z "${EXTRA_BACKENDS}" ) && "$IMAG
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "diffusers" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/diffusers \
- ; fi && \
- if [[ ( "${EXTRA_BACKENDS}" =~ "transformers-musicgen" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
- make -C backend/python/transformers-musicgen \
; fi
RUN if [[ ( "${EXTRA_BACKENDS}" =~ "kokoro" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
diff --git a/Makefile b/Makefile
index f08d1a9c..03468ffb 100644
--- a/Makefile
+++ b/Makefile
@@ -583,10 +583,10 @@ protogen-go-clean:
$(RM) bin/*
.PHONY: protogen-python
-protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen mamba-protogen rerankers-protogen sentencetransformers-protogen transformers-protogen parler-tts-protogen transformers-musicgen-protogen kokoro-protogen vllm-protogen openvoice-protogen
+protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen mamba-protogen rerankers-protogen sentencetransformers-protogen transformers-protogen parler-tts-protogen kokoro-protogen vllm-protogen openvoice-protogen
.PHONY: protogen-python-clean
-protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean mamba-protogen-clean sentencetransformers-protogen-clean rerankers-protogen-clean transformers-protogen-clean transformers-musicgen-protogen-clean parler-tts-protogen-clean kokoro-protogen-clean vllm-protogen-clean openvoice-protogen-clean
+protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean mamba-protogen-clean sentencetransformers-protogen-clean rerankers-protogen-clean transformers-protogen-clean parler-tts-protogen-clean kokoro-protogen-clean vllm-protogen-clean openvoice-protogen-clean
.PHONY: autogptq-protogen
autogptq-protogen:
@@ -668,14 +668,6 @@ parler-tts-protogen:
parler-tts-protogen-clean:
$(MAKE) -C backend/python/parler-tts protogen-clean
-.PHONY: transformers-musicgen-protogen
-transformers-musicgen-protogen:
- $(MAKE) -C backend/python/transformers-musicgen protogen
-
-.PHONY: transformers-musicgen-protogen-clean
-transformers-musicgen-protogen-clean:
- $(MAKE) -C backend/python/transformers-musicgen protogen-clean
-
.PHONY: kokoro-protogen
kokoro-protogen:
$(MAKE) -C backend/python/kokoro protogen
@@ -712,7 +704,6 @@ prepare-extra-conda-environments: protogen-python
$(MAKE) -C backend/python/sentencetransformers
$(MAKE) -C backend/python/rerankers
$(MAKE) -C backend/python/transformers
- $(MAKE) -C backend/python/transformers-musicgen
$(MAKE) -C backend/python/parler-tts
$(MAKE) -C backend/python/kokoro
$(MAKE) -C backend/python/openvoice
diff --git a/backend/python/transformers-musicgen/Makefile b/backend/python/transformers-musicgen/Makefile
deleted file mode 100644
index 06badf6d..00000000
--- a/backend/python/transformers-musicgen/Makefile
+++ /dev/null
@@ -1,29 +0,0 @@
-.PHONY: transformers-musicgen
-transformers-musicgen: protogen
- bash install.sh
-
-.PHONY: run
-run: protogen
- @echo "Running transformers..."
- bash run.sh
- @echo "transformers run."
-
-.PHONY: test
-test: protogen
- @echo "Testing transformers..."
- bash test.sh
- @echo "transformers tested."
-
-.PHONY: protogen
-protogen: backend_pb2_grpc.py backend_pb2.py
-
-.PHONY: protogen-clean
-protogen-clean:
- $(RM) backend_pb2_grpc.py backend_pb2.py
-
-backend_pb2_grpc.py backend_pb2.py:
- python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
-
-.PHONY: clean
-clean: protogen-clean
- rm -rf venv __pycache__
\ No newline at end of file
diff --git a/backend/python/transformers-musicgen/README.md b/backend/python/transformers-musicgen/README.md
deleted file mode 100644
index bf7fef84..00000000
--- a/backend/python/transformers-musicgen/README.md
+++ /dev/null
@@ -1,5 +0,0 @@
-# Creating a separate environment for the transformers project
-
-```
-make transformers-musicgen
-```
\ No newline at end of file
diff --git a/backend/python/transformers-musicgen/backend.py b/backend/python/transformers-musicgen/backend.py
deleted file mode 100644
index b9f1facf..00000000
--- a/backend/python/transformers-musicgen/backend.py
+++ /dev/null
@@ -1,176 +0,0 @@
-#!/usr/bin/env python3
-"""
-Extra gRPC server for MusicgenForConditionalGeneration models.
-"""
-from concurrent import futures
-
-import argparse
-import signal
-import sys
-import os
-
-import time
-import backend_pb2
-import backend_pb2_grpc
-
-import grpc
-
-from scipy.io import wavfile
-from transformers import AutoProcessor, MusicgenForConditionalGeneration
-
-_ONE_DAY_IN_SECONDS = 60 * 60 * 24
-
-# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
-MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
-
-# Implement the BackendServicer class with the service methods
-class BackendServicer(backend_pb2_grpc.BackendServicer):
- """
- A gRPC servicer for the backend service.
-
- This class implements the gRPC methods for the backend service, including Health, LoadModel, and Embedding.
- """
- def Health(self, request, context):
- """
- A gRPC method that returns the health status of the backend service.
-
- Args:
- request: A HealthRequest object that contains the request parameters.
- context: A grpc.ServicerContext object that provides information about the RPC.
-
- Returns:
- A Reply object that contains the health status of the backend service.
- """
- return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
-
- def LoadModel(self, request, context):
- """
- A gRPC method that loads a model into memory.
-
- Args:
- request: A LoadModelRequest object that contains the request parameters.
- context: A grpc.ServicerContext object that provides information about the RPC.
-
- Returns:
- A Result object that contains the result of the LoadModel operation.
- """
- model_name = request.Model
- try:
- self.processor = AutoProcessor.from_pretrained(model_name)
- self.model = MusicgenForConditionalGeneration.from_pretrained(model_name)
- except Exception as err:
- return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
-
- return backend_pb2.Result(message="Model loaded successfully", success=True)
-
- def SoundGeneration(self, request, context):
- model_name = request.model
- if model_name == "":
- return backend_pb2.Result(success=False, message="request.model is required")
- try:
- self.processor = AutoProcessor.from_pretrained(model_name)
- self.model = MusicgenForConditionalGeneration.from_pretrained(model_name)
- inputs = None
- if request.text == "":
- inputs = self.model.get_unconditional_inputs(num_samples=1)
- elif request.HasField('src'):
- # TODO SECURITY CODE GOES HERE LOL
- # WHO KNOWS IF THIS WORKS???
- sample_rate, wsamples = wavfile.read('path_to_your_file.wav')
-
- if request.HasField('src_divisor'):
- wsamples = wsamples[: len(wsamples) // request.src_divisor]
-
- inputs = self.processor(
- audio=wsamples,
- sampling_rate=sample_rate,
- text=[request.text],
- padding=True,
- return_tensors="pt",
- )
- else:
- inputs = self.processor(
- text=[request.text],
- padding=True,
- return_tensors="pt",
- )
-
- tokens = 256
- if request.HasField('duration'):
- tokens = int(request.duration * 51.2) # 256 tokens = 5 seconds, therefore 51.2 tokens is one second
- guidance = 3.0
- if request.HasField('temperature'):
- guidance = request.temperature
- dosample = True
- if request.HasField('sample'):
- dosample = request.sample
- audio_values = self.model.generate(**inputs, do_sample=dosample, guidance_scale=guidance, max_new_tokens=tokens)
- print("[transformers-musicgen] SoundGeneration generated!", file=sys.stderr)
- sampling_rate = self.model.config.audio_encoder.sampling_rate
- wavfile.write(request.dst, rate=sampling_rate, data=audio_values[0, 0].numpy())
- print("[transformers-musicgen] SoundGeneration saved to", request.dst, file=sys.stderr)
- print("[transformers-musicgen] SoundGeneration for", file=sys.stderr)
- print("[transformers-musicgen] SoundGeneration requested tokens", tokens, file=sys.stderr)
- print(request, file=sys.stderr)
- except Exception as err:
- return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
- return backend_pb2.Result(success=True)
-
-
-# The TTS endpoint is older, and provides fewer features, but exists for compatibility reasons
- def TTS(self, request, context):
- model_name = request.model
- if model_name == "":
- return backend_pb2.Result(success=False, message="request.model is required")
- try:
- self.processor = AutoProcessor.from_pretrained(model_name)
- self.model = MusicgenForConditionalGeneration.from_pretrained(model_name)
- inputs = self.processor(
- text=[request.text],
- padding=True,
- return_tensors="pt",
- )
- tokens = 512 # No good place to set the "length" in TTS, so use 10s as a sane default
- audio_values = self.model.generate(**inputs, max_new_tokens=tokens)
- print("[transformers-musicgen] TTS generated!", file=sys.stderr)
- sampling_rate = self.model.config.audio_encoder.sampling_rate
- write_wav(request.dst, rate=sampling_rate, data=audio_values[0, 0].numpy())
- print("[transformers-musicgen] TTS saved to", request.dst, file=sys.stderr)
- print("[transformers-musicgen] TTS for", file=sys.stderr)
- print(request, file=sys.stderr)
- except Exception as err:
- return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
- return backend_pb2.Result(success=True)
-
-
-def serve(address):
- server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
- backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
- server.add_insecure_port(address)
- server.start()
- print("[transformers-musicgen] Server started. Listening on: " + address, file=sys.stderr)
-
- # Define the signal handler function
- def signal_handler(sig, frame):
- print("[transformers-musicgen] Received termination signal. Shutting down...")
- server.stop(0)
- sys.exit(0)
-
- # Set the signal handlers for SIGINT and SIGTERM
- signal.signal(signal.SIGINT, signal_handler)
- signal.signal(signal.SIGTERM, signal_handler)
-
- try:
- while True:
- time.sleep(_ONE_DAY_IN_SECONDS)
- except KeyboardInterrupt:
- server.stop(0)
-
-if __name__ == "__main__":
- parser = argparse.ArgumentParser(description="Run the gRPC server.")
- parser.add_argument(
- "--addr", default="localhost:50051", help="The address to bind the server to."
- )
- args = parser.parse_args()
- print(f"[transformers-musicgen] startup: {args}", file=sys.stderr)
- serve(args.addr)
diff --git a/backend/python/transformers-musicgen/install.sh b/backend/python/transformers-musicgen/install.sh
deleted file mode 100755
index 36443ef1..00000000
--- a/backend/python/transformers-musicgen/install.sh
+++ /dev/null
@@ -1,14 +0,0 @@
-#!/bin/bash
-set -e
-
-source $(dirname $0)/../common/libbackend.sh
-
-# This is here because the Intel pip index is broken and returns 200 status codes for every package name, it just doesn't return any package links.
-# This makes uv think that the package exists in the Intel pip index, and by default it stops looking at other pip indexes once it finds a match.
-# We need uv to continue falling through to the pypi default index to find optimum[openvino] in the pypi index
-# the --upgrade actually allows us to *downgrade* torch to the version provided in the Intel pip index
-if [ "x${BUILD_PROFILE}" == "xintel" ]; then
- EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
-fi
-
-installRequirements
diff --git a/backend/python/transformers-musicgen/requirements-cpu.txt b/backend/python/transformers-musicgen/requirements-cpu.txt
deleted file mode 100644
index 2021fc20..00000000
--- a/backend/python/transformers-musicgen/requirements-cpu.txt
+++ /dev/null
@@ -1,3 +0,0 @@
-transformers
-accelerate
-torch==2.4.1
\ No newline at end of file
diff --git a/backend/python/transformers-musicgen/requirements-cublas11.txt b/backend/python/transformers-musicgen/requirements-cublas11.txt
deleted file mode 100644
index cd2c9fdb..00000000
--- a/backend/python/transformers-musicgen/requirements-cublas11.txt
+++ /dev/null
@@ -1,4 +0,0 @@
---extra-index-url https://download.pytorch.org/whl/cu118
-transformers
-accelerate
-torch==2.4.1+cu118
\ No newline at end of file
diff --git a/backend/python/transformers-musicgen/requirements-cublas12.txt b/backend/python/transformers-musicgen/requirements-cublas12.txt
deleted file mode 100644
index 2021fc20..00000000
--- a/backend/python/transformers-musicgen/requirements-cublas12.txt
+++ /dev/null
@@ -1,3 +0,0 @@
-transformers
-accelerate
-torch==2.4.1
\ No newline at end of file
diff --git a/backend/python/transformers-musicgen/requirements-hipblas.txt b/backend/python/transformers-musicgen/requirements-hipblas.txt
deleted file mode 100644
index 122b2032..00000000
--- a/backend/python/transformers-musicgen/requirements-hipblas.txt
+++ /dev/null
@@ -1,4 +0,0 @@
---extra-index-url https://download.pytorch.org/whl/rocm6.0
-transformers
-accelerate
-torch==2.4.1+rocm6.0
\ No newline at end of file
diff --git a/backend/python/transformers-musicgen/requirements-intel.txt b/backend/python/transformers-musicgen/requirements-intel.txt
deleted file mode 100644
index ac2feb42..00000000
--- a/backend/python/transformers-musicgen/requirements-intel.txt
+++ /dev/null
@@ -1,8 +0,0 @@
---extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
-intel-extension-for-pytorch==2.3.110+xpu
-transformers
-oneccl_bind_pt==2.3.100+xpu
-accelerate
-torch==2.3.1+cxx11.abi
-optimum[openvino]
-setuptools
\ No newline at end of file
diff --git a/backend/python/transformers-musicgen/requirements.txt b/backend/python/transformers-musicgen/requirements.txt
deleted file mode 100644
index f58e1e80..00000000
--- a/backend/python/transformers-musicgen/requirements.txt
+++ /dev/null
@@ -1,4 +0,0 @@
-grpcio==1.69.0
-protobuf
-scipy==1.14.0
-certifi
\ No newline at end of file
diff --git a/backend/python/transformers-musicgen/run.sh b/backend/python/transformers-musicgen/run.sh
deleted file mode 100755
index 375c07e5..00000000
--- a/backend/python/transformers-musicgen/run.sh
+++ /dev/null
@@ -1,4 +0,0 @@
-#!/bin/bash
-source $(dirname $0)/../common/libbackend.sh
-
-startBackend $@
\ No newline at end of file
diff --git a/backend/python/transformers-musicgen/test.py b/backend/python/transformers-musicgen/test.py
deleted file mode 100644
index 295de65e..00000000
--- a/backend/python/transformers-musicgen/test.py
+++ /dev/null
@@ -1,100 +0,0 @@
-"""
-A test script to test the gRPC service
-"""
-import unittest
-import subprocess
-import time
-import backend_pb2
-import backend_pb2_grpc
-
-import grpc
-
-
-class TestBackendServicer(unittest.TestCase):
- """
- TestBackendServicer is the class that tests the gRPC service
- """
- def setUp(self):
- """
- This method sets up the gRPC service by starting the server
- """
- self.service = subprocess.Popen(["python3", "backend.py", "--addr", "localhost:50051"])
- time.sleep(10)
-
- def tearDown(self) -> None:
- """
- This method tears down the gRPC service by terminating the server
- """
- self.service.terminate()
- self.service.wait()
-
- def test_server_startup(self):
- """
- This method tests if the server starts up successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.Health(backend_pb2.HealthMessage())
- self.assertEqual(response.message, b'OK')
- except Exception as err:
- print(err)
- self.fail("Server failed to start")
- finally:
- self.tearDown()
-
- def test_load_model(self):
- """
- This method tests if the model is loaded successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.LoadModel(backend_pb2.ModelOptions(Model="facebook/musicgen-small"))
- self.assertTrue(response.success)
- self.assertEqual(response.message, "Model loaded successfully")
- except Exception as err:
- print(err)
- self.fail("LoadModel service failed")
- finally:
- self.tearDown()
-
- def test_tts(self):
- """
- This method tests if TTS is generated successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.LoadModel(backend_pb2.ModelOptions(Model="facebook/musicgen-small"))
- self.assertTrue(response.success)
- tts_request = backend_pb2.TTSRequest(text="80s TV news production music hit for tonight's biggest story")
- tts_response = stub.TTS(tts_request)
- self.assertIsNotNone(tts_response)
- except Exception as err:
- print(err)
- self.fail("TTS service failed")
- finally:
- self.tearDown()
-
- def test_sound_generation(self):
- """
- This method tests if SoundGeneration is generated successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.LoadModel(backend_pb2.ModelOptions(Model="facebook/musicgen-small"))
- self.assertTrue(response.success)
- sg_request = backend_pb2.SoundGenerationRequest(text="80s TV news production music hit for tonight's biggest story")
- sg_response = stub.SoundGeneration(sg_request)
- self.assertIsNotNone(sg_response)
- except Exception as err:
- print(err)
- self.fail("SoundGeneration service failed")
- finally:
- self.tearDown()
\ No newline at end of file
diff --git a/backend/python/transformers-musicgen/test.sh b/backend/python/transformers-musicgen/test.sh
deleted file mode 100755
index 6940b066..00000000
--- a/backend/python/transformers-musicgen/test.sh
+++ /dev/null
@@ -1,6 +0,0 @@
-#!/bin/bash
-set -e
-
-source $(dirname $0)/../common/libbackend.sh
-
-runUnittests
diff --git a/backend/python/transformers/backend.py b/backend/python/transformers/backend.py
index 2075012e..3f6838ad 100644
--- a/backend/python/transformers/backend.py
+++ b/backend/python/transformers/backend.py
@@ -22,6 +22,8 @@ import torch.cuda
XPU=os.environ.get("XPU", "0") == "1"
from transformers import AutoTokenizer, AutoModel, set_seed, TextIteratorStreamer, StoppingCriteriaList, StopStringCriteria
+from transformers import AutoProcessor, MusicgenForConditionalGeneration
+from scipy.io import wavfile
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
@@ -191,6 +193,9 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
export=True,
device=device_map)
self.OV = True
+ elif request.Type == "MusicgenForConditionalGeneration":
+ self.processor = AutoProcessor.from_pretrained(model_name)
+ self.model = MusicgenForConditionalGeneration.from_pretrained(model_name)
else:
print("Automodel", file=sys.stderr)
self.model = AutoModel.from_pretrained(model_name,
@@ -201,19 +206,22 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
torch_dtype=compute)
if request.ContextSize > 0:
self.max_tokens = request.ContextSize
- else:
+ elif request.Type != "MusicgenForConditionalGeneration":
self.max_tokens = self.model.config.max_position_embeddings
+ else:
+ self.max_tokens = 512
- self.tokenizer = AutoTokenizer.from_pretrained(model_name, use_safetensors=True)
- self.XPU = False
+ if request.Type != "MusicgenForConditionalGeneration":
+ self.tokenizer = AutoTokenizer.from_pretrained(model_name, use_safetensors=True)
+ self.XPU = False
- if XPU and self.OV == False:
- self.XPU = True
- try:
- print("Optimizing model", model_name, "to XPU.", file=sys.stderr)
- self.model = ipex.optimize_transformers(self.model, inplace=True, dtype=torch.float16, device="xpu")
- except Exception as err:
- print("Not using XPU:", err, file=sys.stderr)
+ if XPU and self.OV == False:
+ self.XPU = True
+ try:
+ print("Optimizing model", model_name, "to XPU.", file=sys.stderr)
+ self.model = ipex.optimize_transformers(self.model, inplace=True, dtype=torch.float16, device="xpu")
+ except Exception as err:
+ print("Not using XPU:", err, file=sys.stderr)
except Exception as err:
print("Error:", err, file=sys.stderr)
@@ -380,6 +388,93 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
finally:
await iterations.aclose()
+ def SoundGeneration(self, request, context):
+ model_name = request.model
+ try:
+ if self.processor is None:
+ if model_name == "":
+ return backend_pb2.Result(success=False, message="request.model is required")
+ self.processor = AutoProcessor.from_pretrained(model_name)
+ if self.model is None:
+ if model_name == "":
+ return backend_pb2.Result(success=False, message="request.model is required")
+ self.model = MusicgenForConditionalGeneration.from_pretrained(model_name)
+ inputs = None
+ if request.text == "":
+ inputs = self.model.get_unconditional_inputs(num_samples=1)
+ elif request.HasField('src'):
+                # Use the provided audio file as conditioning input for generation.
+                # TODO: validate the user-supplied path before reading it
+                sample_rate, wsamples = wavfile.read(request.src)
+
+ if request.HasField('src_divisor'):
+ wsamples = wsamples[: len(wsamples) // request.src_divisor]
+
+ inputs = self.processor(
+ audio=wsamples,
+ sampling_rate=sample_rate,
+ text=[request.text],
+ padding=True,
+ return_tensors="pt",
+ )
+ else:
+ inputs = self.processor(
+ text=[request.text],
+ padding=True,
+ return_tensors="pt",
+ )
+
+ tokens = 256
+ if request.HasField('duration'):
+ tokens = int(request.duration * 51.2) # 256 tokens = 5 seconds, therefore 51.2 tokens is one second
+ guidance = 3.0
+ if request.HasField('temperature'):
+ guidance = request.temperature
+ dosample = True
+ if request.HasField('sample'):
+ dosample = request.sample
+ audio_values = self.model.generate(**inputs, do_sample=dosample, guidance_scale=guidance, max_new_tokens=tokens)
+ print("[transformers-musicgen] SoundGeneration generated!", file=sys.stderr)
+ sampling_rate = self.model.config.audio_encoder.sampling_rate
+ wavfile.write(request.dst, rate=sampling_rate, data=audio_values[0, 0].numpy())
+ print("[transformers-musicgen] SoundGeneration saved to", request.dst, file=sys.stderr)
+ print("[transformers-musicgen] SoundGeneration for", file=sys.stderr)
+ print("[transformers-musicgen] SoundGeneration requested tokens", tokens, file=sys.stderr)
+ print(request, file=sys.stderr)
+ except Exception as err:
+ return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
+ return backend_pb2.Result(success=True)
+
+
+# The TTS endpoint is older, and provides fewer features, but exists for compatibility reasons
+ def TTS(self, request, context):
+ model_name = request.model
+ try:
+ if self.processor is None:
+ if model_name == "":
+ return backend_pb2.Result(success=False, message="request.model is required")
+ self.processor = AutoProcessor.from_pretrained(model_name)
+ if self.model is None:
+ if model_name == "":
+ return backend_pb2.Result(success=False, message="request.model is required")
+ self.model = MusicgenForConditionalGeneration.from_pretrained(model_name)
+ inputs = self.processor(
+ text=[request.text],
+ padding=True,
+ return_tensors="pt",
+ )
+ tokens = 512 # No good place to set the "length" in TTS, so use 10s as a sane default
+ audio_values = self.model.generate(**inputs, max_new_tokens=tokens)
+ print("[transformers-musicgen] TTS generated!", file=sys.stderr)
+ sampling_rate = self.model.config.audio_encoder.sampling_rate
+ wavfile.write(request.dst, rate=sampling_rate, data=audio_values[0, 0].numpy())
+ print("[transformers-musicgen] TTS saved to", request.dst, file=sys.stderr)
+ print("[transformers-musicgen] TTS for", file=sys.stderr)
+ print(request, file=sys.stderr)
+ except Exception as err:
+ return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
+ return backend_pb2.Result(success=True)
+
async def serve(address):
# Start asyncio gRPC server
server = grpc.aio.server(migration_thread_pool=futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
diff --git a/backend/python/transformers/requirements.txt b/backend/python/transformers/requirements.txt
index a1eea776..262dd17a 100644
--- a/backend/python/transformers/requirements.txt
+++ b/backend/python/transformers/requirements.txt
@@ -1,4 +1,5 @@
grpcio==1.69.0
protobuf
certifi
-setuptools
\ No newline at end of file
+setuptools
+scipy==1.14.0
\ No newline at end of file
diff --git a/backend/python/transformers/test.py b/backend/python/transformers/test.py
index aab3c05e..305b0a93 100644
--- a/backend/python/transformers/test.py
+++ b/backend/python/transformers/test.py
@@ -19,6 +19,7 @@ class TestBackendServicer(unittest.TestCase):
This method sets up the gRPC service by starting the server
"""
self.service = subprocess.Popen(["python3", "backend.py", "--addr", "localhost:50051"])
+ time.sleep(10)
def tearDown(self) -> None:
"""
@@ -31,7 +32,6 @@ class TestBackendServicer(unittest.TestCase):
"""
This method tests if the server starts up successfully
"""
- time.sleep(10)
try:
self.setUp()
with grpc.insecure_channel("localhost:50051") as channel:
@@ -48,7 +48,6 @@ class TestBackendServicer(unittest.TestCase):
"""
This method tests if the model is loaded successfully
"""
- time.sleep(10)
try:
self.setUp()
with grpc.insecure_channel("localhost:50051") as channel:
@@ -66,7 +65,6 @@ class TestBackendServicer(unittest.TestCase):
"""
This method tests if the embeddings are generated successfully
"""
- time.sleep(10)
try:
self.setUp()
with grpc.insecure_channel("localhost:50051") as channel:
@@ -80,5 +78,60 @@ class TestBackendServicer(unittest.TestCase):
except Exception as err:
print(err)
self.fail("Embedding service failed")
+ finally:
+ self.tearDown()
+
+ def test_audio_load_model(self):
+ """
+ This method tests if the model is loaded successfully
+ """
+ try:
+ self.setUp()
+ with grpc.insecure_channel("localhost:50051") as channel:
+ stub = backend_pb2_grpc.BackendStub(channel)
+ response = stub.LoadModel(backend_pb2.ModelOptions(Model="facebook/musicgen-small",Type="MusicgenForConditionalGeneration"))
+ self.assertTrue(response.success)
+ self.assertEqual(response.message, "Model loaded successfully")
+ except Exception as err:
+ print(err)
+ self.fail("LoadModel service failed")
+ finally:
+ self.tearDown()
+
+ def test_tts(self):
+ """
+ This method tests if TTS is generated successfully
+ """
+ try:
+ self.setUp()
+ with grpc.insecure_channel("localhost:50051") as channel:
+ stub = backend_pb2_grpc.BackendStub(channel)
+ response = stub.LoadModel(backend_pb2.ModelOptions(Model="facebook/musicgen-small",Type="MusicgenForConditionalGeneration"))
+ self.assertTrue(response.success)
+ tts_request = backend_pb2.TTSRequest(text="80s TV news production music hit for tonight's biggest story")
+ tts_response = stub.TTS(tts_request)
+ self.assertIsNotNone(tts_response)
+ except Exception as err:
+ print(err)
+ self.fail("TTS service failed")
+ finally:
+ self.tearDown()
+
+ def test_sound_generation(self):
+ """
+ This method tests if SoundGeneration is generated successfully
+ """
+ try:
+ self.setUp()
+ with grpc.insecure_channel("localhost:50051") as channel:
+ stub = backend_pb2_grpc.BackendStub(channel)
+ response = stub.LoadModel(backend_pb2.ModelOptions(Model="facebook/musicgen-small",Type="MusicgenForConditionalGeneration"))
+ self.assertTrue(response.success)
+ sg_request = backend_pb2.SoundGenerationRequest(text="80s TV news production music hit for tonight's biggest story")
+ sg_response = stub.SoundGeneration(sg_request)
+ self.assertIsNotNone(sg_response)
+ except Exception as err:
+ print(err)
+ self.fail("SoundGeneration service failed")
finally:
self.tearDown()
\ No newline at end of file
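With this change the generic transformers backend exposes the MusicGen `SoundGeneration` and `TTS` handlers that previously lived in the dedicated transformers-musicgen backend. The sketch below shows how a client could exercise the new `SoundGeneration` endpoint directly over gRPC; it mirrors the tests above and assumes the generated `backend_pb2`/`backend_pb2_grpc` stubs are importable and that the backend was started with `--addr localhost:50051`. The destination path and duration are placeholders.

```python
# Minimal gRPC client sketch for the merged SoundGeneration endpoint (illustrative only).
import grpc

import backend_pb2
import backend_pb2_grpc


def generate_music(prompt: str, dst: str = "/tmp/musicgen.wav") -> None:
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = backend_pb2_grpc.BackendStub(channel)
        # MusicGen is now loaded through the generic transformers backend
        load = stub.LoadModel(backend_pb2.ModelOptions(
            Model="facebook/musicgen-small",
            Type="MusicgenForConditionalGeneration",
        ))
        assert load.success, load.message
        # duration is optional; the backend converts it to tokens (about 51.2 tokens per second)
        result = stub.SoundGeneration(backend_pb2.SoundGenerationRequest(
            text=prompt,
            dst=dst,
            duration=5,
        ))
        assert result.success, result.message


if __name__ == "__main__":
    generate_music("80s TV news production music hit for tonight's biggest story")
```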
From 96f8ec0402ff54e0bf7bdd7c8986497184a2b9f8 Mon Sep 17 00:00:00 2001
From: mintyleaf
Date: Fri, 17 Jan 2025 20:05:58 +0400
Subject: [PATCH 042/679] feat: add machine tag and inference timings (#4577)
* Add machine tag option, add extraUsage option, grpc-server -> proto -> endpoint extraUsage data is broken for now
Signed-off-by: mintyleaf
* remove redundant timing fields, fix non-working timings output
Signed-off-by: mintyleaf
* use middleware for Machine-Tag only if tag is specified
Signed-off-by: mintyleaf
---------
Signed-off-by: mintyleaf
---
backend/backend.proto | 4 +-
backend/cpp/llama/grpc-server.cpp | 14 ++++++
core/backend/llm.go | 12 ++++-
core/cli/run.go | 2 +
core/config/application_config.go | 8 ++++
core/http/app.go | 8 ++++
core/http/endpoints/localai/tts.go | 1 -
core/http/endpoints/localai/vad.go | 1 -
core/http/endpoints/openai/chat.go | 59 ++++++++++++++++--------
core/http/endpoints/openai/completion.go | 44 ++++++++++++------
core/http/endpoints/openai/edit.go | 21 +++++++--
core/http/endpoints/openai/inference.go | 2 +
core/http/endpoints/openai/list.go | 2 +-
core/http/routes/openai.go | 4 +-
core/schema/openai.go | 3 ++
15 files changed, 137 insertions(+), 48 deletions(-)
diff --git a/backend/backend.proto b/backend/backend.proto
index 0a341ca2..fea4214f 100644
--- a/backend/backend.proto
+++ b/backend/backend.proto
@@ -159,6 +159,8 @@ message Reply {
bytes message = 1;
int32 tokens = 2;
int32 prompt_tokens = 3;
+ double timing_prompt_processing = 4;
+ double timing_token_generation = 5;
}
message ModelOptions {
@@ -348,4 +350,4 @@ message StatusResponse {
message Message {
string role = 1;
string content = 2;
-}
\ No newline at end of file
+}
diff --git a/backend/cpp/llama/grpc-server.cpp b/backend/cpp/llama/grpc-server.cpp
index f0a16ffa..4e75e7b0 100644
--- a/backend/cpp/llama/grpc-server.cpp
+++ b/backend/cpp/llama/grpc-server.cpp
@@ -2408,6 +2408,13 @@ public:
int32_t tokens_evaluated = result.result_json.value("tokens_evaluated", 0);
reply.set_prompt_tokens(tokens_evaluated);
+ if (result.result_json.contains("timings")) {
+ double timing_prompt_processing = result.result_json.at("timings").value("prompt_ms", 0.0);
+ reply.set_timing_prompt_processing(timing_prompt_processing);
+ double timing_token_generation = result.result_json.at("timings").value("predicted_ms", 0.0);
+ reply.set_timing_token_generation(timing_token_generation);
+ }
+
// Log Request Correlation Id
LOG_VERBOSE("correlation:", {
{ "id", data["correlation_id"] }
@@ -2448,6 +2455,13 @@ public:
reply->set_prompt_tokens(tokens_evaluated);
reply->set_tokens(tokens_predicted);
reply->set_message(completion_text);
+
+ if (result.result_json.contains("timings")) {
+ double timing_prompt_processing = result.result_json.at("timings").value("prompt_ms", 0.0);
+ reply->set_timing_prompt_processing(timing_prompt_processing);
+ double timing_token_generation = result.result_json.at("timings").value("predicted_ms", 0.0);
+ reply->set_timing_token_generation(timing_token_generation);
+ }
}
else
{
diff --git a/core/backend/llm.go b/core/backend/llm.go
index 9a4d0d46..d91ded51 100644
--- a/core/backend/llm.go
+++ b/core/backend/llm.go
@@ -27,8 +27,10 @@ type LLMResponse struct {
}
type TokenUsage struct {
- Prompt int
- Completion int
+ Prompt int
+ Completion int
+ TimingPromptProcessing float64
+ TimingTokenGeneration float64
}
func ModelInference(ctx context.Context, s string, messages []schema.Message, images, videos, audios []string, loader *model.ModelLoader, c config.BackendConfig, o *config.ApplicationConfig, tokenCallback func(string, TokenUsage) bool) (func() (LLMResponse, error), error) {
@@ -123,6 +125,8 @@ func ModelInference(ctx context.Context, s string, messages []schema.Message, im
tokenUsage.Prompt = int(reply.PromptTokens)
tokenUsage.Completion = int(reply.Tokens)
+ tokenUsage.TimingTokenGeneration = reply.TimingTokenGeneration
+ tokenUsage.TimingPromptProcessing = reply.TimingPromptProcessing
for len(partialRune) > 0 {
r, size := utf8.DecodeRune(partialRune)
@@ -157,6 +161,10 @@ func ModelInference(ctx context.Context, s string, messages []schema.Message, im
if tokenUsage.Completion == 0 {
tokenUsage.Completion = int(reply.Tokens)
}
+
+ tokenUsage.TimingTokenGeneration = reply.TimingTokenGeneration
+ tokenUsage.TimingPromptProcessing = reply.TimingPromptProcessing
+
return LLMResponse{
Response: string(reply.Message),
Usage: tokenUsage,
diff --git a/core/cli/run.go b/core/cli/run.go
index a0e16155..b86fe2a6 100644
--- a/core/cli/run.go
+++ b/core/cli/run.go
@@ -70,6 +70,7 @@ type RunCMD struct {
WatchdogBusyTimeout string `env:"LOCALAI_WATCHDOG_BUSY_TIMEOUT,WATCHDOG_BUSY_TIMEOUT" default:"5m" help:"Threshold beyond which a busy backend should be stopped" group:"backends"`
Federated bool `env:"LOCALAI_FEDERATED,FEDERATED" help:"Enable federated instance" group:"federated"`
DisableGalleryEndpoint bool `env:"LOCALAI_DISABLE_GALLERY_ENDPOINT,DISABLE_GALLERY_ENDPOINT" help:"Disable the gallery endpoints" group:"api"`
+ MachineTag string `env:"LOCALAI_MACHINE_TAG" help:"Add Machine-Tag header to each response which is useful to track the machine in the P2P network" group:"api"`
LoadToMemory []string `env:"LOCALAI_LOAD_TO_MEMORY,LOAD_TO_MEMORY" help:"A list of models to load into memory at startup" group:"models"`
}
@@ -107,6 +108,7 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
config.WithHttpGetExemptedEndpoints(r.HttpGetExemptedEndpoints),
config.WithP2PNetworkID(r.Peer2PeerNetworkID),
config.WithLoadToMemory(r.LoadToMemory),
+ config.WithMachineTag(r.MachineTag),
}
if r.DisableMetricsEndpoint {
diff --git a/core/config/application_config.go b/core/config/application_config.go
index 3f321e70..1ffcb297 100644
--- a/core/config/application_config.go
+++ b/core/config/application_config.go
@@ -65,6 +65,8 @@ type ApplicationConfig struct {
ModelsURL []string
WatchDogBusyTimeout, WatchDogIdleTimeout time.Duration
+
+ MachineTag string
}
type AppOption func(*ApplicationConfig)
@@ -94,6 +96,12 @@ func WithModelPath(path string) AppOption {
}
}
+func WithMachineTag(tag string) AppOption {
+ return func(o *ApplicationConfig) {
+ o.MachineTag = tag
+ }
+}
+
func WithCors(b bool) AppOption {
return func(o *ApplicationConfig) {
o.CORS = b
diff --git a/core/http/app.go b/core/http/app.go
index 47d89a10..d1e80f8d 100644
--- a/core/http/app.go
+++ b/core/http/app.go
@@ -89,6 +89,14 @@ func API(application *application.Application) (*fiber.App, error) {
router.Use(middleware.StripPathPrefix())
+ if application.ApplicationConfig().MachineTag != "" {
+ router.Use(func(c *fiber.Ctx) error {
+ c.Response().Header.Set("Machine-Tag", application.ApplicationConfig().MachineTag)
+
+ return c.Next()
+ })
+ }
+
router.Hooks().OnListen(func(listenData fiber.ListenData) error {
scheme := "http"
if listenData.TLS {
diff --git a/core/http/endpoints/localai/tts.go b/core/http/endpoints/localai/tts.go
index 7c73c633..9116f9fa 100644
--- a/core/http/endpoints/localai/tts.go
+++ b/core/http/endpoints/localai/tts.go
@@ -24,7 +24,6 @@ import (
// @Router /tts [post]
func TTSEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
return func(c *fiber.Ctx) error {
-
input := new(schema.TTSRequest)
// Get input data from the request body
diff --git a/core/http/endpoints/localai/vad.go b/core/http/endpoints/localai/vad.go
index c5a5d929..2ed6125c 100644
--- a/core/http/endpoints/localai/vad.go
+++ b/core/http/endpoints/localai/vad.go
@@ -19,7 +19,6 @@ import (
// @Router /vad [post]
func VADEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
return func(c *fiber.Ctx) error {
-
input := new(schema.VADRequest)
// Get input data from the request body
diff --git a/core/http/endpoints/openai/chat.go b/core/http/endpoints/openai/chat.go
index c2b201bd..cbce369a 100644
--- a/core/http/endpoints/openai/chat.go
+++ b/core/http/endpoints/openai/chat.go
@@ -30,7 +30,7 @@ func ChatEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
var id, textContentToReturn string
var created int
- process := func(s string, req *schema.OpenAIRequest, config *config.BackendConfig, loader *model.ModelLoader, responses chan schema.OpenAIResponse) {
+ process := func(s string, req *schema.OpenAIRequest, config *config.BackendConfig, loader *model.ModelLoader, responses chan schema.OpenAIResponse, extraUsage bool) {
initialMessage := schema.OpenAIResponse{
ID: id,
Created: created,
@@ -40,18 +40,24 @@ func ChatEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
}
responses <- initialMessage
- ComputeChoices(req, s, config, startupOptions, loader, func(s string, c *[]schema.Choice) {}, func(s string, usage backend.TokenUsage) bool {
+ ComputeChoices(req, s, config, startupOptions, loader, func(s string, c *[]schema.Choice) {}, func(s string, tokenUsage backend.TokenUsage) bool {
+ usage := schema.OpenAIUsage{
+ PromptTokens: tokenUsage.Prompt,
+ CompletionTokens: tokenUsage.Completion,
+ TotalTokens: tokenUsage.Prompt + tokenUsage.Completion,
+ }
+ if extraUsage {
+ usage.TimingTokenGeneration = tokenUsage.TimingTokenGeneration
+ usage.TimingPromptProcessing = tokenUsage.TimingPromptProcessing
+ }
+
resp := schema.OpenAIResponse{
ID: id,
Created: created,
Model: req.Model, // we have to return what the user sent here, due to OpenAI spec.
Choices: []schema.Choice{{Delta: &schema.Message{Content: &s}, Index: 0}},
Object: "chat.completion.chunk",
- Usage: schema.OpenAIUsage{
- PromptTokens: usage.Prompt,
- CompletionTokens: usage.Completion,
- TotalTokens: usage.Prompt + usage.Completion,
- },
+ Usage: usage,
}
responses <- resp
@@ -59,7 +65,7 @@ func ChatEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
})
close(responses)
}
- processTools := func(noAction string, prompt string, req *schema.OpenAIRequest, config *config.BackendConfig, loader *model.ModelLoader, responses chan schema.OpenAIResponse) {
+ processTools := func(noAction string, prompt string, req *schema.OpenAIRequest, config *config.BackendConfig, loader *model.ModelLoader, responses chan schema.OpenAIResponse, extraUsage bool) {
result := ""
_, tokenUsage, _ := ComputeChoices(req, prompt, config, startupOptions, loader, func(s string, c *[]schema.Choice) {}, func(s string, usage backend.TokenUsage) bool {
result += s
@@ -90,6 +96,15 @@ func ChatEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
log.Error().Err(err).Msg("error handling question")
return
}
+ usage := schema.OpenAIUsage{
+ PromptTokens: tokenUsage.Prompt,
+ CompletionTokens: tokenUsage.Completion,
+ TotalTokens: tokenUsage.Prompt + tokenUsage.Completion,
+ }
+ if extraUsage {
+ usage.TimingTokenGeneration = tokenUsage.TimingTokenGeneration
+ usage.TimingPromptProcessing = tokenUsage.TimingPromptProcessing
+ }
resp := schema.OpenAIResponse{
ID: id,
@@ -97,11 +112,7 @@ func ChatEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
Model: req.Model, // we have to return what the user sent here, due to OpenAI spec.
Choices: []schema.Choice{{Delta: &schema.Message{Content: &result}, Index: 0}},
Object: "chat.completion.chunk",
- Usage: schema.OpenAIUsage{
- PromptTokens: tokenUsage.Prompt,
- CompletionTokens: tokenUsage.Completion,
- TotalTokens: tokenUsage.Prompt + tokenUsage.Completion,
- },
+ Usage: usage,
}
responses <- resp
@@ -170,6 +181,9 @@ func ChatEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
}
c.Set("X-Correlation-ID", correlationID)
+ // Opt-in extra usage flag
+ extraUsage := c.Get("LocalAI-Extra-Usage", "") != ""
+
modelFile, input, err := readRequest(c, cl, ml, startupOptions, true)
if err != nil {
return fmt.Errorf("failed reading parameters from request:%w", err)
@@ -319,9 +333,9 @@ func ChatEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
responses := make(chan schema.OpenAIResponse)
if !shouldUseFn {
- go process(predInput, input, config, ml, responses)
+ go process(predInput, input, config, ml, responses, extraUsage)
} else {
- go processTools(noActionName, predInput, input, config, ml, responses)
+ go processTools(noActionName, predInput, input, config, ml, responses, extraUsage)
}
c.Context().SetBodyStreamWriter(fasthttp.StreamWriter(func(w *bufio.Writer) {
@@ -449,6 +463,15 @@ func ChatEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
if err != nil {
return err
}
+ usage := schema.OpenAIUsage{
+ PromptTokens: tokenUsage.Prompt,
+ CompletionTokens: tokenUsage.Completion,
+ TotalTokens: tokenUsage.Prompt + tokenUsage.Completion,
+ }
+ if extraUsage {
+ usage.TimingTokenGeneration = tokenUsage.TimingTokenGeneration
+ usage.TimingPromptProcessing = tokenUsage.TimingPromptProcessing
+ }
resp := &schema.OpenAIResponse{
ID: id,
@@ -456,11 +479,7 @@ func ChatEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
Model: input.Model, // we have to return what the user sent here, due to OpenAI spec.
Choices: result,
Object: "chat.completion",
- Usage: schema.OpenAIUsage{
- PromptTokens: tokenUsage.Prompt,
- CompletionTokens: tokenUsage.Completion,
- TotalTokens: tokenUsage.Prompt + tokenUsage.Completion,
- },
+ Usage: usage,
}
respData, _ := json.Marshal(resp)
log.Debug().Msgf("Response: %s", respData)
diff --git a/core/http/endpoints/openai/completion.go b/core/http/endpoints/openai/completion.go
index 04ebc847..339e9bc2 100644
--- a/core/http/endpoints/openai/completion.go
+++ b/core/http/endpoints/openai/completion.go
@@ -30,8 +30,17 @@ func CompletionEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, e
id := uuid.New().String()
created := int(time.Now().Unix())
- process := func(s string, req *schema.OpenAIRequest, config *config.BackendConfig, loader *model.ModelLoader, responses chan schema.OpenAIResponse) {
- ComputeChoices(req, s, config, appConfig, loader, func(s string, c *[]schema.Choice) {}, func(s string, usage backend.TokenUsage) bool {
+ process := func(s string, req *schema.OpenAIRequest, config *config.BackendConfig, loader *model.ModelLoader, responses chan schema.OpenAIResponse, extraUsage bool) {
+ ComputeChoices(req, s, config, appConfig, loader, func(s string, c *[]schema.Choice) {}, func(s string, tokenUsage backend.TokenUsage) bool {
+ usage := schema.OpenAIUsage{
+ PromptTokens: tokenUsage.Prompt,
+ CompletionTokens: tokenUsage.Completion,
+ TotalTokens: tokenUsage.Prompt + tokenUsage.Completion,
+ }
+ if extraUsage {
+ usage.TimingTokenGeneration = tokenUsage.TimingTokenGeneration
+ usage.TimingPromptProcessing = tokenUsage.TimingPromptProcessing
+ }
resp := schema.OpenAIResponse{
ID: id,
Created: created,
@@ -43,11 +52,7 @@ func CompletionEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, e
},
},
Object: "text_completion",
- Usage: schema.OpenAIUsage{
- PromptTokens: usage.Prompt,
- CompletionTokens: usage.Completion,
- TotalTokens: usage.Prompt + usage.Completion,
- },
+ Usage: usage,
}
log.Debug().Msgf("Sending goroutine: %s", s)
@@ -60,6 +65,10 @@ func CompletionEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, e
return func(c *fiber.Ctx) error {
// Add Correlation
c.Set("X-Correlation-ID", id)
+
+ // Opt-in extra usage flag
+ extraUsage := c.Get("LocalAI-Extra-Usage", "") != ""
+
modelFile, input, err := readRequest(c, cl, ml, appConfig, true)
if err != nil {
return fmt.Errorf("failed reading parameters from request:%w", err)
@@ -113,7 +122,7 @@ func CompletionEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, e
responses := make(chan schema.OpenAIResponse)
- go process(predInput, input, config, ml, responses)
+ go process(predInput, input, config, ml, responses, extraUsage)
c.Context().SetBodyStreamWriter(fasthttp.StreamWriter(func(w *bufio.Writer) {
@@ -170,11 +179,20 @@ func CompletionEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, e
return err
}
- totalTokenUsage.Prompt += tokenUsage.Prompt
- totalTokenUsage.Completion += tokenUsage.Completion
+ totalTokenUsage.TimingTokenGeneration += tokenUsage.TimingTokenGeneration
+ totalTokenUsage.TimingPromptProcessing += tokenUsage.TimingPromptProcessing
result = append(result, r...)
}
+ usage := schema.OpenAIUsage{
+ PromptTokens: totalTokenUsage.Prompt,
+ CompletionTokens: totalTokenUsage.Completion,
+ TotalTokens: totalTokenUsage.Prompt + totalTokenUsage.Completion,
+ }
+ if extraUsage {
+ usage.TimingTokenGeneration = totalTokenUsage.TimingTokenGeneration
+ usage.TimingPromptProcessing = totalTokenUsage.TimingPromptProcessing
+ }
resp := &schema.OpenAIResponse{
ID: id,
@@ -182,11 +200,7 @@ func CompletionEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, e
Model: input.Model, // we have to return what the user sent here, due to OpenAI spec.
Choices: result,
Object: "text_completion",
- Usage: schema.OpenAIUsage{
- PromptTokens: totalTokenUsage.Prompt,
- CompletionTokens: totalTokenUsage.Completion,
- TotalTokens: totalTokenUsage.Prompt + totalTokenUsage.Completion,
- },
+ Usage: usage,
}
jsonResult, _ := json.Marshal(resp)
diff --git a/core/http/endpoints/openai/edit.go b/core/http/endpoints/openai/edit.go
index a6d609fb..e10a12d1 100644
--- a/core/http/endpoints/openai/edit.go
+++ b/core/http/endpoints/openai/edit.go
@@ -25,6 +25,9 @@ import (
func EditEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluator *templates.Evaluator, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
return func(c *fiber.Ctx) error {
+ // Opt-in extra usage flag
+ extraUsage := c.Get("LocalAI-Extra-Usage", "") != ""
+
modelFile, input, err := readRequest(c, cl, ml, appConfig, true)
if err != nil {
return fmt.Errorf("failed reading parameters from request:%w", err)
@@ -61,8 +64,20 @@ func EditEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
totalTokenUsage.Prompt += tokenUsage.Prompt
totalTokenUsage.Completion += tokenUsage.Completion
+ totalTokenUsage.TimingTokenGeneration += tokenUsage.TimingTokenGeneration
+ totalTokenUsage.TimingPromptProcessing += tokenUsage.TimingPromptProcessing
+
result = append(result, r...)
}
+ usage := schema.OpenAIUsage{
+ PromptTokens: totalTokenUsage.Prompt,
+ CompletionTokens: totalTokenUsage.Completion,
+ TotalTokens: totalTokenUsage.Prompt + totalTokenUsage.Completion,
+ }
+ if extraUsage {
+ usage.TimingTokenGeneration = totalTokenUsage.TimingTokenGeneration
+ usage.TimingPromptProcessing = totalTokenUsage.TimingPromptProcessing
+ }
id := uuid.New().String()
created := int(time.Now().Unix())
@@ -72,11 +87,7 @@ func EditEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
Model: input.Model, // we have to return what the user sent here, due to OpenAI spec.
Choices: result,
Object: "edit",
- Usage: schema.OpenAIUsage{
- PromptTokens: totalTokenUsage.Prompt,
- CompletionTokens: totalTokenUsage.Completion,
- TotalTokens: totalTokenUsage.Prompt + totalTokenUsage.Completion,
- },
+ Usage: usage,
}
jsonResult, _ := json.Marshal(resp)
diff --git a/core/http/endpoints/openai/inference.go b/core/http/endpoints/openai/inference.go
index da75d3a1..f59e3b60 100644
--- a/core/http/endpoints/openai/inference.go
+++ b/core/http/endpoints/openai/inference.go
@@ -52,6 +52,8 @@ func ComputeChoices(
tokenUsage.Prompt += prediction.Usage.Prompt
tokenUsage.Completion += prediction.Usage.Completion
+ tokenUsage.TimingPromptProcessing += prediction.Usage.TimingPromptProcessing
+ tokenUsage.TimingTokenGeneration += prediction.Usage.TimingTokenGeneration
finetunedResponse := backend.Finetune(*config, predInput, prediction.Response)
cb(finetunedResponse, &result)
diff --git a/core/http/endpoints/openai/list.go b/core/http/endpoints/openai/list.go
index 80dcb3e4..9d21f8fe 100644
--- a/core/http/endpoints/openai/list.go
+++ b/core/http/endpoints/openai/list.go
@@ -12,7 +12,7 @@ import (
// @Summary List and describe the various models available in the API.
// @Success 200 {object} schema.ModelsDataResponse "Response"
// @Router /v1/models [get]
-func ListModelsEndpoint(bcl *config.BackendConfigLoader, ml *model.ModelLoader) func(ctx *fiber.Ctx) error {
+func ListModelsEndpoint(bcl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) func(ctx *fiber.Ctx) error {
return func(c *fiber.Ctx) error {
// If blank, no filter is applied.
filter := c.Query("filter")
diff --git a/core/http/routes/openai.go b/core/http/routes/openai.go
index 5ff301b6..a48ced65 100644
--- a/core/http/routes/openai.go
+++ b/core/http/routes/openai.go
@@ -130,6 +130,6 @@ func RegisterOpenAIRoutes(app *fiber.App,
}
// List models
- app.Get("/v1/models", openai.ListModelsEndpoint(application.BackendLoader(), application.ModelLoader()))
- app.Get("/models", openai.ListModelsEndpoint(application.BackendLoader(), application.ModelLoader()))
+ app.Get("/v1/models", openai.ListModelsEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
+ app.Get("/models", openai.ListModelsEndpoint(application.BackendLoader(), application.ModelLoader(), application.ApplicationConfig()))
}
diff --git a/core/schema/openai.go b/core/schema/openai.go
index 15bcd13d..b06120ae 100644
--- a/core/schema/openai.go
+++ b/core/schema/openai.go
@@ -23,6 +23,9 @@ type OpenAIUsage struct {
PromptTokens int `json:"prompt_tokens"`
CompletionTokens int `json:"completion_tokens"`
TotalTokens int `json:"total_tokens"`
+	// Extra timing data, disabled by default as it's not a part of the OpenAI specification
+ TimingPromptProcessing float64 `json:"timing_prompt_processing,omitempty"`
+ TimingTokenGeneration float64 `json:"timing_token_generation,omitempty"`
}
type Item struct {
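A quick way to verify the new Machine-Tag middleware is to hit any endpoint and read the response headers. This is only a sketch: it assumes a LocalAI instance listening on localhost:8080 that was started with `--machine-tag my-node-1` (or `LOCALAI_MACHINE_TAG=my-node-1`); the tag value and port are placeholders.

```python
# Illustrative check of the Machine-Tag response header added by this patch.
import requests

resp = requests.get("http://localhost:8080/v1/models")
# The middleware sets the header on every response when the tag is configured
print("Machine-Tag:", resp.headers.get("Machine-Tag"))  # e.g. "my-node-1"
```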
From a761e01944b261e2181e5568bb263324d41218c5 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 17 Jan 2025 18:16:17 +0100
Subject: [PATCH 043/679] chore: alias transformers-musicgen to transformers
(#4623)
chore: alias transformers-musicgen to transformers
Signed-off-by: Ettore Di Giacinto
---
pkg/model/initializers.go | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/pkg/model/initializers.go b/pkg/model/initializers.go
index 3d03514a..f4675050 100644
--- a/pkg/model/initializers.go
+++ b/pkg/model/initializers.go
@@ -26,6 +26,7 @@ var Aliases map[string]string = map[string]string{
"llama": LLamaCPP,
"embedded-store": LocalStoreBackend,
"langchain-huggingface": LCHuggingFaceBackend,
+ "transformers-musicgen": TransformersBackend,
}
var AutoDetect = os.Getenv("DISABLE_AUTODETECT") != "true"
@@ -51,7 +52,8 @@ const (
PiperBackend = "piper"
LCHuggingFaceBackend = "huggingface"
- LocalStoreBackend = "local-store"
+ TransformersBackend = "transformers"
+ LocalStoreBackend = "local-store"
)
func backendPath(assetDir, backend string) string {
From ee7904f170786df1ef30e1ddca04f432fb5ac1e6 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 17 Jan 2025 19:33:25 +0100
Subject: [PATCH 044/679] feat(transformers): add support to OuteTTS (#4622)
Signed-off-by: Ettore Di Giacinto
---
backend/python/transformers/backend.py | 68 +++++++++++++++++--
.../python/transformers/requirements-cpu.txt | 4 +-
.../transformers/requirements-cublas11.txt | 4 +-
.../transformers/requirements-cublas12.txt | 4 +-
.../transformers/requirements-hipblas.txt | 4 +-
.../transformers/requirements-intel.txt | 4 +-
backend/python/transformers/requirements.txt | 4 +-
7 files changed, 82 insertions(+), 10 deletions(-)
diff --git a/backend/python/transformers/backend.py b/backend/python/transformers/backend.py
index 3f6838ad..27257934 100644
--- a/backend/python/transformers/backend.py
+++ b/backend/python/transformers/backend.py
@@ -24,7 +24,7 @@ XPU=os.environ.get("XPU", "0") == "1"
from transformers import AutoTokenizer, AutoModel, set_seed, TextIteratorStreamer, StoppingCriteriaList, StopStringCriteria
from transformers import AutoProcessor, MusicgenForConditionalGeneration
from scipy.io import wavfile
-
+import outetts
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
@@ -87,6 +87,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
self.CUDA = torch.cuda.is_available()
self.OV=False
+ self.OuteTTS=False
device_map="cpu"
@@ -195,7 +196,45 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
self.OV = True
elif request.Type == "MusicgenForConditionalGeneration":
self.processor = AutoProcessor.from_pretrained(model_name)
- self.model = MusicgenForConditionalGeneration.from_pretrained(model_name)
+ self.model = MusicgenForConditionalGeneration.from_pretrained(model_name)
+ elif request.Type == "OuteTTS":
+ options = request.Options
+ MODELNAME = "OuteAI/OuteTTS-0.3-1B"
+ TOKENIZER = "OuteAI/OuteTTS-0.3-1B"
+ VERSION = "0.3"
+ SPEAKER = "en_male_1"
+                for opt in options:
+                    if opt.startswith("tokenizer:"):
+                        TOKENIZER = opt.split(":")[1]
+                        continue
+                    if opt.startswith("version:"):
+                        VERSION = opt.split(":")[1]
+                        continue
+                    if opt.startswith("speaker:"):
+                        SPEAKER = opt.split(":")[1]
+                        continue
+
+ if model_name != "":
+ MODELNAME = model_name
+
+ # Configure the model
+ model_config = outetts.HFModelConfig_v2(
+ model_path=MODELNAME,
+ tokenizer_path=TOKENIZER
+ )
+ # Initialize the interface
+ self.interface = outetts.InterfaceHF(model_version=VERSION, cfg=model_config)
+ self.OuteTTS = True
+
+ self.interface.print_default_speakers()
+ if request.AudioPath:
+ if os.path.isabs(request.AudioPath):
+ self.AudioPath = request.AudioPath
+ else:
+ self.AudioPath = os.path.join(request.ModelPath, request.AudioPath)
+ self.speaker = self.interface.create_speaker(audio_path=self.AudioPath)
+ else:
+ self.speaker = self.interface.load_default_speaker(name=SPEAKER)
else:
print("Automodel", file=sys.stderr)
self.model = AutoModel.from_pretrained(model_name,
@@ -206,7 +245,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
torch_dtype=compute)
if request.ContextSize > 0:
self.max_tokens = request.ContextSize
- elif request.Type != "MusicgenForConditionalGeneration":
+ elif hasattr(self.model, 'config') and hasattr(self.model.config, 'max_position_embeddings'):
self.max_tokens = self.model.config.max_position_embeddings
else:
self.max_tokens = 512
@@ -445,9 +484,30 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
return backend_pb2.Result(success=True)
+    def OuteTTSGenerator(self, request, context):
+ try:
+ print("[OuteTTS] generating TTS", file=sys.stderr)
+ gen_cfg = outetts.GenerationConfig(
+                text=request.text,
+ temperature=0.1,
+ repetition_penalty=1.1,
+ max_length=self.max_tokens,
+ speaker=self.speaker,
+ # voice_characteristics="upbeat enthusiasm, friendliness, clarity, professionalism, and trustworthiness"
+ )
+ output = self.interface.generate(config=gen_cfg)
+ print("[OuteTTS] Generated TTS", file=sys.stderr)
+ output.save(request.dst)
+ print("[OuteTTS] TTS done", file=sys.stderr)
+ except Exception as err:
+ return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
+ return backend_pb2.Result(success=True)
# The TTS endpoint is older, and provides fewer features, but exists for compatibility reasons
def TTS(self, request, context):
+ if self.OuteTTS:
+            return self.OuteTTSGenerator(request, context)
+
model_name = request.model
try:
if self.processor is None:
@@ -463,7 +523,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
padding=True,
return_tensors="pt",
)
- tokens = 512 # No good place to set the "length" in TTS, so use 10s as a sane default
+            tokens = self.max_tokens # No good place to set the "length" in TTS, so default to the model's configured max_tokens
audio_values = self.model.generate(**inputs, max_new_tokens=tokens)
print("[transformers-musicgen] TTS generated!", file=sys.stderr)
sampling_rate = self.model.config.audio_encoder.sampling_rate
diff --git a/backend/python/transformers/requirements-cpu.txt b/backend/python/transformers/requirements-cpu.txt
index f99aa18f..56b77325 100644
--- a/backend/python/transformers/requirements-cpu.txt
+++ b/backend/python/transformers/requirements-cpu.txt
@@ -1,4 +1,6 @@
torch==2.4.1
+llvmlite==0.43.0
accelerate
transformers
-bitsandbytes
\ No newline at end of file
+bitsandbytes
+outetts
\ No newline at end of file
diff --git a/backend/python/transformers/requirements-cublas11.txt b/backend/python/transformers/requirements-cublas11.txt
index 2c1d0755..924b0086 100644
--- a/backend/python/transformers/requirements-cublas11.txt
+++ b/backend/python/transformers/requirements-cublas11.txt
@@ -1,5 +1,7 @@
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.4.1+cu118
+llvmlite==0.43.0
accelerate
transformers
-bitsandbytes
\ No newline at end of file
+bitsandbytes
+outetts
\ No newline at end of file
diff --git a/backend/python/transformers/requirements-cublas12.txt b/backend/python/transformers/requirements-cublas12.txt
index f99aa18f..0feb3d81 100644
--- a/backend/python/transformers/requirements-cublas12.txt
+++ b/backend/python/transformers/requirements-cublas12.txt
@@ -1,4 +1,6 @@
torch==2.4.1
accelerate
+llvmlite==0.43.0
transformers
-bitsandbytes
\ No newline at end of file
+bitsandbytes
+outetts
\ No newline at end of file
diff --git a/backend/python/transformers/requirements-hipblas.txt b/backend/python/transformers/requirements-hipblas.txt
index f9577fab..fa65fb8e 100644
--- a/backend/python/transformers/requirements-hipblas.txt
+++ b/backend/python/transformers/requirements-hipblas.txt
@@ -2,4 +2,6 @@
torch==2.4.1+rocm6.0
accelerate
transformers
-bitsandbytes
\ No newline at end of file
+llvmlite==0.43.0
+bitsandbytes
+outetts
\ No newline at end of file
diff --git a/backend/python/transformers/requirements-intel.txt b/backend/python/transformers/requirements-intel.txt
index dd683cd9..4a295599 100644
--- a/backend/python/transformers/requirements-intel.txt
+++ b/backend/python/transformers/requirements-intel.txt
@@ -3,5 +3,7 @@ intel-extension-for-pytorch==2.3.110+xpu
torch==2.3.1+cxx11.abi
oneccl_bind_pt==2.3.100+xpu
optimum[openvino]
+llvmlite==0.43.0
intel-extension-for-transformers
-bitsandbytes
\ No newline at end of file
+bitsandbytes
+outetts
\ No newline at end of file
diff --git a/backend/python/transformers/requirements.txt b/backend/python/transformers/requirements.txt
index 262dd17a..ba1d88e7 100644
--- a/backend/python/transformers/requirements.txt
+++ b/backend/python/transformers/requirements.txt
@@ -2,4 +2,6 @@ grpcio==1.69.0
protobuf
certifi
setuptools
-scipy==1.14.0
\ No newline at end of file
+scipy==1.14.0
+numpy>=2.0.0
+numba==0.60.0
\ No newline at end of file
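For reference, the outetts calls that the backend makes above can also be run standalone. This is only a sketch that follows the API exactly as it is used in this patch (`HFModelConfig_v2`, `InterfaceHF`, `load_default_speaker`, `GenerationConfig`) with the same defaults the backend falls back to; going through the LocalAI gRPC TTS endpoint remains the supported path.

```python
# Standalone sketch of the outetts flow used by the backend above (illustrative only).
import outetts

# Same defaults the backend uses when no options are provided
model_config = outetts.HFModelConfig_v2(
    model_path="OuteAI/OuteTTS-0.3-1B",
    tokenizer_path="OuteAI/OuteTTS-0.3-1B",
)
interface = outetts.InterfaceHF(model_version="0.3", cfg=model_config)
speaker = interface.load_default_speaker(name="en_male_1")

gen_cfg = outetts.GenerationConfig(
    text="Speech synthesis is the artificial production of human speech.",
    temperature=0.1,
    repetition_penalty=1.1,
    max_length=512,  # the backend passes its configured max_tokens here
    speaker=speaker,
)
output = interface.generate(config=gen_cfg)
output.save("output.wav")  # destination path is arbitrary in this sketch
```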
From cbdbe59f164a06ad4e994444671b8dfdbcfc120f Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Fri, 17 Jan 2025 22:14:11 +0000
Subject: [PATCH 045/679] chore(deps): Bump scipy from 1.14.0 to 1.15.1 in
/backend/python/transformers (#4621)
chore(deps): Bump scipy in /backend/python/transformers
Bumps [scipy](https://github.com/scipy/scipy) from 1.14.0 to 1.15.1.
- [Release notes](https://github.com/scipy/scipy/releases)
- [Commits](https://github.com/scipy/scipy/compare/v1.14.0...v1.15.1)
---
updated-dependencies:
- dependency-name: scipy
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
backend/python/transformers/requirements.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/backend/python/transformers/requirements.txt b/backend/python/transformers/requirements.txt
index ba1d88e7..d353e4d0 100644
--- a/backend/python/transformers/requirements.txt
+++ b/backend/python/transformers/requirements.txt
@@ -2,6 +2,6 @@ grpcio==1.69.0
protobuf
certifi
setuptools
-scipy==1.14.0
+scipy==1.15.1
numpy>=2.0.0
numba==0.60.0
\ No newline at end of file
From 895cd7c76aa83b84f64b07802682e910a54b0d42 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Sat, 18 Jan 2025 08:57:49 +0100
Subject: [PATCH 046/679] feat(swagger): update swagger (#4625)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
swagger/docs.go | 7 +++++++
swagger/swagger.json | 7 +++++++
swagger/swagger.yaml | 6 ++++++
3 files changed, 20 insertions(+)
diff --git a/swagger/docs.go b/swagger/docs.go
index 1a5943c4..13a3d3f3 100644
--- a/swagger/docs.go
+++ b/swagger/docs.go
@@ -1752,6 +1752,13 @@ const docTemplate = `{
"prompt_tokens": {
"type": "integer"
},
+ "timing_prompt_processing": {
+                    "description": "Extra timing data, disabled by default as it's not a part of the OpenAI specification",
+ "type": "number"
+ },
+ "timing_token_generation": {
+ "type": "number"
+ },
"total_tokens": {
"type": "integer"
}
diff --git a/swagger/swagger.json b/swagger/swagger.json
index dc902e11..1c38e9da 100644
--- a/swagger/swagger.json
+++ b/swagger/swagger.json
@@ -1745,6 +1745,13 @@
"prompt_tokens": {
"type": "integer"
},
+ "timing_prompt_processing": {
+                    "description": "Extra timing data, disabled by default as it's not a part of the OpenAI specification",
+ "type": "number"
+ },
+ "timing_token_generation": {
+ "type": "number"
+ },
"total_tokens": {
"type": "integer"
}
diff --git a/swagger/swagger.yaml b/swagger/swagger.yaml
index a447f7cc..1692f4bb 100644
--- a/swagger/swagger.yaml
+++ b/swagger/swagger.yaml
@@ -646,6 +646,12 @@ definitions:
type: integer
prompt_tokens:
type: integer
+ timing_prompt_processing:
+        description: Extra timing data, disabled by default as it's not a part
+          of the OpenAI specification
+ type: number
+ timing_token_generation:
+ type: number
total_tokens:
type: integer
type: object
From 96306a39a05894dee9ceb6a97f4215f45d359559 Mon Sep 17 00:00:00 2001
From: mintyleaf
Date: Sat, 18 Jan 2025 11:58:38 +0400
Subject: [PATCH 047/679] chore(docs): extra-Usage and Machine-Tag docs (#4627)
Rename LocalAI-Extra-Usage -> Extra-Usage, add MACHINE_TAG as cli flag option, add docs about extra-usage and machine-tag
Signed-off-by: mintyleaf
---
core/cli/run.go | 2 +-
core/http/endpoints/openai/chat.go | 2 +-
core/http/endpoints/openai/completion.go | 2 +-
core/http/endpoints/openai/edit.go | 2 +-
docs/content/docs/advanced/advanced-usage.md | 31 +++++++++++++++++++-
5 files changed, 34 insertions(+), 5 deletions(-)
diff --git a/core/cli/run.go b/core/cli/run.go
index b86fe2a6..279ff94b 100644
--- a/core/cli/run.go
+++ b/core/cli/run.go
@@ -70,7 +70,7 @@ type RunCMD struct {
WatchdogBusyTimeout string `env:"LOCALAI_WATCHDOG_BUSY_TIMEOUT,WATCHDOG_BUSY_TIMEOUT" default:"5m" help:"Threshold beyond which a busy backend should be stopped" group:"backends"`
Federated bool `env:"LOCALAI_FEDERATED,FEDERATED" help:"Enable federated instance" group:"federated"`
DisableGalleryEndpoint bool `env:"LOCALAI_DISABLE_GALLERY_ENDPOINT,DISABLE_GALLERY_ENDPOINT" help:"Disable the gallery endpoints" group:"api"`
- MachineTag string `env:"LOCALAI_MACHINE_TAG" help:"Add Machine-Tag header to each response which is useful to track the machine in the P2P network" group:"api"`
+ MachineTag string `env:"LOCALAI_MACHINE_TAG,MACHINE_TAG" help:"Add Machine-Tag header to each response which is useful to track the machine in the P2P network" group:"api"`
LoadToMemory []string `env:"LOCALAI_LOAD_TO_MEMORY,LOAD_TO_MEMORY" help:"A list of models to load into memory at startup" group:"models"`
}
diff --git a/core/http/endpoints/openai/chat.go b/core/http/endpoints/openai/chat.go
index cbce369a..3b8d3056 100644
--- a/core/http/endpoints/openai/chat.go
+++ b/core/http/endpoints/openai/chat.go
@@ -182,7 +182,7 @@ func ChatEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
c.Set("X-Correlation-ID", correlationID)
// Opt-in extra usage flag
- extraUsage := c.Get("LocalAI-Extra-Usage", "") != ""
+ extraUsage := c.Get("Extra-Usage", "") != ""
modelFile, input, err := readRequest(c, cl, ml, startupOptions, true)
if err != nil {
diff --git a/core/http/endpoints/openai/completion.go b/core/http/endpoints/openai/completion.go
index 339e9bc2..a353a0a1 100644
--- a/core/http/endpoints/openai/completion.go
+++ b/core/http/endpoints/openai/completion.go
@@ -67,7 +67,7 @@ func CompletionEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, e
c.Set("X-Correlation-ID", id)
// Opt-in extra usage flag
- extraUsage := c.Get("LocalAI-Extra-Usage", "") != ""
+ extraUsage := c.Get("Extra-Usage", "") != ""
modelFile, input, err := readRequest(c, cl, ml, appConfig, true)
if err != nil {
diff --git a/core/http/endpoints/openai/edit.go b/core/http/endpoints/openai/edit.go
index e10a12d1..28a3597c 100644
--- a/core/http/endpoints/openai/edit.go
+++ b/core/http/endpoints/openai/edit.go
@@ -26,7 +26,7 @@ func EditEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
return func(c *fiber.Ctx) error {
// Opt-in extra usage flag
- extraUsage := c.Get("LocalAI-Extra-Usage", "") != ""
+ extraUsage := c.Get("Extra-Usage", "") != ""
modelFile, input, err := readRequest(c, cl, ml, appConfig, true)
if err != nil {
diff --git a/docs/content/docs/advanced/advanced-usage.md b/docs/content/docs/advanced/advanced-usage.md
index 35d3a2e4..dd9894ef 100644
--- a/docs/content/docs/advanced/advanced-usage.md
+++ b/docs/content/docs/advanced/advanced-usage.md
@@ -520,6 +520,7 @@ In the help text below, BASEPATH is the location that local-ai is being executed
| --upload-limit | 15 | Default upload-limit in MB | $LOCALAI_UPLOAD_LIMIT |
| --api-keys | API-KEYS,... | List of API Keys to enable API authentication. When this is set, all the requests must be authenticated with one of these API keys | $LOCALAI_API_KEY |
| --disable-welcome | | Disable welcome pages | $LOCALAI_DISABLE_WELCOME |
+| --machine-tag | | If not empty, add the value as a Machine-Tag header to each response. Useful to track responses from different machines when using multiple P2P federated nodes | $LOCALAI_MACHINE_TAG |
#### Backend Flags
| Parameter | Default | Description | Environment Variable |
@@ -553,6 +554,34 @@ LOCALAI_MODELS_PATH=/mnt/storage/localai/models
LOCALAI_F16=true
```
+### Request headers
+
+You can send the 'Extra-Usage' request header (its presence is enough, e.g. 'Extra-Usage: true') to receive inference timings in milliseconds, which extend the default OpenAI response model in the usage field:
+```
+...
+{
+ "id": "...",
+ "created": ...,
+ "model": "...",
+ "choices": [
+ {
+ ...
+ },
+ ...
+ ],
+ "object": "...",
+ "usage": {
+ "prompt_tokens": ...,
+ "completion_tokens": ...,
+ "total_tokens": ...,
+ // Extra-Usage header key will include these two float fields:
+    "timing_prompt_processing": ...,
+ "timing_token_generation": ...,
+ },
+}
+...
+```
+
### Extra backends
LocalAI can be extended with extra backends. The backends are implemented as `gRPC` services and can be written in any language. The container images that are built and published on [quay.io](https://quay.io/repository/go-skynet/local-ai?tab=tags) contain a set of images split in core and extra. By default, images bring all the dependencies and backends supported by LocalAI (we call those `extra` images). The `-core` images instead bring only the strictly necessary dependencies to run LocalAI with only a core set of backends.
@@ -616,4 +645,4 @@ Note that, for llama.cpp you need to set accordingly `LLAMACPP_PARALLEL` to the
LocalAI will automatically discover the CPU flagset available in your host and will use the most optimized version of the backends.
-If you want to disable this behavior, you can set `DISABLE_AUTODETECT` to `true` in the environment variables.
\ No newline at end of file
+If you want to disable this behavior, you can set `DISABLE_AUTODETECT` to `true` in the environment variables.
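The opt-in header documented above can be exercised with any OpenAI-compatible client; the sketch below uses plain HTTP. It assumes a LocalAI instance on localhost:8080 and a configured chat model named "my-model" (both placeholders).

```python
# Illustrative request using the Extra-Usage header documented in this patch.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={"Extra-Usage": "true"},  # presence of the header enables the extra fields
    json={"model": "my-model", "messages": [{"role": "user", "content": "Hello"}]},
)
usage = resp.json()["usage"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])
print(usage.get("timing_prompt_processing"), usage.get("timing_token_generation"))
```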
From 958f6eb722ca027699238543c10d443451745bb4 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sat, 18 Jan 2025 11:55:13 +0100
Subject: [PATCH 048/679] chore(llama.cpp): update dependency (#4628)
Update to '3edfa7d3753c29e44b964c0ff424d2ea8d5fdee6' and adapt to upstream changes
Signed-off-by: Ettore Di Giacinto
---
Makefile | 2 +-
backend/cpp/llama/grpc-server.cpp | 28 +++++++++++++++++++++++++++-
2 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/Makefile b/Makefile
index 03468ffb..1f1ffb3e 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=4dbc8b9cb71876e005724f4e8f73a3544646bcf5
+CPPLLAMA_VERSION?=3edfa7d3753c29e44b964c0ff424d2ea8d5fdee6
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
diff --git a/backend/cpp/llama/grpc-server.cpp b/backend/cpp/llama/grpc-server.cpp
index 4e75e7b0..9aeb34db 100644
--- a/backend/cpp/llama/grpc-server.cpp
+++ b/backend/cpp/llama/grpc-server.cpp
@@ -134,6 +134,32 @@ static std::string tokens_to_output_formatted_string(const llama_context *ctx, c
return out;
}
+// Registers one or more RPC backend devices from a comma-separated list of server endpoints
+// https://github.com/ggerganov/llama.cpp/compare/4dbc8b9cb71876e005724f4e8f73a3544646bcf5..3edfa7d3753c29e44b964c0ff424d2ea8d5fdee6
+static void add_rpc_devices(std::string servers) {
+ auto rpc_servers = string_split(servers, ',');
+ if (rpc_servers.empty()) {
+ throw std::invalid_argument("no RPC servers specified");
+ }
+ ggml_backend_reg_t rpc_reg = ggml_backend_reg_by_name("RPC");
+ if (!rpc_reg) {
+ throw std::invalid_argument("failed to find RPC backend");
+ }
+ typedef ggml_backend_dev_t (*ggml_backend_rpc_add_device_t)(const char * endpoint);
+ ggml_backend_rpc_add_device_t ggml_backend_rpc_add_device_fn = (ggml_backend_rpc_add_device_t) ggml_backend_reg_get_proc_address(rpc_reg, "ggml_backend_rpc_add_device");
+ if (!ggml_backend_rpc_add_device_fn) {
+ throw std::invalid_argument("failed to find RPC device add function");
+ }
+ for (const auto & server : rpc_servers) {
+ ggml_backend_dev_t dev = ggml_backend_rpc_add_device_fn(server.c_str());
+ if (dev) {
+ ggml_backend_device_register(dev);
+ } else {
+ throw std::invalid_argument("failed to register RPC device");
+ }
+ }
+}
+
// convert a vector of completion_token_output to json
static json probs_vector_to_json(const llama_context *ctx, const std::vector &probs)
{
@@ -2282,7 +2308,7 @@ static void params_parse(const backend::ModelOptions* request,
const char *llama_grpc_servers = std::getenv("LLAMACPP_GRPC_SERVERS");
if (llama_grpc_servers != NULL) {
- params.rpc_servers = std::string(llama_grpc_servers);
+ add_rpc_devices(std::string(llama_grpc_servers));
}
// TODO: Add yarn
From 4bd8434ae02d934d2ceee56d0779dc149bbb8bc0 Mon Sep 17 00:00:00 2001
From: Gianluca Boiano <491117+M0Rf30@users.noreply.github.com>
Date: Sat, 18 Jan 2025 15:47:49 +0100
Subject: [PATCH 049/679] fix(docs): add missing `-core` suffix to sycl images
(#4630)
Signed-off-by: Gianluca Boiano
---
docs/content/docs/getting-started/container-images.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/content/docs/getting-started/container-images.md b/docs/content/docs/getting-started/container-images.md
index 25385f23..967fc28b 100644
--- a/docs/content/docs/getting-started/container-images.md
+++ b/docs/content/docs/getting-started/container-images.md
@@ -197,7 +197,7 @@ Images with `core` in the tag are smaller and do not contain any python dependen
| --- | --- |-------------------------------------------------------------|
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-sycl-f16` | `localai/localai:master-sycl-f16` |
| Latest tag | `quay.io/go-skynet/local-ai:latest-gpu-intel-f16` | `localai/localai:latest-gpu-intel-f16` |
-| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-sycl-f16` | `localai/localai:{{< version >}}-sycl-f16` |
+| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-sycl-f16-core` | `localai/localai:{{< version >}}-sycl-f16-core` |
| Versioned image including FFMpeg| `quay.io/go-skynet/local-ai:{{< version >}}-sycl-f16-ffmpeg` | `localai/localai:{{< version >}}-sycl-f16-ffmpeg` |
| Versioned image including FFMpeg, no python | `quay.io/go-skynet/local-ai:{{< version >}}-sycl-f16-ffmpeg-core` | `localai/localai:{{< version >}}-sycl-f16-ffmpeg-core` |
@@ -209,7 +209,7 @@ Images with `core` in the tag are smaller and do not contain any python dependen
| --- | --- |-------------------------------------------------------------|
| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-sycl-f32` | `localai/localai:master-sycl-f32` |
| Latest tag | `quay.io/go-skynet/local-ai:latest-gpu-intel-f32` | `localai/localai:latest-gpu-intel-f32` |
-| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-sycl-f32` | `localai/localai:{{< version >}}-sycl-f32` |
+| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-sycl-f32-core` | `localai/localai:{{< version >}}-sycl-f32-core` |
| Versioned image including FFMpeg| `quay.io/go-skynet/local-ai:{{< version >}}-sycl-f32-ffmpeg` | `localai/localai:{{< version >}}-sycl-f32-ffmpeg` |
| Versioned image including FFMpeg, no python | `quay.io/go-skynet/local-ai:{{< version >}}-sycl-f32-ffmpeg-core` | `localai/localai:{{< version >}}-sycl-f32-ffmpeg-core` |
From 1e9bf19c8d4dff99c6c2cbcbddc4d50962c58a07 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sat, 18 Jan 2025 18:30:30 +0100
Subject: [PATCH 050/679] feat(transformers): merge sentencetransformers
backend (#4624)
* merge sentencetransformers
Signed-off-by: Ettore Di Giacinto
* Add alias to silently redirect sentencetransformers to transformers
Signed-off-by: Ettore Di Giacinto
* Add alias also for transformers-musicgen
Signed-off-by: Ettore Di Giacinto
* Drop from makefile
Signed-off-by: Ettore Di Giacinto
* Move tests from sentencetransformers
Signed-off-by: Ettore Di Giacinto
* Remove sentencetransformers
Signed-off-by: Ettore Di Giacinto
* Remove tests from CI (part of transformers)
Signed-off-by: Ettore Di Giacinto
* Do not always try to load the tokenizer
Signed-off-by: Ettore Di Giacinto
* Adapt tests
Signed-off-by: Ettore Di Giacinto
* Fix typo
Signed-off-by: Ettore Di Giacinto
* Tiny adjustments
Signed-off-by: Ettore Di Giacinto
---------
Signed-off-by: Ettore Di Giacinto
---
.github/workflows/test-extra.yml | 24 ----
.github/workflows/test.yml | 3 +-
Dockerfile | 5 +-
Makefile | 15 +--
backend/python/sentencetransformers/Makefile | 31 -----
backend/python/sentencetransformers/README.md | 5 -
.../python/sentencetransformers/backend.py | 114 ------------------
.../python/sentencetransformers/install.sh | 14 ---
.../sentencetransformers/requirements-cpu.txt | 6 -
.../requirements-cublas11.txt | 5 -
.../requirements-cublas12.txt | 4 -
.../requirements-hipblas.txt | 5 -
.../requirements-intel.txt | 9 --
.../sentencetransformers/requirements.txt | 5 -
backend/python/sentencetransformers/run.sh | 4 -
backend/python/sentencetransformers/test.py | 81 -------------
backend/python/sentencetransformers/test.sh | 6 -
backend/python/transformers/backend.py | 38 ++++--
.../python/transformers/requirements-cpu.txt | 3 +-
.../transformers/requirements-cublas11.txt | 3 +-
.../transformers/requirements-cublas12.txt | 3 +-
.../transformers/requirements-hipblas.txt | 4 +-
.../transformers/requirements-intel.txt | 3 +-
backend/python/transformers/test.py | 36 ++++++
core/http/app_test.go | 2 +-
pkg/model/initializers.go | 28 ++++-
tests/models_fixtures/grpc.yaml | 2 +-
27 files changed, 104 insertions(+), 354 deletions(-)
delete mode 100644 backend/python/sentencetransformers/Makefile
delete mode 100644 backend/python/sentencetransformers/README.md
delete mode 100755 backend/python/sentencetransformers/backend.py
delete mode 100755 backend/python/sentencetransformers/install.sh
delete mode 100644 backend/python/sentencetransformers/requirements-cpu.txt
delete mode 100644 backend/python/sentencetransformers/requirements-cublas11.txt
delete mode 100644 backend/python/sentencetransformers/requirements-cublas12.txt
delete mode 100644 backend/python/sentencetransformers/requirements-hipblas.txt
delete mode 100644 backend/python/sentencetransformers/requirements-intel.txt
delete mode 100644 backend/python/sentencetransformers/requirements.txt
delete mode 100755 backend/python/sentencetransformers/run.sh
delete mode 100644 backend/python/sentencetransformers/test.py
delete mode 100755 backend/python/sentencetransformers/test.sh
diff --git a/.github/workflows/test-extra.yml b/.github/workflows/test-extra.yml
index eacd3ab0..e99ea516 100644
--- a/.github/workflows/test-extra.yml
+++ b/.github/workflows/test-extra.yml
@@ -35,30 +35,6 @@ jobs:
run: |
make --jobs=5 --output-sync=target -C backend/python/transformers
make --jobs=5 --output-sync=target -C backend/python/transformers test
-
- tests-sentencetransformers:
- runs-on: ubuntu-latest
- steps:
- - name: Clone
- uses: actions/checkout@v4
- with:
- submodules: true
- - name: Dependencies
- run: |
- sudo apt-get update
- sudo apt-get install build-essential ffmpeg
- # Install UV
- curl -LsSf https://astral.sh/uv/install.sh | sh
- sudo apt-get install -y ca-certificates cmake curl patch python3-pip
- sudo apt-get install -y libopencv-dev
- pip install --user --no-cache-dir grpcio-tools==1.64.1
-
- - name: Test sentencetransformers
- run: |
- make --jobs=5 --output-sync=target -C backend/python/sentencetransformers
- make --jobs=5 --output-sync=target -C backend/python/sentencetransformers test
-
-
tests-rerankers:
runs-on: ubuntu-latest
steps:
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index ecef0569..0ee93afa 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -100,8 +100,7 @@ jobs:
# The python3-grpc-tools package in 22.04 is too old
pip install --user grpcio-tools
- sudo rm -rfv /usr/bin/conda || true
- PATH=$PATH:/opt/conda/bin make -C backend/python/sentencetransformers
+ make -C backend/python/transformers
# Pre-build piper before we start tests in order to have shared libraries in place
make sources/go-piper && \
diff --git a/Dockerfile b/Dockerfile
index 9fb07516..4ddc921d 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -15,7 +15,7 @@ ARG TARGETARCH
ARG TARGETVARIANT
ENV DEBIAN_FRONTEND=noninteractive
-ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh,transformers:/build/backend/python/transformers/run.sh,sentencetransformers:/build/backend/python/sentencetransformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,openvoice:/build/backend/python/openvoice/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,mamba:/build/backend/python/mamba/run.sh,exllama2:/build/backend/python/exllama2/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"
+ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,transformers:/build/backend/python/transformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,openvoice:/build/backend/python/openvoice/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,mamba:/build/backend/python/mamba/run.sh,exllama2:/build/backend/python/exllama2/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"
RUN apt-get update && \
@@ -456,9 +456,6 @@ RUN if [[ ( "${EXTRA_BACKENDS}" =~ "kokoro" || -z "${EXTRA_BACKENDS}" ) && "$IMA
if [[ ( "${EXTRA_BACKENDS}" =~ "openvoice" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/openvoice \
; fi && \
- if [[ ( "${EXTRA_BACKENDS}" =~ "sentencetransformers" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
- make -C backend/python/sentencetransformers \
- ; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "exllama2" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/exllama2 \
; fi && \
diff --git a/Makefile b/Makefile
index 1f1ffb3e..faa82d6b 100644
--- a/Makefile
+++ b/Makefile
@@ -497,7 +497,7 @@ test: prepare test-models/testmodel.ggml grpcs
@echo 'Running tests'
export GO_TAGS="tts stablediffusion debug"
$(MAKE) prepare-test
- HUGGINGFACE_GRPC=$(abspath ./)/backend/python/sentencetransformers/run.sh TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models \
+ HUGGINGFACE_GRPC=$(abspath ./)/backend/python/transformers/run.sh TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models \
$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="!llama && !llama-gguf" --flake-attempts $(TEST_FLAKES) --fail-fast -v -r $(TEST_PATHS)
$(MAKE) test-llama
$(MAKE) test-llama-gguf
@@ -583,10 +583,10 @@ protogen-go-clean:
$(RM) bin/*
.PHONY: protogen-python
-protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen mamba-protogen rerankers-protogen sentencetransformers-protogen transformers-protogen parler-tts-protogen kokoro-protogen vllm-protogen openvoice-protogen
+protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen mamba-protogen rerankers-protogen transformers-protogen parler-tts-protogen kokoro-protogen vllm-protogen openvoice-protogen
.PHONY: protogen-python-clean
-protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean mamba-protogen-clean sentencetransformers-protogen-clean rerankers-protogen-clean transformers-protogen-clean parler-tts-protogen-clean kokoro-protogen-clean vllm-protogen-clean openvoice-protogen-clean
+protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean mamba-protogen-clean rerankers-protogen-clean transformers-protogen-clean parler-tts-protogen-clean kokoro-protogen-clean vllm-protogen-clean openvoice-protogen-clean
.PHONY: autogptq-protogen
autogptq-protogen:
@@ -644,14 +644,6 @@ rerankers-protogen:
rerankers-protogen-clean:
$(MAKE) -C backend/python/rerankers protogen-clean
-.PHONY: sentencetransformers-protogen
-sentencetransformers-protogen:
- $(MAKE) -C backend/python/sentencetransformers protogen
-
-.PHONY: sentencetransformers-protogen-clean
-sentencetransformers-protogen-clean:
- $(MAKE) -C backend/python/sentencetransformers protogen-clean
-
.PHONY: transformers-protogen
transformers-protogen:
$(MAKE) -C backend/python/transformers protogen
@@ -701,7 +693,6 @@ prepare-extra-conda-environments: protogen-python
$(MAKE) -C backend/python/diffusers
$(MAKE) -C backend/python/vllm
$(MAKE) -C backend/python/mamba
- $(MAKE) -C backend/python/sentencetransformers
$(MAKE) -C backend/python/rerankers
$(MAKE) -C backend/python/transformers
$(MAKE) -C backend/python/parler-tts
diff --git a/backend/python/sentencetransformers/Makefile b/backend/python/sentencetransformers/Makefile
deleted file mode 100644
index 8b18e943..00000000
--- a/backend/python/sentencetransformers/Makefile
+++ /dev/null
@@ -1,31 +0,0 @@
-.PHONY: sentencetransformers
-sentencetransformers: protogen
- bash ./install.sh
-
-
-.PHONY: run
-run: protogen
- @echo "Running sentencetransformers..."
- bash run.sh
- @echo "sentencetransformers run."
-
-# It is not working well by using command line. It only6 works with IDE like VSCode.
-.PHONY: test
-test: protogen
- @echo "Testing sentencetransformers..."
- bash test.sh
- @echo "sentencetransformers tested."
-
-.PHONY: protogen
-protogen: backend_pb2_grpc.py backend_pb2.py
-
-.PHONY: protogen-clean
-protogen-clean:
- $(RM) backend_pb2_grpc.py backend_pb2.py
-
-backend_pb2_grpc.py backend_pb2.py:
- python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
-
-.PHONY: clean
-clean: protogen-clean
- rm -rf venv __pycache__
\ No newline at end of file
diff --git a/backend/python/sentencetransformers/README.md b/backend/python/sentencetransformers/README.md
deleted file mode 100644
index 829cf0d1..00000000
--- a/backend/python/sentencetransformers/README.md
+++ /dev/null
@@ -1,5 +0,0 @@
-# Creating a separate environment for the sentencetransformers project
-
-```
-make sentencetransformers
-```
\ No newline at end of file
diff --git a/backend/python/sentencetransformers/backend.py b/backend/python/sentencetransformers/backend.py
deleted file mode 100755
index 2a20bf60..00000000
--- a/backend/python/sentencetransformers/backend.py
+++ /dev/null
@@ -1,114 +0,0 @@
-#!/usr/bin/env python3
-"""
-Extra gRPC server for HuggingFace SentenceTransformer models.
-"""
-from concurrent import futures
-
-import argparse
-import signal
-import sys
-import os
-
-import time
-import backend_pb2
-import backend_pb2_grpc
-
-import grpc
-
-from sentence_transformers import SentenceTransformer
-
-_ONE_DAY_IN_SECONDS = 60 * 60 * 24
-
-# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
-MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
-
-# Implement the BackendServicer class with the service methods
-class BackendServicer(backend_pb2_grpc.BackendServicer):
- """
- A gRPC servicer for the backend service.
-
- This class implements the gRPC methods for the backend service, including Health, LoadModel, and Embedding.
- """
- def Health(self, request, context):
- """
- A gRPC method that returns the health status of the backend service.
-
- Args:
- request: A HealthRequest object that contains the request parameters.
- context: A grpc.ServicerContext object that provides information about the RPC.
-
- Returns:
- A Reply object that contains the health status of the backend service.
- """
- return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
-
- def LoadModel(self, request, context):
- """
- A gRPC method that loads a model into memory.
-
- Args:
- request: A LoadModelRequest object that contains the request parameters.
- context: A grpc.ServicerContext object that provides information about the RPC.
-
- Returns:
- A Result object that contains the result of the LoadModel operation.
- """
- model_name = request.Model
- try:
- self.model = SentenceTransformer(model_name, trust_remote_code=request.TrustRemoteCode)
- except Exception as err:
- return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
-
- # Implement your logic here for the LoadModel service
- # Replace this with your desired response
- return backend_pb2.Result(message="Model loaded successfully", success=True)
-
- def Embedding(self, request, context):
- """
- A gRPC method that calculates embeddings for a given sentence.
-
- Args:
- request: An EmbeddingRequest object that contains the request parameters.
- context: A grpc.ServicerContext object that provides information about the RPC.
-
- Returns:
- An EmbeddingResult object that contains the calculated embeddings.
- """
- # Implement your logic here for the Embedding service
- # Replace this with your desired response
- print("Calculated embeddings for: " + request.Embeddings, file=sys.stderr)
- sentence_embeddings = self.model.encode(request.Embeddings)
- return backend_pb2.EmbeddingResult(embeddings=sentence_embeddings)
-
-
-def serve(address):
- server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
- backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
- server.add_insecure_port(address)
- server.start()
- print("Server started. Listening on: " + address, file=sys.stderr)
-
- # Define the signal handler function
- def signal_handler(sig, frame):
- print("Received termination signal. Shutting down...")
- server.stop(0)
- sys.exit(0)
-
- # Set the signal handlers for SIGINT and SIGTERM
- signal.signal(signal.SIGINT, signal_handler)
- signal.signal(signal.SIGTERM, signal_handler)
-
- try:
- while True:
- time.sleep(_ONE_DAY_IN_SECONDS)
- except KeyboardInterrupt:
- server.stop(0)
-
-if __name__ == "__main__":
- parser = argparse.ArgumentParser(description="Run the gRPC server.")
- parser.add_argument(
- "--addr", default="localhost:50051", help="The address to bind the server to."
- )
- args = parser.parse_args()
-
- serve(args.addr)
diff --git a/backend/python/sentencetransformers/install.sh b/backend/python/sentencetransformers/install.sh
deleted file mode 100755
index 36443ef1..00000000
--- a/backend/python/sentencetransformers/install.sh
+++ /dev/null
@@ -1,14 +0,0 @@
-#!/bin/bash
-set -e
-
-source $(dirname $0)/../common/libbackend.sh
-
-# This is here because the Intel pip index is broken and returns 200 status codes for every package name, it just doesn't return any package links.
-# This makes uv think that the package exists in the Intel pip index, and by default it stops looking at other pip indexes once it finds a match.
-# We need uv to continue falling through to the pypi default index to find optimum[openvino] in the pypi index
-# the --upgrade actually allows us to *downgrade* torch to the version provided in the Intel pip index
-if [ "x${BUILD_PROFILE}" == "xintel" ]; then
- EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
-fi
-
-installRequirements
diff --git a/backend/python/sentencetransformers/requirements-cpu.txt b/backend/python/sentencetransformers/requirements-cpu.txt
deleted file mode 100644
index 1e23f68c..00000000
--- a/backend/python/sentencetransformers/requirements-cpu.txt
+++ /dev/null
@@ -1,6 +0,0 @@
-torch==2.4.1
-accelerate
-transformers
-bitsandbytes
-sentence-transformers==3.3.1
-transformers
\ No newline at end of file
diff --git a/backend/python/sentencetransformers/requirements-cublas11.txt b/backend/python/sentencetransformers/requirements-cublas11.txt
deleted file mode 100644
index 3900aba9..00000000
--- a/backend/python/sentencetransformers/requirements-cublas11.txt
+++ /dev/null
@@ -1,5 +0,0 @@
---extra-index-url https://download.pytorch.org/whl/cu118
-torch==2.4.1+cu118
-accelerate
-sentence-transformers==3.3.1
-transformers
\ No newline at end of file
diff --git a/backend/python/sentencetransformers/requirements-cublas12.txt b/backend/python/sentencetransformers/requirements-cublas12.txt
deleted file mode 100644
index 2afd0520..00000000
--- a/backend/python/sentencetransformers/requirements-cublas12.txt
+++ /dev/null
@@ -1,4 +0,0 @@
-torch==2.4.1
-accelerate
-sentence-transformers==3.3.1
-transformers
\ No newline at end of file
diff --git a/backend/python/sentencetransformers/requirements-hipblas.txt b/backend/python/sentencetransformers/requirements-hipblas.txt
deleted file mode 100644
index b472d371..00000000
--- a/backend/python/sentencetransformers/requirements-hipblas.txt
+++ /dev/null
@@ -1,5 +0,0 @@
---extra-index-url https://download.pytorch.org/whl/rocm6.0
-torch==2.4.1+rocm6.0
-accelerate
-sentence-transformers==3.3.1
-transformers
\ No newline at end of file
diff --git a/backend/python/sentencetransformers/requirements-intel.txt b/backend/python/sentencetransformers/requirements-intel.txt
deleted file mode 100644
index e9b72aab..00000000
--- a/backend/python/sentencetransformers/requirements-intel.txt
+++ /dev/null
@@ -1,9 +0,0 @@
---extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
-intel-extension-for-pytorch==2.3.110+xpu
-torch==2.3.1+cxx11.abi
-oneccl_bind_pt==2.3.100+xpu
-optimum[openvino]
-setuptools
-accelerate
-sentence-transformers==3.3.1
-transformers
\ No newline at end of file
diff --git a/backend/python/sentencetransformers/requirements.txt b/backend/python/sentencetransformers/requirements.txt
deleted file mode 100644
index 6e03c63f..00000000
--- a/backend/python/sentencetransformers/requirements.txt
+++ /dev/null
@@ -1,5 +0,0 @@
-grpcio==1.69.0
-protobuf
-certifi
-datasets
-einops
\ No newline at end of file
diff --git a/backend/python/sentencetransformers/run.sh b/backend/python/sentencetransformers/run.sh
deleted file mode 100755
index 375c07e5..00000000
--- a/backend/python/sentencetransformers/run.sh
+++ /dev/null
@@ -1,4 +0,0 @@
-#!/bin/bash
-source $(dirname $0)/../common/libbackend.sh
-
-startBackend $@
\ No newline at end of file
diff --git a/backend/python/sentencetransformers/test.py b/backend/python/sentencetransformers/test.py
deleted file mode 100644
index 9df52b14..00000000
--- a/backend/python/sentencetransformers/test.py
+++ /dev/null
@@ -1,81 +0,0 @@
-"""
-A test script to test the gRPC service
-"""
-import unittest
-import subprocess
-import time
-import backend_pb2
-import backend_pb2_grpc
-
-import grpc
-
-
-class TestBackendServicer(unittest.TestCase):
- """
- TestBackendServicer is the class that tests the gRPC service
- """
- def setUp(self):
- """
- This method sets up the gRPC service by starting the server
- """
- self.service = subprocess.Popen(["python3", "backend.py", "--addr", "localhost:50051"])
- time.sleep(10)
-
- def tearDown(self) -> None:
- """
- This method tears down the gRPC service by terminating the server
- """
- self.service.kill()
- self.service.wait()
-
- def test_server_startup(self):
- """
- This method tests if the server starts up successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.Health(backend_pb2.HealthMessage())
- self.assertEqual(response.message, b'OK')
- except Exception as err:
- print(err)
- self.fail("Server failed to start")
- finally:
- self.tearDown()
-
- def test_load_model(self):
- """
- This method tests if the model is loaded successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.LoadModel(backend_pb2.ModelOptions(Model="bert-base-nli-mean-tokens"))
- self.assertTrue(response.success)
- self.assertEqual(response.message, "Model loaded successfully")
- except Exception as err:
- print(err)
- self.fail("LoadModel service failed")
- finally:
- self.tearDown()
-
- def test_embedding(self):
- """
- This method tests if the embeddings are generated successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.LoadModel(backend_pb2.ModelOptions(Model="bert-base-nli-mean-tokens"))
- self.assertTrue(response.success)
- embedding_request = backend_pb2.PredictOptions(Embeddings="This is a test sentence.")
- embedding_response = stub.Embedding(embedding_request)
- self.assertIsNotNone(embedding_response.embeddings)
- except Exception as err:
- print(err)
- self.fail("Embedding service failed")
- finally:
- self.tearDown()
\ No newline at end of file
diff --git a/backend/python/sentencetransformers/test.sh b/backend/python/sentencetransformers/test.sh
deleted file mode 100755
index 6940b066..00000000
--- a/backend/python/sentencetransformers/test.sh
+++ /dev/null
@@ -1,6 +0,0 @@
-#!/bin/bash
-set -e
-
-source $(dirname $0)/../common/libbackend.sh
-
-runUnittests
diff --git a/backend/python/transformers/backend.py b/backend/python/transformers/backend.py
index 27257934..9b65c6db 100644
--- a/backend/python/transformers/backend.py
+++ b/backend/python/transformers/backend.py
@@ -25,6 +25,8 @@ from transformers import AutoTokenizer, AutoModel, set_seed, TextIteratorStreame
from transformers import AutoProcessor, MusicgenForConditionalGeneration
from scipy.io import wavfile
import outetts
+from sentence_transformers import SentenceTransformer
+
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
@@ -88,10 +90,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
self.CUDA = torch.cuda.is_available()
self.OV=False
self.OuteTTS=False
+ self.SentenceTransformer = False
device_map="cpu"
quantization = None
+ autoTokenizer = True
if self.CUDA:
from transformers import BitsAndBytesConfig, AutoModelForCausalLM
@@ -195,9 +199,11 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
device=device_map)
self.OV = True
elif request.Type == "MusicgenForConditionalGeneration":
+ autoTokenizer = False
self.processor = AutoProcessor.from_pretrained(model_name)
self.model = MusicgenForConditionalGeneration.from_pretrained(model_name)
elif request.Type == "OuteTTS":
+ autoTokenizer = False
options = request.Options
MODELNAME = "OuteAI/OuteTTS-0.3-1B"
TOKENIZER = "OuteAI/OuteTTS-0.3-1B"
@@ -235,6 +241,10 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
self.speaker = self.interface.create_speaker(audio_path=self.AudioPath)
else:
self.speaker = self.interface.load_default_speaker(name=SPEAKER)
+ elif request.Type == "SentenceTransformer":
+ autoTokenizer = False
+ self.model = SentenceTransformer(model_name, trust_remote_code=request.TrustRemoteCode)
+ self.SentenceTransformer = True
else:
print("Automodel", file=sys.stderr)
self.model = AutoModel.from_pretrained(model_name,
@@ -250,7 +260,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
else:
self.max_tokens = 512
- if request.Type != "MusicgenForConditionalGeneration":
+ if autoTokenizer:
self.tokenizer = AutoTokenizer.from_pretrained(model_name, use_safetensors=True)
self.XPU = False
@@ -286,18 +296,26 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
max_length = 512
if request.Tokens != 0:
max_length = request.Tokens
- encoded_input = self.tokenizer(request.Embeddings, padding=True, truncation=True, max_length=max_length, return_tensors="pt")
- # Create word embeddings
- if self.CUDA:
- encoded_input = encoded_input.to("cuda")
+ embeds = None
- with torch.no_grad():
- model_output = self.model(**encoded_input)
+ if self.SentenceTransformer:
+ print("Calculated embeddings for: " + request.Embeddings, file=sys.stderr)
+ embeds = self.model.encode(request.Embeddings)
+ else:
+ encoded_input = self.tokenizer(request.Embeddings, padding=True, truncation=True, max_length=max_length, return_tensors="pt")
- # Pool to get sentence embeddings; i.e. generate one 1024 vector for the entire sentence
- sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
- return backend_pb2.EmbeddingResult(embeddings=sentence_embeddings[0])
+ # Create word embeddings
+ if self.CUDA:
+ encoded_input = encoded_input.to("cuda")
+
+ with torch.no_grad():
+ model_output = self.model(**encoded_input)
+
+ # Pool to get sentence embeddings; i.e. generate one 1024 vector for the entire sentence
+ sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
+ embeds = sentence_embeddings[0]
+ return backend_pb2.EmbeddingResult(embeddings=embeds)
async def _predict(self, request, context, streaming=False):
set_seed(request.Seed)
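
With this change the merged transformers backend serves embeddings through two paths: `SentenceTransformer.encode` when the model was loaded with `Type: "SentenceTransformer"`, and the existing tokenizer plus mean-pooling path otherwise. A minimal sketch of the sentence-transformers path, assuming the `sentence-transformers` package is installed; the model name is illustrative, not mandated by the patch:

```python
# Sketch of the SentenceTransformer embedding path added above.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeds = model.encode("This is a test sentence.")
print(embeds.shape)  # one dense vector for the whole sentence
```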
diff --git a/backend/python/transformers/requirements-cpu.txt b/backend/python/transformers/requirements-cpu.txt
index 56b77325..421c4b80 100644
--- a/backend/python/transformers/requirements-cpu.txt
+++ b/backend/python/transformers/requirements-cpu.txt
@@ -3,4 +3,5 @@ llvmlite==0.43.0
accelerate
transformers
bitsandbytes
-outetts
\ No newline at end of file
+outetts
+sentence-transformers==3.3.1
diff --git a/backend/python/transformers/requirements-cublas11.txt b/backend/python/transformers/requirements-cublas11.txt
index 924b0086..c5d18d09 100644
--- a/backend/python/transformers/requirements-cublas11.txt
+++ b/backend/python/transformers/requirements-cublas11.txt
@@ -4,4 +4,5 @@ llvmlite==0.43.0
accelerate
transformers
bitsandbytes
-outetts
\ No newline at end of file
+outetts
+sentence-transformers==3.3.1
diff --git a/backend/python/transformers/requirements-cublas12.txt b/backend/python/transformers/requirements-cublas12.txt
index 0feb3d81..c0bcfc87 100644
--- a/backend/python/transformers/requirements-cublas12.txt
+++ b/backend/python/transformers/requirements-cublas12.txt
@@ -3,4 +3,5 @@ accelerate
llvmlite==0.43.0
transformers
bitsandbytes
-outetts
\ No newline at end of file
+outetts
+sentence-transformers==3.3.1
diff --git a/backend/python/transformers/requirements-hipblas.txt b/backend/python/transformers/requirements-hipblas.txt
index fa65fb8e..e7f53860 100644
--- a/backend/python/transformers/requirements-hipblas.txt
+++ b/backend/python/transformers/requirements-hipblas.txt
@@ -4,4 +4,6 @@ accelerate
transformers
llvmlite==0.43.0
bitsandbytes
-outetts
\ No newline at end of file
+outetts
+bitsandbytes
+sentence-transformers==3.3.1
diff --git a/backend/python/transformers/requirements-intel.txt b/backend/python/transformers/requirements-intel.txt
index 4a295599..aada6e00 100644
--- a/backend/python/transformers/requirements-intel.txt
+++ b/backend/python/transformers/requirements-intel.txt
@@ -6,4 +6,5 @@ optimum[openvino]
llvmlite==0.43.0
intel-extension-for-transformers
bitsandbytes
-outetts
\ No newline at end of file
+outetts
+sentence-transformers==3.3.1
diff --git a/backend/python/transformers/test.py b/backend/python/transformers/test.py
index 305b0a93..14efa6a7 100644
--- a/backend/python/transformers/test.py
+++ b/backend/python/transformers/test.py
@@ -133,5 +133,41 @@ class TestBackendServicer(unittest.TestCase):
except Exception as err:
print(err)
self.fail("SoundGeneration service failed")
+ finally:
+ self.tearDown()
+
+ def test_embed_load_model(self):
+ """
+ This method tests if the model is loaded successfully
+ """
+ try:
+ self.setUp()
+ with grpc.insecure_channel("localhost:50051") as channel:
+ stub = backend_pb2_grpc.BackendStub(channel)
+ response = stub.LoadModel(backend_pb2.ModelOptions(Model="bert-base-nli-mean-tokens",Type="SentenceTransformer"))
+ self.assertTrue(response.success)
+ self.assertEqual(response.message, "Model loaded successfully")
+ except Exception as err:
+ print(err)
+ self.fail("LoadModel service failed")
+ finally:
+ self.tearDown()
+
+ def test_sentencetransformers_embedding(self):
+ """
+ This method tests if the embeddings are generated successfully
+ """
+ try:
+ self.setUp()
+ with grpc.insecure_channel("localhost:50051") as channel:
+ stub = backend_pb2_grpc.BackendStub(channel)
+ response = stub.LoadModel(backend_pb2.ModelOptions(Model="bert-base-nli-mean-tokens",Type="SentenceTransformer"))
+ self.assertTrue(response.success)
+ embedding_request = backend_pb2.PredictOptions(Embeddings="This is a test sentence.")
+ embedding_response = stub.Embedding(embedding_request)
+ self.assertIsNotNone(embedding_response.embeddings)
+ except Exception as err:
+ print(err)
+ self.fail("Embedding service failed")
finally:
self.tearDown()
\ No newline at end of file
diff --git a/core/http/app_test.go b/core/http/app_test.go
index 6bf1806b..a2e2f758 100644
--- a/core/http/app_test.go
+++ b/core/http/app_test.go
@@ -822,7 +822,7 @@ var _ = Describe("API test", func() {
application, err := application.New(
append(commonOpts,
- config.WithExternalBackend("huggingface", os.Getenv("HUGGINGFACE_GRPC")),
+ config.WithExternalBackend("transformers", os.Getenv("HUGGINGFACE_GRPC")),
config.WithContext(c),
config.WithModelPath(modelPath),
)...)
diff --git a/pkg/model/initializers.go b/pkg/model/initializers.go
index f4675050..eb3e4fdf 100644
--- a/pkg/model/initializers.go
+++ b/pkg/model/initializers.go
@@ -22,11 +22,19 @@ import (
)
var Aliases map[string]string = map[string]string{
- "go-llama": LLamaCPP,
- "llama": LLamaCPP,
- "embedded-store": LocalStoreBackend,
- "langchain-huggingface": LCHuggingFaceBackend,
- "transformers-musicgen": TransformersBackend,
+ "go-llama": LLamaCPP,
+ "llama": LLamaCPP,
+ "embedded-store": LocalStoreBackend,
+ "huggingface-embeddings": TransformersBackend,
+ "langchain-huggingface": LCHuggingFaceBackend,
+ "transformers-musicgen": TransformersBackend,
+ "sentencetransformers": TransformersBackend,
+}
+
+var TypeAlias map[string]string = map[string]string{
+ "sentencetransformers": "SentenceTransformer",
+ "huggingface-embeddings": "SentenceTransformer",
+ "transformers-musicgen": "MusicgenForConditionalGeneration",
}
var AutoDetect = os.Getenv("DISABLE_AUTODETECT") != "true"
@@ -396,6 +404,7 @@ func (ml *ModelLoader) grpcModel(backend string, autodetect bool, o *Options) fu
}
log.Debug().Msgf("Wait for the service to start up")
+ log.Debug().Msgf("Options: %+v", o.gRPCOptions)
// Wait for the service to start up
ready := false
@@ -460,8 +469,15 @@ func (ml *ModelLoader) backendLoader(opts ...Option) (client grpc.Backend, err e
backend := strings.ToLower(o.backendString)
if realBackend, exists := Aliases[backend]; exists {
+ typeAlias, exists := TypeAlias[backend]
+ if exists {
+ log.Debug().Msgf("'%s' is a type alias of '%s' (%s)", backend, realBackend, typeAlias)
+ o.gRPCOptions.Type = typeAlias
+ } else {
+ log.Debug().Msgf("'%s' is an alias of '%s'", backend, realBackend)
+ }
+
backend = realBackend
- log.Debug().Msgf("%s is an alias of %s", backend, realBackend)
}
ml.stopActiveBackends(o.modelID, o.singleActiveBackend)
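
The `Aliases` and `TypeAlias` maps above let configurations that still name `sentencetransformers` or `huggingface-embeddings` load the transformers backend while the gRPC `Type` option is silently set to `SentenceTransformer`. A rough transcription of that lookup into Python for illustration (it assumes `TransformersBackend` resolves to the backend name `"transformers"`):

```python
# Illustrative transcription of the Go alias resolution performed in backendLoader.
ALIASES = {
    "sentencetransformers": "transformers",
    "huggingface-embeddings": "transformers",
    "transformers-musicgen": "transformers",
}
TYPE_ALIAS = {
    "sentencetransformers": "SentenceTransformer",
    "huggingface-embeddings": "SentenceTransformer",
    "transformers-musicgen": "MusicgenForConditionalGeneration",
}

def resolve(backend: str):
    real = ALIASES.get(backend, backend)
    return real, TYPE_ALIAS.get(backend)  # backend name, optional Type override

print(resolve("sentencetransformers"))  # ('transformers', 'SentenceTransformer')
```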
diff --git a/tests/models_fixtures/grpc.yaml b/tests/models_fixtures/grpc.yaml
index 31c406ab..8c519920 100644
--- a/tests/models_fixtures/grpc.yaml
+++ b/tests/models_fixtures/grpc.yaml
@@ -1,5 +1,5 @@
name: code-search-ada-code-001
-backend: huggingface
+backend: sentencetransformers
embeddings: true
parameters:
model: all-MiniLM-L6-v2
\ No newline at end of file
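
The fixture now routes the `code-search-ada-code-001` test model through the `sentencetransformers` alias, i.e. through the transformers backend. Assuming a LocalAI instance running locally on the default port, an embedding request against such a model could look like the sketch below (the OpenAI-compatible `/v1/embeddings` endpoint and port 8080 are the usual defaults, not something this patch changes):

```python
# Hedged usage sketch: query a locally running LocalAI instance for embeddings.
import json
import urllib.request

payload = json.dumps({
    "model": "code-search-ada-code-001",
    "input": "This is a test sentence.",
}).encode()
req = urllib.request.Request(
    "http://localhost:8080/v1/embeddings",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["data"][0]["embedding"][:5])
```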
From 032a33de49b3dbe2c3acfd684b6855a7ce0e36f7 Mon Sep 17 00:00:00 2001
From: Gianluca Boiano <491117+M0Rf30@users.noreply.github.com>
Date: Sat, 18 Jan 2025 18:35:30 +0100
Subject: [PATCH 051/679] chore: remove deprecated tinydream backend (#4631)
Signed-off-by: Gianluca Boiano
---
Makefile | 34 +----------------
backend/go/image/tinydream/main.go | 21 -----------
backend/go/image/tinydream/tinydream.go | 32 ----------------
core/config/backend_config.go | 2 +-
core/http/endpoints/openai/image.go | 2 -
docs/content/docs/getting-started/build.md | 4 +-
.../docs/getting-started/container-images.md | 2 +-
.../docs/reference/compatibility-table.md | 2 +-
gallery/index.yaml | 9 -----
gallery/tinydream.yaml | 37 -------------------
go.mod | 1 -
go.sum | 2 -
pkg/model/initializers.go | 1 -
pkg/tinydream/generate.go | 36 ------------------
pkg/tinydream/generate_unsupported.go | 10 -----
pkg/tinydream/tinydream.go | 20 ----------
16 files changed, 6 insertions(+), 209 deletions(-)
delete mode 100644 backend/go/image/tinydream/main.go
delete mode 100644 backend/go/image/tinydream/tinydream.go
delete mode 100644 gallery/tinydream.yaml
delete mode 100644 pkg/tinydream/generate.go
delete mode 100644 pkg/tinydream/generate_unsupported.go
delete mode 100644 pkg/tinydream/tinydream.go
diff --git a/Makefile b/Makefile
index faa82d6b..944cad37 100644
--- a/Makefile
+++ b/Makefile
@@ -22,10 +22,6 @@ PIPER_VERSION?=e10ca041a885d4a8f3871d52924b47792d5e5aa0
STABLEDIFFUSION_REPO?=https://github.com/mudler/go-stable-diffusion
STABLEDIFFUSION_VERSION?=4a3cd6aeae6f66ee57eae9a0075f8c58c3a6a38f
-# tinydream version
-TINYDREAM_REPO?=https://github.com/M0Rf30/go-tiny-dream
-TINYDREAM_VERSION?=c04fa463ace9d9a6464313aa5f9cd0f953b6c057
-
# bark.cpp
BARKCPP_REPO?=https://github.com/PABannier/bark.cpp.git
BARKCPP_VERSION?=v1.0.0
@@ -188,11 +184,6 @@ ifeq ($(findstring stablediffusion,$(GO_TAGS)),stablediffusion)
OPTIONAL_GRPC+=backend-assets/grpc/stablediffusion
endif
-ifeq ($(findstring tinydream,$(GO_TAGS)),tinydream)
-# OPTIONAL_TARGETS+=go-tiny-dream/libtinydream.a
- OPTIONAL_GRPC+=backend-assets/grpc/tinydream
-endif
-
ifeq ($(findstring tts,$(GO_TAGS)),tts)
# OPTIONAL_TARGETS+=go-piper/libpiper_binding.a
# OPTIONAL_TARGETS+=backend-assets/espeak-ng-data
@@ -327,19 +318,6 @@ else
mv backend-assets/lib/libonnxruntime.so.$(ONNX_VERSION) backend-assets/lib/libonnxruntime.so.1
endif
-## tiny-dream
-sources/go-tiny-dream:
- mkdir -p sources/go-tiny-dream
- cd sources/go-tiny-dream && \
- git init && \
- git remote add origin $(TINYDREAM_REPO) && \
- git fetch origin && \
- git checkout $(TINYDREAM_VERSION) && \
- git submodule update --init --recursive --depth 1 --single-branch
-
-sources/go-tiny-dream/libtinydream.a: sources/go-tiny-dream
- $(MAKE) -C sources/go-tiny-dream libtinydream.a
-
## whisper
sources/whisper.cpp:
mkdir -p sources/whisper.cpp
@@ -353,12 +331,11 @@ sources/whisper.cpp:
sources/whisper.cpp/libwhisper.a: sources/whisper.cpp
cd sources/whisper.cpp && $(MAKE) libwhisper.a libggml.a
-get-sources: sources/go-llama.cpp sources/go-piper sources/stablediffusion-ggml.cpp sources/bark.cpp sources/whisper.cpp sources/go-stable-diffusion sources/go-tiny-dream backend/cpp/llama/llama.cpp
+get-sources: sources/go-llama.cpp sources/go-piper sources/stablediffusion-ggml.cpp sources/bark.cpp sources/whisper.cpp sources/go-stable-diffusion backend/cpp/llama/llama.cpp
replace:
$(GOCMD) mod edit -replace github.com/ggerganov/whisper.cpp=$(CURDIR)/sources/whisper.cpp
$(GOCMD) mod edit -replace github.com/ggerganov/whisper.cpp/bindings/go=$(CURDIR)/sources/whisper.cpp/bindings/go
- $(GOCMD) mod edit -replace github.com/M0Rf30/go-tiny-dream=$(CURDIR)/sources/go-tiny-dream
$(GOCMD) mod edit -replace github.com/mudler/go-piper=$(CURDIR)/sources/go-piper
$(GOCMD) mod edit -replace github.com/mudler/go-stable-diffusion=$(CURDIR)/sources/go-stable-diffusion
$(GOCMD) mod edit -replace github.com/go-skynet/go-llama.cpp=$(CURDIR)/sources/go-llama.cpp
@@ -366,7 +343,6 @@ replace:
dropreplace:
$(GOCMD) mod edit -dropreplace github.com/ggerganov/whisper.cpp
$(GOCMD) mod edit -dropreplace github.com/ggerganov/whisper.cpp/bindings/go
- $(GOCMD) mod edit -dropreplace github.com/M0Rf30/go-tiny-dream
$(GOCMD) mod edit -dropreplace github.com/mudler/go-piper
$(GOCMD) mod edit -dropreplace github.com/mudler/go-stable-diffusion
$(GOCMD) mod edit -dropreplace github.com/go-skynet/go-llama.cpp
@@ -381,7 +357,6 @@ rebuild: ## Rebuilds the project
$(MAKE) -C sources/whisper.cpp clean
$(MAKE) -C sources/go-stable-diffusion clean
$(MAKE) -C sources/go-piper clean
- $(MAKE) -C sources/go-tiny-dream clean
$(MAKE) build
prepare: prepare-sources $(OPTIONAL_TARGETS)
@@ -855,13 +830,6 @@ ifneq ($(UPX),)
$(UPX) backend-assets/grpc/silero-vad
endif
-backend-assets/grpc/tinydream: sources/go-tiny-dream sources/go-tiny-dream/libtinydream.a backend-assets/grpc
- CGO_LDFLAGS="$(CGO_LDFLAGS)" LIBRARY_PATH=$(CURDIR)/go-tiny-dream \
- $(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/tinydream ./backend/go/image/tinydream
-ifneq ($(UPX),)
- $(UPX) backend-assets/grpc/tinydream
-endif
-
backend-assets/grpc/whisper: sources/whisper.cpp sources/whisper.cpp/libwhisper.a backend-assets/grpc
CGO_LDFLAGS="$(CGO_LDFLAGS) $(CGO_LDFLAGS_WHISPER)" C_INCLUDE_PATH="$(CURDIR)/sources/whisper.cpp/include:$(CURDIR)/sources/whisper.cpp/ggml/include" LIBRARY_PATH=$(CURDIR)/sources/whisper.cpp \
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/whisper ./backend/go/transcribe/whisper
diff --git a/backend/go/image/tinydream/main.go b/backend/go/image/tinydream/main.go
deleted file mode 100644
index ae259fa7..00000000
--- a/backend/go/image/tinydream/main.go
+++ /dev/null
@@ -1,21 +0,0 @@
-package main
-
-// Note: this is started internally by LocalAI and a server is allocated for each model
-
-import (
- "flag"
-
- grpc "github.com/mudler/LocalAI/pkg/grpc"
-)
-
-var (
- addr = flag.String("addr", "localhost:50051", "the address to connect to")
-)
-
-func main() {
- flag.Parse()
-
- if err := grpc.StartServer(*addr, &Image{}); err != nil {
- panic(err)
- }
-}
diff --git a/backend/go/image/tinydream/tinydream.go b/backend/go/image/tinydream/tinydream.go
deleted file mode 100644
index ad364c47..00000000
--- a/backend/go/image/tinydream/tinydream.go
+++ /dev/null
@@ -1,32 +0,0 @@
-package main
-
-// This is a wrapper to statisfy the GRPC service interface
-// It is meant to be used by the main executable that is the server for the specific backend type (falcon, gpt3, etc)
-import (
- "github.com/mudler/LocalAI/pkg/grpc/base"
- pb "github.com/mudler/LocalAI/pkg/grpc/proto"
- "github.com/mudler/LocalAI/pkg/tinydream"
-)
-
-type Image struct {
- base.SingleThread
- tinydream *tinydream.TinyDream
-}
-
-func (image *Image) Load(opts *pb.ModelOptions) error {
- var err error
- // Note: the Model here is a path to a directory containing the model files
- image.tinydream, err = tinydream.New(opts.ModelFile)
- return err
-}
-
-func (image *Image) GenerateImage(opts *pb.GenerateImageRequest) error {
- return image.tinydream.GenerateImage(
- int(opts.Height),
- int(opts.Width),
- int(opts.Step),
- int(opts.Seed),
- opts.PositivePrompt,
- opts.NegativePrompt,
- opts.Dst)
-}
diff --git a/core/config/backend_config.go b/core/config/backend_config.go
index bb2fa643..a488f2a0 100644
--- a/core/config/backend_config.go
+++ b/core/config/backend_config.go
@@ -515,7 +515,7 @@ func (c *BackendConfig) GuessUsecases(u BackendConfigUsecases) bool {
}
}
if (u & FLAG_IMAGE) == FLAG_IMAGE {
- imageBackends := []string{"diffusers", "tinydream", "stablediffusion"}
+ imageBackends := []string{"diffusers", "stablediffusion"}
if !slices.Contains(imageBackends, c.Backend) {
return false
}
diff --git a/core/http/endpoints/openai/image.go b/core/http/endpoints/openai/image.go
index 3fdb64d4..baaecd4e 100644
--- a/core/http/endpoints/openai/image.go
+++ b/core/http/endpoints/openai/image.go
@@ -130,8 +130,6 @@ func ImageEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appCon
switch config.Backend {
case "stablediffusion":
config.Backend = model.StableDiffusionBackend
- case "tinydream":
- config.Backend = model.TinyDreamBackend
case "":
config.Backend = model.StableDiffusionBackend
}
diff --git a/docs/content/docs/getting-started/build.md b/docs/content/docs/getting-started/build.md
index f21a5b48..9fff1989 100644
--- a/docs/content/docs/getting-started/build.md
+++ b/docs/content/docs/getting-started/build.md
@@ -88,7 +88,7 @@ Here is the list of the variables available that can be used to customize the bu
| Variable | Default | Description |
| ---------------------| ------- | ----------- |
| `BUILD_TYPE` | None | Build type. Available: `cublas`, `openblas`, `clblas`, `metal`,`hipblas`, `sycl_f16`, `sycl_f32` |
-| `GO_TAGS` | `tts stablediffusion` | Go tags. Available: `stablediffusion`, `tts`, `tinydream` |
+| `GO_TAGS` | `tts stablediffusion` | Go tags. Available: `stablediffusion`, `tts` |
| `CLBLAST_DIR` | | Specify a CLBlast directory |
| `CUDA_LIBPATH` | | Specify a CUDA library path |
| `BUILD_API_ONLY` | false | Set to true to build only the API (no backends will be built) |
@@ -202,7 +202,7 @@ make build
**Requirements**: OpenCV, Gomp
-Image generation requires `GO_TAGS=stablediffusion` or `GO_TAGS=tinydream` to be set during build:
+Image generation requires `GO_TAGS=stablediffusion` to be set during build:
```
make GO_TAGS=stablediffusion build
diff --git a/docs/content/docs/getting-started/container-images.md b/docs/content/docs/getting-started/container-images.md
index 967fc28b..64f6dbc9 100644
--- a/docs/content/docs/getting-started/container-images.md
+++ b/docs/content/docs/getting-started/container-images.md
@@ -16,7 +16,7 @@ For GPU Acceleration support for Nvidia video graphic cards, use the Nvidia/CUDA
**Available Images Types**:
-- Images ending with `-core` are smaller images without predownload python dependencies. Use these images if you plan to use `llama.cpp`, `stablediffusion-ncn`, `tinydream` or `rwkv` backends - if you are not sure which one to use, do **not** use these images.
+- Images ending with `-core` are smaller images without predownload python dependencies. Use these images if you plan to use `llama.cpp`, `stablediffusion-ncn` or `rwkv` backends - if you are not sure which one to use, do **not** use these images.
- Images containing the `aio` tag are all-in-one images with all the features enabled, and come with an opinionated set of configuration.
- FFMpeg is **not** included in the default images due to [its licensing](https://www.ffmpeg.org/legal.html). If you need FFMpeg, use the images ending with `-ffmpeg`. Note that `ffmpeg` is needed in case of using `audio-to-text` LocalAI's features.
- If using old and outdated CPUs and no GPUs you might need to set `REBUILD` to `true` as environment variable along with options to disable the flags which your CPU does not support, however note that inference will perform poorly and slow. See also [flagset compatibility]({{%relref "docs/getting-started/build#cpu-flagset-compatibility" %}}).
diff --git a/docs/content/docs/reference/compatibility-table.md b/docs/content/docs/reference/compatibility-table.md
index 7056f4a5..d2f4d8ac 100644
--- a/docs/content/docs/reference/compatibility-table.md
+++ b/docs/content/docs/reference/compatibility-table.md
@@ -32,7 +32,7 @@ LocalAI will attempt to automatically load models which are not explicitly confi
| `mamba` | Mamba models architecture | yes | GPT | no | no | CPU/CUDA |
| `exllama2` | GPTQ | yes | GPT only | no | no | N/A |
| `transformers-musicgen` | | no | Audio generation | no | no | N/A |
-| [tinydream](https://github.com/symisc/tiny-dream#tiny-dreaman-embedded-header-only-stable-diffusion-inference-c-librarypixlabiotiny-dream) | stablediffusion | no | Image | no | no | N/A |
+| stablediffusion | no | Image | no | no | N/A |
| `coqui` | Coqui | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| `openvoice` | Open voice | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| `parler-tts` | Open voice | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 349cd419..35fac331 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -11187,15 +11187,6 @@
description: |
Stable Diffusion in NCNN with c++, supported txt2img and img2img
name: stablediffusion-cpp
-## Tiny Dream
-- url: github:mudler/LocalAI/gallery/tinydream.yaml@master
- name: tinydream
- license: "BSD-3"
- urls:
- - https://github.com/symisc/tiny-dream
- - https://github.com/symisc/tiny-dream/blob/main/LICENSE
- description: |
- An embedded, Header Only, Stable Diffusion C++ implementation
- &piper
## Piper TTS
url: github:mudler/LocalAI/gallery/piper.yaml@master
diff --git a/gallery/tinydream.yaml b/gallery/tinydream.yaml
deleted file mode 100644
index e4a79ad7..00000000
--- a/gallery/tinydream.yaml
+++ /dev/null
@@ -1,37 +0,0 @@
----
-name: "tinydream"
-
-config_file: |
- name: tinydream
- backend: tinydream
- parameters:
- model: tinydream_assets
-
-files:
- - filename: "tinydream_assets/AutoencoderKL-fp16.bin"
- sha256: "f02e71f80e70252734724bbfaed5c4ddd3a8ed7e61bb2175ff5f53099f0e35dd"
- uri: "https://github.com/M0Rf30/tiny-dream-bins/releases/download/1.0/AutoencoderKL-fp16.bin"
- - filename: "tinydream_assets/AutoencoderKL-fp16.param"
- sha256: "0254a056dce61b0c27dc9ec1b78b53bcf55315c540f55f051eb841aa992701ba"
- uri: "https://github.com/M0Rf30/tiny-dream-bins/releases/download/1.0/AutoencoderKL-fp16.param"
- - filename: "tinydream_assets/FrozenCLIPEmbedder-fp16.bin"
- sha256: "1c9a12f4e1dd1b295a388045f7f28a2352a4d70c3dc96a542189a3dd7051fdd6"
- uri: "https://github.com/M0Rf30/tiny-dream-bins/releases/download/1.0/FrozenCLIPEmbedder-fp16.bin"
- - filename: "tinydream_assets/FrozenCLIPEmbedder-fp16.param"
- sha256: "471afbe678dd1fd3fe764ef9c6eccaccb0a7d7e601f27b462aa926b20eb368c9"
- uri: "https://github.com/M0Rf30/tiny-dream-bins/releases/download/1.0/FrozenCLIPEmbedder-fp16.param"
- - filename: "tinydream_assets/RealESRGAN_x4plus_anime.bin"
- sha256: "fe01c269cfd10cdef8e018ab66ebe750cf79c7af4d1f9c16c737e1295229bacc"
- uri: "https://github.com/M0Rf30/tiny-dream-bins/releases/download/1.0/RealESRGAN_x4plus_anime.bin"
- - filename: "tinydream_assets/RealESRGAN_x4plus_anime.param"
- sha256: "2b8fb6e0ae4d2d85704ca08c119a2f5ea40add4f2ecd512eb7f4cd44b6127ed4"
- uri: "https://github.com/M0Rf30/tiny-dream-bins/releases/download/1.0/RealESRGAN_x4plus_anime.param"
- - filename: "tinydream_assets/UNetModel-fp16.bin"
- sha256: "d618918d011bfc1f644c0f2a33bf84931bd53b28a98492b0a8ed6f3a818852c3"
- uri: "https://github.com/M0Rf30/tiny-dream-bins/releases/download/1.0/UNetModel-fp16.bin"
- - filename: "tinydream_assets/UNetModel-fp16.param"
- sha256: "696f6975de49f4325b53ce32aff81861a6d6c07cd9ce3f0aae2cc405350af38d"
- uri: "https://github.com/M0Rf30/tiny-dream-bins/releases/download/1.0/UNetModel-fp16.param"
- - filename: "tinydream_assets/vocab.txt"
- sha256: "e30e57b6f1e47616982ef898d8922be24e535b4fa3d0110477b3a6f02ebbae7d"
- uri: "https://github.com/M0Rf30/tiny-dream-bins/releases/download/1.0/vocab.txt"
diff --git a/go.mod b/go.mod
index 8aecf14d..adfa7357 100644
--- a/go.mod
+++ b/go.mod
@@ -6,7 +6,6 @@ toolchain go1.23.1
require (
dario.cat/mergo v1.0.1
- github.com/M0Rf30/go-tiny-dream v0.0.0-20240425104733-c04fa463ace9
github.com/Masterminds/sprig/v3 v3.3.0
github.com/alecthomas/kong v0.9.0
github.com/census-instrumentation/opencensus-proto v0.4.1
diff --git a/go.sum b/go.sum
index a1a487b2..4a744ed8 100644
--- a/go.sum
+++ b/go.sum
@@ -27,8 +27,6 @@ github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03
github.com/BurntSushi/toml v1.2.1/go.mod h1:CxXYINrC8qIiEnFrOxCa7Jy5BFHlXnUU2pbicEuybxQ=
github.com/KyleBanks/depth v1.2.1 h1:5h8fQADFrWtarTdtDudMmGsC7GPbOAu6RVB3ffsVFHc=
github.com/KyleBanks/depth v1.2.1/go.mod h1:jzSb9d0L43HxTQfT+oSA1EEp2q+ne2uh6XgeJcm8brE=
-github.com/M0Rf30/go-tiny-dream v0.0.0-20240425104733-c04fa463ace9 h1:ASsbvw7wQPldWpwKdmYRszJ2A8Cj3oJDr4zO0DiXvN4=
-github.com/M0Rf30/go-tiny-dream v0.0.0-20240425104733-c04fa463ace9/go.mod h1:UOf2Mb/deUri5agct5OJ4SLWjhI+kZKbsUVUeRb24I0=
github.com/Masterminds/goutils v1.1.1 h1:5nUrii3FMTL5diU80unEVvNevw1nH4+ZV4DSLVJLSYI=
github.com/Masterminds/goutils v1.1.1/go.mod h1:8cTjp+g8YejhMuvIA5y2vz3BpJxksy863GQaJW2MFNU=
github.com/Masterminds/semver/v3 v3.3.0 h1:B8LGeaivUe71a5qox1ICM/JLl0NqZSW5CHyL+hmvYS0=
diff --git a/pkg/model/initializers.go b/pkg/model/initializers.go
index eb3e4fdf..756deea7 100644
--- a/pkg/model/initializers.go
+++ b/pkg/model/initializers.go
@@ -56,7 +56,6 @@ const (
WhisperBackend = "whisper"
StableDiffusionBackend = "stablediffusion"
- TinyDreamBackend = "tinydream"
PiperBackend = "piper"
LCHuggingFaceBackend = "huggingface"
diff --git a/pkg/tinydream/generate.go b/pkg/tinydream/generate.go
deleted file mode 100644
index cfcd23cc..00000000
--- a/pkg/tinydream/generate.go
+++ /dev/null
@@ -1,36 +0,0 @@
-//go:build tinydream
-// +build tinydream
-
-package tinydream
-
-import (
- "fmt"
- "path/filepath"
-
- tinyDream "github.com/M0Rf30/go-tiny-dream"
-)
-
-func GenerateImage(height, width, step, seed int, positive_prompt, negative_prompt, dst, asset_dir string) error {
- fmt.Println(dst)
- if height > 512 || width > 512 {
- return tinyDream.GenerateImage(
- 1,
- step,
- seed,
- positive_prompt,
- negative_prompt,
- filepath.Dir(dst),
- asset_dir,
- )
- }
-
- return tinyDream.GenerateImage(
- 0,
- step,
- seed,
- positive_prompt,
- negative_prompt,
- filepath.Dir(dst),
- asset_dir,
- )
-}
diff --git a/pkg/tinydream/generate_unsupported.go b/pkg/tinydream/generate_unsupported.go
deleted file mode 100644
index 4ffd421a..00000000
--- a/pkg/tinydream/generate_unsupported.go
+++ /dev/null
@@ -1,10 +0,0 @@
-//go:build !tinydream
-// +build !tinydream
-
-package tinydream
-
-import "fmt"
-
-func GenerateImage(height, width, step, seed int, positive_prompt, negative_prompt, dst, asset_dir string) error {
- return fmt.Errorf("This version of LocalAI was built without the tinytts tag")
-}
diff --git a/pkg/tinydream/tinydream.go b/pkg/tinydream/tinydream.go
deleted file mode 100644
index a316e641..00000000
--- a/pkg/tinydream/tinydream.go
+++ /dev/null
@@ -1,20 +0,0 @@
-package tinydream
-
-import "os"
-
-type TinyDream struct {
- assetDir string
-}
-
-func New(assetDir string) (*TinyDream, error) {
- if _, err := os.Stat(assetDir); err != nil {
- return nil, err
- }
- return &TinyDream{
- assetDir: assetDir,
- }, nil
-}
-
-func (td *TinyDream) GenerateImage(height, width, step, seed int, positive_prompt, negative_prompt, dst string) error {
- return GenerateImage(height, width, step, seed, positive_prompt, negative_prompt, dst, td.assetDir)
-}
From d0cc3047dc424a9731f8c74b37aa3e45a58ce14a Mon Sep 17 00:00:00 2001
From: Gianluca Boiano <491117+M0Rf30@users.noreply.github.com>
Date: Sat, 18 Jan 2025 18:36:05 +0100
Subject: [PATCH 052/679] chore(model gallery): add MiniCPM-V-2.6-8b-q4_K_M
(#4633)
Signed-off-by: Gianluca Boiano
---
gallery/index.yaml | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 35fac331..edd52725 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -9187,6 +9187,7 @@
uri: huggingface://xtuner/llava-llama-3-8b-v1_1-gguf/llava-llama-3-8b-v1_1-mmproj-f16.gguf
- !!merge <<: *llama3
name: "minicpm-llama3-v-2_5"
+ icon: https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm_logo.png
urls:
- https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf
- https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5
@@ -9210,6 +9211,33 @@
- filename: minicpm-llama3-mmproj-f16.gguf
sha256: 391d11736c3cd24a90417c47b0c88975e86918fcddb1b00494c4d715b08af13e
uri: huggingface://openbmb/MiniCPM-Llama3-V-2_5-gguf/mmproj-model-f16.gguf
+- !!merge <<: *llama3
+ name: "minicpm-v-2_6"
+ license: apache-2.0
+ icon: https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm_logo.png
+ urls:
+ - https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf
+ - https://huggingface.co/openbmb/MiniCPM-V-2_6
+ description: |
+ MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters
+ tags:
+ - llm
+ - multimodal
+ - gguf
+ - gpu
+ - llama3
+ - cpu
+ overrides:
+ mmproj: minicpm-v-2_6-mmproj-f16.gguf
+ parameters:
+ model: minicpm-v-2_6-Q4_K_M.gguf
+ files:
+ - filename: minicpm-v-2_6-Q4_K_M.gguf
+ sha256: 3a4078d53b46f22989adbf998ce5a3fd090b6541f112d7e936eb4204a04100b1
+ uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/ggml-model-Q4_K_M.gguf
+ - filename: minicpm-v-2_6-mmproj-f16.gguf
+ sha256: f8a805e9e62085805c69c427287acefc284932eb4abfe6e1b1ce431d27e2f4e0
+ uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/mmproj-model-f16.gguf
- !!merge <<: *llama3
name: "llama-3-cursedstock-v1.8-8b-iq-imatrix"
urls:
From 296b97925fab0246184ac582621045565ce9a075 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Sat, 18 Jan 2025 23:21:27 +0100
Subject: [PATCH 053/679] chore: :arrow_up: Update leejet/stable-diffusion.cpp
to `5eb15ef4d022bef4a391de4f5f6556e81fbb5024` (#4636)
:arrow_up: Update leejet/stable-diffusion.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 944cad37..fc4eddf4 100644
--- a/Makefile
+++ b/Makefile
@@ -28,7 +28,7 @@ BARKCPP_VERSION?=v1.0.0
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
-STABLEDIFFUSION_GGML_VERSION?=dcf91f9e0f2cbf9da472ee2a556751ed4bab2d2a
+STABLEDIFFUSION_GGML_VERSION?=5eb15ef4d022bef4a391de4f5f6556e81fbb5024
ONNX_VERSION?=1.20.0
ONNX_ARCH?=x64
From a752183fb58de465daa35688c93fbe7d4ed324e9 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Sun, 19 Jan 2025 08:38:33 +0100
Subject: [PATCH 054/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`a1649cc13f89946322358f92ea268ae1b7b5096c` (#4635)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index fc4eddf4..dfa91a15 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=3edfa7d3753c29e44b964c0ff424d2ea8d5fdee6
+CPPLLAMA_VERSION?=a1649cc13f89946322358f92ea268ae1b7b5096c
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From f496d0113b722847aaf4775394ccfd814255fef9 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sun, 19 Jan 2025 09:07:56 +0100
Subject: [PATCH 055/679] chore(deps): pin numba
Signed-off-by: Ettore Di Giacinto
---
backend/python/transformers/requirements-cpu.txt | 3 ++-
backend/python/transformers/requirements-cublas11.txt | 1 +
backend/python/transformers/requirements-cublas12.txt | 1 +
backend/python/transformers/requirements-hipblas.txt | 1 +
backend/python/transformers/requirements-intel.txt | 1 +
backend/python/transformers/requirements.txt | 3 +--
6 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/backend/python/transformers/requirements-cpu.txt b/backend/python/transformers/requirements-cpu.txt
index 421c4b80..c88508e3 100644
--- a/backend/python/transformers/requirements-cpu.txt
+++ b/backend/python/transformers/requirements-cpu.txt
@@ -1,7 +1,8 @@
torch==2.4.1
llvmlite==0.43.0
+numba==0.60.0
accelerate
transformers
bitsandbytes
outetts
-sentence-transformers==3.3.1
+sentence-transformers==3.3.1
\ No newline at end of file
diff --git a/backend/python/transformers/requirements-cublas11.txt b/backend/python/transformers/requirements-cublas11.txt
index c5d18d09..0faa9cec 100644
--- a/backend/python/transformers/requirements-cublas11.txt
+++ b/backend/python/transformers/requirements-cublas11.txt
@@ -1,6 +1,7 @@
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.4.1+cu118
llvmlite==0.43.0
+numba==0.60.0
accelerate
transformers
bitsandbytes
diff --git a/backend/python/transformers/requirements-cublas12.txt b/backend/python/transformers/requirements-cublas12.txt
index c0bcfc87..1e22312f 100644
--- a/backend/python/transformers/requirements-cublas12.txt
+++ b/backend/python/transformers/requirements-cublas12.txt
@@ -1,6 +1,7 @@
torch==2.4.1
accelerate
llvmlite==0.43.0
+numba==0.60.0
transformers
bitsandbytes
outetts
diff --git a/backend/python/transformers/requirements-hipblas.txt b/backend/python/transformers/requirements-hipblas.txt
index e7f53860..47aa88db 100644
--- a/backend/python/transformers/requirements-hipblas.txt
+++ b/backend/python/transformers/requirements-hipblas.txt
@@ -3,6 +3,7 @@ torch==2.4.1+rocm6.0
accelerate
transformers
llvmlite==0.43.0
+numba==0.60.0
bitsandbytes
outetts
bitsandbytes
diff --git a/backend/python/transformers/requirements-intel.txt b/backend/python/transformers/requirements-intel.txt
index aada6e00..708b0516 100644
--- a/backend/python/transformers/requirements-intel.txt
+++ b/backend/python/transformers/requirements-intel.txt
@@ -4,6 +4,7 @@ torch==2.3.1+cxx11.abi
oneccl_bind_pt==2.3.100+xpu
optimum[openvino]
llvmlite==0.43.0
+numba==0.60.0
intel-extension-for-transformers
bitsandbytes
outetts
diff --git a/backend/python/transformers/requirements.txt b/backend/python/transformers/requirements.txt
index d353e4d0..db41b928 100644
--- a/backend/python/transformers/requirements.txt
+++ b/backend/python/transformers/requirements.txt
@@ -3,5 +3,4 @@ protobuf
certifi
setuptools
scipy==1.15.1
-numpy>=2.0.0
-numba==0.60.0
\ No newline at end of file
+numpy>=2.0.0
\ No newline at end of file
From 83e2dd5dff7b36d8cc9528d63ed0468145ef79df Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Sun, 19 Jan 2025 23:34:32 +0100
Subject: [PATCH 056/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`92bc493917d43b83e592349e138b54c90b1c3ea7` (#4640)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index dfa91a15..7aaad492 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=a1649cc13f89946322358f92ea268ae1b7b5096c
+CPPLLAMA_VERSION?=92bc493917d43b83e592349e138b54c90b1c3ea7
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From 30739d94a41139fe5c8cf68239cc7353d102c4fe Mon Sep 17 00:00:00 2001
From: Gianluca Boiano <491117+M0Rf30@users.noreply.github.com>
Date: Mon, 20 Jan 2025 10:34:19 +0100
Subject: [PATCH 057/679] chore(model gallery): add InternLM3-8b-Q4_K_M
(#4637)
chore(model gallery): add InternLM3-8b-Q4_K_M
Signed-off-by: Gianluca Boiano
---
gallery/index.yaml | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index edd52725..61ecd107 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -10100,7 +10100,7 @@
urls:
- https://huggingface.co/internlm/internlm2_5-7b-chat-1m
- https://huggingface.co/bartowski/internlm2_5-7b-chat-1m-GGUF
- icon: https://github.com/InternLM/InternLM/assets/22529082/b9788105-8892-4398-8b47-b513a292378e
+ icon: https://avatars.githubusercontent.com/u/135356492
tags:
- internlm2
- gguf
@@ -10121,6 +10121,31 @@
- filename: internlm2_5-7b-chat-1m-Q4_K_M.gguf
uri: huggingface://bartowski/internlm2_5-7b-chat-1m-GGUF/internlm2_5-7b-chat-1m-Q4_K_M.gguf
sha256: 10d5e18a4125f9d4d74a9284a21e0c820b150af06dee48665e54ff6e1be3a564
+### Internlm3
+- name: "internlm3-8b-instruct"
+ url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
+ urls:
+ - https://huggingface.co/internlm/internlm3-8b-instruct
+ - https://huggingface.co/bartowski/internlm3-8b-instruct-GGUF
+ icon: https://avatars.githubusercontent.com/u/135356492
+ tags:
+ - internlm3
+ - gguf
+ - cpu
+ - gpu
+ description: |
+ InternLM3 has open-sourced an 8-billion parameter instruction model, InternLM3-8B-Instruct, designed for general-purpose usage and advanced reasoning. The model has the following characteristics:
+
+    Enhanced performance at reduced cost: State-of-the-art performance on reasoning and knowledge-intensive tasks surpasses that of models like Llama3.1-8B and Qwen2.5-7B.
+
+ Deep thinking capability: InternLM3 supports both the deep thinking mode for solving complicated reasoning tasks via the long chain-of-thought and the normal response mode for fluent user interactions.
+ overrides:
+ parameters:
+ model: internlm3-8b-instruct-Q4_K_M.gguf
+ files:
+ - filename: internlm3-8b-instruct-Q4_K_M.gguf
+ uri: huggingface://bartowski/internlm3-8b-instruct-GGUF/internlm3-8b-instruct-Q4_K_M.gguf
+ sha256: 2a9644687318e8659c9cf9b40730d5cc2f5af06f786a50439c7c51359b23896e
- &phi-3
### START Phi-3
url: "github:mudler/LocalAI/gallery/phi-3-chat.yaml@master"
From 390bb3f58bb5d878c852c71e473ae0754a8d817d Mon Sep 17 00:00:00 2001
From: Gianluca Boiano <491117+M0Rf30@users.noreply.github.com>
Date: Mon, 20 Jan 2025 10:35:05 +0100
Subject: [PATCH 058/679] fix(model gallery): minicpm-v-2.6 is based on qwen2
(#4638)
Signed-off-by: Gianluca Boiano
---
gallery/index.yaml | 54 +++++++++++++++++++++++-----------------------
1 file changed, 27 insertions(+), 27 deletions(-)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 61ecd107..1c170f99 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -5583,6 +5583,33 @@
- filename: marco-o1-uncensored.Q4_K_M.gguf
sha256: ad0440270a7254098f90779744d3e5b34fe49b7baf97c819909ba9c5648cc0d9
uri: huggingface://QuantFactory/marco-o1-uncensored-GGUF/marco-o1-uncensored.Q4_K_M.gguf
+- !!merge <<: *qwen2
+ name: "minicpm-v-2_6"
+ license: apache-2.0
+ icon: https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm_logo.png
+ urls:
+ - https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf
+ - https://huggingface.co/openbmb/MiniCPM-V-2_6
+ description: |
+ MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters
+ tags:
+ - llm
+ - multimodal
+ - gguf
+ - gpu
+ - qwen2
+ - cpu
+ overrides:
+ mmproj: minicpm-v-2_6-mmproj-f16.gguf
+ parameters:
+ model: minicpm-v-2_6-Q4_K_M.gguf
+ files:
+ - filename: minicpm-v-2_6-Q4_K_M.gguf
+ sha256: 3a4078d53b46f22989adbf998ce5a3fd090b6541f112d7e936eb4204a04100b1
+ uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/ggml-model-Q4_K_M.gguf
+ - filename: minicpm-v-2_6-mmproj-f16.gguf
+ sha256: f8a805e9e62085805c69c427287acefc284932eb4abfe6e1b1ce431d27e2f4e0
+ uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/mmproj-model-f16.gguf
- &mistral03
## START Mistral
url: "github:mudler/LocalAI/gallery/mistral-0.3.yaml@master"
@@ -9211,33 +9238,6 @@
- filename: minicpm-llama3-mmproj-f16.gguf
sha256: 391d11736c3cd24a90417c47b0c88975e86918fcddb1b00494c4d715b08af13e
uri: huggingface://openbmb/MiniCPM-Llama3-V-2_5-gguf/mmproj-model-f16.gguf
-- !!merge <<: *llama3
- name: "minicpm-v-2_6"
- license: apache-2.0
- icon: https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm_logo.png
- urls:
- - https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf
- - https://huggingface.co/openbmb/MiniCPM-V-2_6
- description: |
- MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters
- tags:
- - llm
- - multimodal
- - gguf
- - gpu
- - llama3
- - cpu
- overrides:
- mmproj: minicpm-v-2_6-mmproj-f16.gguf
- parameters:
- model: minicpm-v-2_6-Q4_K_M.gguf
- files:
- - filename: minicpm-v-2_6-Q4_K_M.gguf
- sha256: 3a4078d53b46f22989adbf998ce5a3fd090b6541f112d7e936eb4204a04100b1
- uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/ggml-model-Q4_K_M.gguf
- - filename: minicpm-v-2_6-mmproj-f16.gguf
- sha256: f8a805e9e62085805c69c427287acefc284932eb4abfe6e1b1ce431d27e2f4e0
- uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/mmproj-model-f16.gguf
- !!merge <<: *llama3
name: "llama-3-cursedstock-v1.8-8b-iq-imatrix"
urls:
From 0c0e015b3893816a984f59cd5a6cfb25f5cf90c1 Mon Sep 17 00:00:00 2001
From: Gianluca Boiano <491117+M0Rf30@users.noreply.github.com>
Date: Mon, 20 Jan 2025 10:40:46 +0100
Subject: [PATCH 059/679] chore(model gallery): update icons and add missing
ones (#4639)
* chore(model gallery): uniform github URLs for icons
Signed-off-by: Gianluca Boiano
* chore(model gallery): add icons to phi models
Signed-off-by: Gianluca Boiano
* chore(model gallery): add icons to QwenLM models
Signed-off-by: Gianluca Boiano
* chore(model gallery): update icon for Arcee org
Signed-off-by: Gianluca Boiano
* chore(model gallery): update icon for Meta org
Signed-off-by: Gianluca Boiano
* chore(model gallery): update icon url for OpenCoder org
Signed-off-by: Gianluca Boiano
* chore(model gallery): add icon for RWKV org
Signed-off-by: Gianluca Boiano
* chore(model gallery): add icon for IBM-granite org
Signed-off-by: Gianluca Boiano
* chore(model gallery): add icon for OpenBMB org
Signed-off-by: Gianluca Boiano
* chore(model gallery): add icon for KatanemoLabs org
Signed-off-by: Gianluca Boiano
* chore(model gallery): update icon for Meta-Llama-3.1-8B-Instruct-abliterated
Signed-off-by: Gianluca Boiano
* chore(model gallery): update icon for hermes-3-llama-3.1-8b-lorablated
Signed-off-by: Gianluca Boiano
* chore(model gallery): add icon for Google org
Signed-off-by: Gianluca Boiano
---------
Signed-off-by: Gianluca Boiano
Signed-off-by: Ettore Di Giacinto
Co-authored-by: Ettore Di Giacinto
---
gallery/index.yaml | 53 +++++++++++++++++++++++++++-------------------
1 file changed, 31 insertions(+), 22 deletions(-)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 1c170f99..fb5476f9 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -2,6 +2,7 @@
- &phi4
url: "github:mudler/LocalAI/gallery/phi-4-chat.yaml@master"
name: "phi-4"
+ icon: https://avatars.githubusercontent.com/u/6154722
license: mit
tags:
- llm
@@ -224,7 +225,7 @@
uri: huggingface://bartowski/INTELLECT-1-Instruct-GGUF/INTELLECT-1-Instruct-Q4_K_M.gguf
- &llama33
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
- icon: https://cdn-uploads.huggingface.co/production/uploads/642cc1c253e76b4c2286c58e/aJJxKus1wP5N-euvHEUq7.png
+ icon: https://avatars.githubusercontent.com/u/153379578
license: llama3.3
description: |
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.
@@ -421,6 +422,7 @@
- &rwkv
url: "github:mudler/LocalAI/gallery/rwkv.yaml@master"
name: "rwkv-6-world-7b"
+ icon: https://avatars.githubusercontent.com/u/132652788
license: apache-2.0
urls:
- https://huggingface.co/RWKV/rwkv-6-world-7b
@@ -443,6 +445,7 @@
uri: huggingface://bartowski/rwkv-6-world-7b-GGUF/rwkv-6-world-7b-Q4_K_M.gguf
- &qwen25coder
name: "qwen2.5-coder-14b"
+ icon: https://avatars.githubusercontent.com/u/141221163
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
license: apache-2.0
tags:
@@ -628,7 +631,7 @@
      uri: huggingface://mradermacher/Qwen2.5-Coder-32B-Instruct-Uncensored-i1-GGUF/Qwen2.5-Coder-32B-Instruct-Uncensored.i1-Q4_K_M.gguf
- &opencoder
name: "opencoder-8b-base"
- icon: https://github.com/OpenCoder-llm/opencoder-llm.github.io/blob/main/static/images/opencoder_icon.jpg?raw=true
+ icon: https://avatars.githubusercontent.com/u/186387526
url: "github:mudler/LocalAI/gallery/codellama.yaml@master"
urls:
- https://huggingface.co/infly/OpenCoder-8B-Base
@@ -694,6 +697,7 @@
uri: huggingface://QuantFactory/OpenCoder-1.5B-Instruct-GGUF/OpenCoder-1.5B-Instruct.Q4_K_M.gguf
- &granite3
name: "granite-3.0-1b-a400m-instruct"
+ icon: https://avatars.githubusercontent.com/u/167822367
urls:
- https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-instruct
- https://huggingface.co/QuantFactory/granite-3.0-1b-a400m-instruct-GGUF
@@ -781,7 +785,7 @@
- &llama32
## llama3.2
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
- icon: https://cdn-uploads.huggingface.co/production/uploads/642cc1c253e76b4c2286c58e/aJJxKus1wP5N-euvHEUq7.png
+ icon: https://avatars.githubusercontent.com/u/153379578
license: llama3.2
description: |
The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks.
@@ -950,7 +954,6 @@
uri: huggingface://mradermacher/Llama-3.2-3B-Reasoning-Time-GGUF/Llama-3.2-3B-Reasoning-Time.Q4_K_M.gguf
- !!merge <<: *llama32
name: "llama-3.2-sun-2.5b-chat"
- icon: https://i.ibb.co/PF0TdMJ/imagine-image-9a56cee7-0f4f-4cc2-b265-a5b8d04f266b.png
urls:
- https://huggingface.co/meditsolutions/Llama-3.2-SUN-2.5B-chat
- https://huggingface.co/mradermacher/Llama-3.2-SUN-2.5B-chat-GGUF
@@ -982,7 +985,6 @@
uri: huggingface://mradermacher/Llama-3.2-SUN-2.5B-chat-GGUF/Llama-3.2-SUN-2.5B-chat.Q4_K_M.gguf
- !!merge <<: *llama32
name: "llama-3.2-3b-instruct-uncensored"
- icon: https://i.imgur.com/JOePyAN.png
urls:
- https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-uncensored-GGUF
- https://huggingface.co/chuanli11/Llama-3.2-3B-Instruct-uncensored
@@ -1319,6 +1321,7 @@
- &qwen25
## Qwen2.5
name: "qwen2.5-14b-instruct"
+ icon: https://avatars.githubusercontent.com/u/141221163
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
license: apache-2.0
description: |
@@ -1608,6 +1611,7 @@
uri: huggingface://bartowski/qwen2.5-7b-ins-v3-GGUF/qwen2.5-7b-ins-v3-Q4_K_M.gguf
- !!merge <<: *qwen25
name: "supernova-medius"
+ icon: https://avatars.githubusercontent.com/u/126496414
urls:
- https://huggingface.co/arcee-ai/SuperNova-Medius-GGUF
description: |
@@ -1762,7 +1766,7 @@
uri: huggingface://bartowski/TheBeagle-v2beta-32B-MGS-GGUF/TheBeagle-v2beta-32B-MGS-Q4_K_M.gguf
- !!merge <<: *qwen25
name: "meraj-mini"
- icon: https://i.ibb.co/CmPSSpq/Screenshot-2024-10-06-at-9-45-06-PM.png
+ icon: https://avatars.githubusercontent.com/u/126496414
urls:
- https://huggingface.co/arcee-ai/Meraj-Mini
- https://huggingface.co/QuantFactory/Meraj-Mini-GGUF
@@ -2392,7 +2396,7 @@
uri: huggingface://QuantFactory/Math-IIO-7B-Instruct-GGUF/Math-IIO-7B-Instruct.Q4_K_M.gguf
- !!merge <<: *qwen25
name: "virtuoso-small"
- icon: https://i.ibb.co/pXD6Bcv/SW2-U-g-QQLSH1-ZAbxhs-Iu-A.webp
+ icon: https://avatars.githubusercontent.com/u/126496414
urls:
- https://huggingface.co/arcee-ai/Virtuoso-Small-GGUF
description: |
@@ -2670,6 +2674,7 @@
- cpu
- function-calling
name: "arch-function-1.5b"
+ icon: https://avatars.githubusercontent.com/u/112724757
uri: "github:mudler/LocalAI/gallery/arch-function.yaml@master"
urls:
- https://huggingface.co/katanemolabs/Arch-Function-1.5B
@@ -3109,7 +3114,7 @@
uri: huggingface://bartowski/Rombos-Qwen2.5-Writer-32b-GGUF/Rombos-Qwen2.5-Writer-32b-Q4_K_M.gguf
- !!merge <<: *qwen25
name: "sky-t1-32b-preview"
- icon: https://raw.githubusercontent.com/NovaSky-AI/novasky-ai.github.io/main/assets/images/blue-bird-wider.jpeg
+ icon: https://github.com/NovaSky-AI/novasky-ai.github.io/raw/main/assets/images/blue-bird-wider.jpeg
urls:
- https://huggingface.co/NovaSky-AI/Sky-T1-32B-Preview
- https://huggingface.co/bartowski/Sky-T1-32B-Preview-GGUF
@@ -3298,7 +3303,7 @@
- &llama31
## LLama3.1
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
- icon: https://cdn-uploads.huggingface.co/production/uploads/642cc1c253e76b4c2286c58e/aJJxKus1wP5N-euvHEUq7.png
+ icon: https://avatars.githubusercontent.com/u/153379578
name: "meta-llama-3.1-8b-instruct"
license: llama3.1
description: |
@@ -3387,7 +3392,7 @@
sha256: 6d175432f66d10dfed9737f73a5073d513d18e1ee7bd4b9cf2a59deb359f36ff
- !!merge <<: *llama31
name: "meta-llama-3.1-8b-instruct-abliterated"
- icon: https://i.imgur.com/KhorYYG.png
+ icon: https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/AsTgL8VCgMHgobq4cr46b.png
urls:
- https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated
- https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF
@@ -3416,7 +3421,7 @@
uri: huggingface://mmnga/Llama-3.1-70B-Japanese-Instruct-2407-gguf/Llama-3.1-70B-Japanese-Instruct-2407-Q4_K_M.gguf
- !!merge <<: *llama31
name: "openbuddy-llama3.1-8b-v22.1-131k"
- icon: https://raw.githubusercontent.com/OpenBuddy/OpenBuddy/main/media/demo.png
+ icon: https://github.com/OpenBuddy/OpenBuddy/raw/main/media/demo.png
urls:
- https://huggingface.co/sunnyyy/openbuddy-llama3.1-8b-v22.1-131k-Q4_K_M-GGUF
description: |
@@ -3592,7 +3597,7 @@
sha256: 6557c5d5091f2507d19ab1f8bfb9ceb4e1536a755ab70f148b18aeb33741580f
uri: huggingface://mradermacher/Llama-3.1-Techne-RP-8b-v1-GGUF/Llama-3.1-Techne-RP-8b-v1.Q4_K_M.gguf
- !!merge <<: *llama31
- icon: https://i.ibb.co/9hwFrvL/BLMs-Wkx-NQf-W-46-FZDg-ILhg.jpg
+ icon: https://avatars.githubusercontent.com/u/126496414
name: "llama-spark"
urls:
- https://huggingface.co/arcee-ai/Llama-Spark
@@ -3710,7 +3715,6 @@
- !!merge <<: *llama31
name: "llama-3.1-supernova-lite-reflection-v1.0-i1"
url: "github:mudler/LocalAI/gallery/llama3.1-reflective.yaml@master"
- icon: https://i.ibb.co/r072p7j/eopi-ZVu-SQ0-G-Cav78-Byq-Tg.png
urls:
- https://huggingface.co/SE6446/Llama-3.1-SuperNova-Lite-Reflection-V1.0
- https://huggingface.co/mradermacher/Llama-3.1-SuperNova-Lite-Reflection-V1.0-i1-GGUF
@@ -3725,7 +3729,7 @@
uri: huggingface://mradermacher/Llama-3.1-SuperNova-Lite-Reflection-V1.0-i1-GGUF/Llama-3.1-SuperNova-Lite-Reflection-V1.0.i1-Q4_K_M.gguf
- !!merge <<: *llama31
name: "llama-3.1-supernova-lite"
- icon: https://i.ibb.co/r072p7j/eopi-ZVu-SQ0-G-Cav78-Byq-Tg.png
+ icon: https://avatars.githubusercontent.com/u/126496414
urls:
- https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite
- https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite-GGUF
@@ -4239,6 +4243,7 @@
uri: huggingface://mradermacher/Hermes-3-Llama-3.1-70B-lorablated-GGUF/Hermes-3-Llama-3.1-70B-lorablated.Q4_K_M.gguf
- !!merge <<: *llama31
name: "hermes-3-llama-3.1-8b-lorablated"
+ icon: https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/4Hbw5n68jKUSBQeTqQIeT.png
urls:
- https://huggingface.co/mlabonne/Hermes-3-Llama-3.1-8B-lorablated-GGUF
description: |
@@ -5254,6 +5259,7 @@
## Start QWEN2
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "qwen2-7b-instruct"
+ icon: https://avatars.githubusercontent.com/u/141221163
license: apache-2.0
description: |
Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 7B Qwen2 model.
@@ -5360,7 +5366,7 @@
uri: huggingface://bartowski/Einstein-v7-Qwen2-7B-GGUF/Einstein-v7-Qwen2-7B-Q4_K_M.gguf
- !!merge <<: *qwen2
name: "arcee-spark"
- icon: https://i.ibb.co/80ssNWS/o-Vdk-Qx-ARNmzr-Pi1h-Efj-SA.webp
+ icon: https://avatars.githubusercontent.com/u/126496414
description: |
Arcee Spark is a powerful 7B parameter language model that punches well above its weight class. Initialized from Qwen2, this model underwent a sophisticated training process:
@@ -5398,7 +5404,7 @@
uri: huggingface://Hercules-5.0-Qwen2-7B-Q4_K_M.gguf/Hercules-5.0-Qwen2-7B-Q4_K_M.gguf
- !!merge <<: *qwen2
name: "arcee-agent"
- icon: https://i.ibb.co/CBHmTDn/136719a5-6d8a-4654-a618-46eabc788953.jpg
+ icon: https://avatars.githubusercontent.com/u/126496414
description: |
Arcee Agent is a cutting-edge 7B parameter language model specifically designed for function calling and tool use. Initialized from Qwen2-7B, it rivals the performance of much larger models while maintaining efficiency and speed. This model is particularly suited for developers, researchers, and businesses looking to implement sophisticated AI-driven solutions without the computational overhead of larger language models. Compute for training Arcee-Agent was provided by CrusoeAI. Arcee-Agent was trained using Spectrum.
urls:
@@ -5586,7 +5592,7 @@
- !!merge <<: *qwen2
name: "minicpm-v-2_6"
license: apache-2.0
- icon: https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm_logo.png
+ icon: https://avatars.githubusercontent.com/u/89920203
urls:
- https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf
- https://huggingface.co/openbmb/MiniCPM-V-2_6
@@ -6321,6 +6327,7 @@
- &gemma
url: "github:mudler/LocalAI/gallery/gemma.yaml@master"
name: "gemma-2b"
+ icon: https://avatars.githubusercontent.com/u/1342004
license: gemma
urls:
- https://ai.google.dev/gemma/docs
@@ -7036,7 +7043,7 @@
uri: huggingface://bartowski/GWQ-9B-Preview2-GGUF/GWQ-9B-Preview2-Q4_K_M.gguf
- &llama3
url: "github:mudler/LocalAI/gallery/llama3-instruct.yaml@master"
- icon: https://cdn-uploads.huggingface.co/production/uploads/642cc1c253e76b4c2286c58e/aJJxKus1wP5N-euvHEUq7.png
+ icon: https://avatars.githubusercontent.com/u/153379578
name: "llama3-8b-instruct"
license: llama3
description: |
@@ -8503,7 +8510,7 @@
urls:
- https://huggingface.co/arcee-ai/Llama-3-SEC-Chat-GGUF
- https://huggingface.co/arcee-ai/Llama-3-SEC-Chat
- icon: https://i.ibb.co/kHtBmDN/w8m6-X4-HCQRa-IR86ar-Cm5gg.webp
+ icon: https://avatars.githubusercontent.com/u/126496414
tags:
- llama3
- gguf
@@ -8536,7 +8543,7 @@
- &yi-chat
### Start Yi
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
- icon: "https://raw.githubusercontent.com/01-ai/Yi/main/assets/img/Yi_logo_icon_light.svg"
+ icon: "https://github.com/01-ai/Yi/raw/main/assets/img/Yi_logo_icon_light.svg"
name: "yi-1.5-9b-chat"
license: apache-2.0
urls:
@@ -9165,7 +9172,7 @@
urls:
- https://huggingface.co/BAAI/Bunny-Llama-3-8B-V-gguf
description: |
- Bunny is a family of lightweight but powerful multimodal models. It offers multiple plug-and-play vision encoders, like EVA-CLIP, SigLIP and language backbones, including Llama-3-8B, Phi-1.5, StableLM-2, Qwen1.5, MiniCPM and Phi-2. To compensate for the decrease in model size, we construct more informative training data by curated selection from a broader data source.
+ Bunny is a family of lightweight but powerful multimodal models. It offers multiple plug-and-play vision encoders, like EVA-CLIP, SigLIP and language backbones, including Llama-3-8B, Phi-1.5, StableLM-2, Qwen1.5, and Phi-2. To compensate for the decrease in model size, we construct more informative training data by curated selection from a broader data source.
We provide Bunny-Llama-3-8B-V, which is built upon SigLIP and Llama-3-8B-Instruct. More details about this model can be found in GitHub.
icon: https://huggingface.co/BAAI/Bunny-Llama-3-8B-V-gguf/resolve/main/icon.png
@@ -9214,7 +9221,7 @@
uri: huggingface://xtuner/llava-llama-3-8b-v1_1-gguf/llava-llama-3-8b-v1_1-mmproj-f16.gguf
- !!merge <<: *llama3
name: "minicpm-llama3-v-2_5"
- icon: https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm_logo.png
+ icon: https://avatars.githubusercontent.com/u/89920203
urls:
- https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf
- https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5
@@ -10054,6 +10061,7 @@
- llama2
- cpu
name: "phi-2-chat:Q8_0"
+ icon: https://avatars.githubusercontent.com/u/6154722
overrides:
parameters:
model: phi-2-layla-v1-chatml-Q8_0.gguf
@@ -10150,6 +10158,7 @@
### START Phi-3
url: "github:mudler/LocalAI/gallery/phi-3-chat.yaml@master"
name: "phi-3-mini-4k-instruct"
+ icon: https://avatars.githubusercontent.com/u/6154722
license: mit
description: |
The Phi-3-Mini-4K-Instruct is a 3.8B parameters, lightweight, state-of-the-art open model trained with the Phi-3 datasets that includes both synthetic data and the filtered publicly available websites data with a focus on high-quality and reasoning dense properties. The model belongs to the Phi-3 family with the Mini version in two variants 4K and 128K which is the context length (in tokens) it can support. The model has underwent a post-training process that incorporates both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks testing common sense, language understanding, math, code, long context and logical reasoning, Phi-3 Mini-4K-Instruct showcased a robust and state-of-the-art performance among models with less than 13 billion parameters.
From adebd557ce8446edbe097b3eeb54c524e6638e78 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Mon, 20 Jan 2025 10:45:10 +0100
Subject: [PATCH 060/679] chore(model gallery): add wayfarer-12b (#4641)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 105 +++++++++++++++++++++++----------------------
1 file changed, 54 insertions(+), 51 deletions(-)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index fb5476f9..0397bd75 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -190,7 +190,7 @@
- https://huggingface.co/Nitral-AI/NightWing3-10B-v0.1
- https://huggingface.co/bartowski/NightWing3-10B-v0.1-GGUF
description: |
- Base model: (Falcon3-10B)
+ Base model: (Falcon3-10B)
overrides:
parameters:
model: NightWing3-10B-v0.1-Q4_K_M.gguf
@@ -782,8 +782,7 @@
- filename: salamandra-7b-instruct.Q4_K_M-f32.gguf
sha256: bac8e8c1d1d9d53cbdb148b8ff9ad378ddb392429207099e85b5aae3a43bff3d
uri: huggingface://cstr/salamandra-7b-instruct-GGUF/salamandra-7b-instruct.Q4_K_M-f32.gguf
-- &llama32
- ## llama3.2
+- &llama32 ## llama3.2
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
icon: https://avatars.githubusercontent.com/u/153379578
license: llama3.2
@@ -1318,8 +1317,7 @@
- filename: FineMath-Llama-3B-Q4_K_M.gguf
sha256: 16c73b5cf2a417a7e1608bcc9469f1461fc3e759ce04a3a337f48df977dc158c
uri: huggingface://bartowski/FineMath-Llama-3B-GGUF/FineMath-Llama-3B-Q4_K_M.gguf
-- &qwen25
- ## Qwen2.5
+- &qwen25 ## Qwen2.5
name: "qwen2.5-14b-instruct"
icon: https://avatars.githubusercontent.com/u/141221163
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
@@ -3241,8 +3239,7 @@
- filename: DRT-o1-14B-Q4_K_M.gguf
sha256: 9619ca984cf4ce8e4f69bcde831de17b2ce05dd89536e3130608877521e3d328
uri: huggingface://bartowski/DRT-o1-14B-GGUF/DRT-o1-14B-Q4_K_M.gguf
-- &smollm
- ## SmolLM
+- &smollm ## SmolLM
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "smollm-1.7b-instruct"
icon: https://huggingface.co/datasets/HuggingFaceTB/images/resolve/main/banner_smol.png
@@ -3300,8 +3297,7 @@
- filename: Vikhr-Qwen-2.5-1.5B-Instruct.Q4_K_M.gguf
sha256: eaeac314e30b461413bc1cc819cdc0cd6a79265711fd0b8268702960a082c7bd
uri: huggingface://QuantFactory/Vikhr-Qwen-2.5-1.5B-Instruct-GGUF/Vikhr-Qwen-2.5-1.5B-Instruct.Q4_K_M.gguf
-- &llama31
- ## LLama3.1
+- &llama31 ## LLama3.1
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
icon: https://avatars.githubusercontent.com/u/153379578
name: "meta-llama-3.1-8b-instruct"
@@ -5189,8 +5185,7 @@
- filename: Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf
sha256: 268390e07edd407ad93ea21a868b7ae995b5950e01cad0db9e1802ae5049d405
uri: huggingface://bartowski/Dolphin3.0-Llama3.1-8B-GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf
-- &deepseek
- ## Deepseek
+- &deepseek ## Deepseek
url: "github:mudler/LocalAI/gallery/deepseek.yaml@master"
name: "deepseek-coder-v2-lite-instruct"
icon: "https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true"
@@ -5255,8 +5250,7 @@
- filename: archangel_sft_pythia2-8b.Q4_K_M.gguf
sha256: a47782c55ef2b39b19644213720a599d9849511a73c9ebb0c1de749383c0a0f8
uri: huggingface://RichardErkhov/ContextualAI_-_archangel_sft_pythia2-8b-gguf/archangel_sft_pythia2-8b.Q4_K_M.gguf
-- &qwen2
- ## Start QWEN2
+- &qwen2 ## Start QWEN2
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "qwen2-7b-instruct"
icon: https://avatars.githubusercontent.com/u/141221163
@@ -5616,8 +5610,7 @@
- filename: minicpm-v-2_6-mmproj-f16.gguf
sha256: f8a805e9e62085805c69c427287acefc284932eb4abfe6e1b1ce431d27e2f4e0
uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/mmproj-model-f16.gguf
-- &mistral03
- ## START Mistral
+- &mistral03 ## START Mistral
url: "github:mudler/LocalAI/gallery/mistral-0.3.yaml@master"
name: "mistral-7b-instruct-v0.3"
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/62dac1c7a8ead43d20e3e17a/wrLf5yaGC6ng4XME70w6Z.png
@@ -6222,8 +6215,35 @@
- filename: Nera_Noctis-12B-Q4_K_M.gguf
sha256: 0662a9a847adde046e6255c15d5a677ebf09ab00841547c8963668d14baf00ff
uri: huggingface://bartowski/Nera_Noctis-12B-GGUF/Nera_Noctis-12B-Q4_K_M.gguf
-- &mudler
- ### START mudler's LocalAI specific-models
+- !!merge <<: *mistral03
+ name: "wayfarer-12b"
+ url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
+ icon: https://huggingface.co/LatitudeGames/Wayfarer-12B/resolve/main/wayfarer.jpg
+ urls:
+ - https://huggingface.co/LatitudeGames/Wayfarer-12B
+ - https://huggingface.co/bartowski/Wayfarer-12B-GGUF
+ description: |
+    We've heard over and over from AI Dungeon players that modern AI models are too nice, never letting them fail or die. While it may be good for a chatbot to be nice and helpful, great stories and games aren't all rainbows and unicorns. They have conflict, tension, and even death. These create real stakes and consequences for characters and the journeys they go on.
+
+ Similarly, great games need opposition. You must be able to fail, die, and may even have to start over. This makes games more fun!
+
+ However, the vast majority of AI models, through alignment RLHF, have been trained away from darkness, violence, or conflict, preventing them from fulfilling this role. To give our players better options, we decided to train our own model to fix these issues.
+
+ Wayfarer is an adventure role-play model specifically trained to give players a challenging and dangerous experience. We thought they would like it, but since releasing it on AI Dungeon, players have reacted even more positively than we expected.
+
+    Because they loved it so much, we've decided to open-source the model so anyone can experience unforgivingly brutal AI adventures! Anyone can download the model to run locally.
+
+ Or if you want to easily try this model for free, you can do so at https://aidungeon.com.
+
+ We plan to continue improving and open-sourcing similar models, so please share any and all feedback on how we can improve model behavior. Below we share more details on how Wayfarer was created.
+ overrides:
+ parameters:
+ model: Wayfarer-12B-Q4_K_M.gguf
+ files:
+ - filename: Wayfarer-12B-Q4_K_M.gguf
+ sha256: 6cd9f290c820c64854fcdcfd312b066447acc2f63abe2e2e71af9bc4f1946c08
+ uri: huggingface://bartowski/Wayfarer-12B-GGUF/Wayfarer-12B-Q4_K_M.gguf
+- &mudler ### START mudler's LocalAI specific-models
url: "github:mudler/LocalAI/gallery/mudler.yaml@master"
name: "LocalAI-llama3-8b-function-call-v0.2"
icon: "https://cdn-uploads.huggingface.co/production/uploads/647374aa7ff32a81ac6d35d4/us5JKi9z046p8K-cn_M0w.webp"
@@ -6268,8 +6288,7 @@
- filename: Mirai-Nova-Llama3-LocalAI-8B-v0.1-q4_k_m.bin
sha256: 579cbb229f9c11d0330759ff4733102d2491615a4c61289e26c09d1b3a583fec
uri: huggingface://mudler/Mirai-Nova-Llama3-LocalAI-8B-v0.1-GGUF/Mirai-Nova-Llama3-LocalAI-8B-v0.1-q4_k_m.bin
-- &parler-tts
- ### START parler-tts
+- &parler-tts ### START parler-tts
url: "github:mudler/LocalAI/gallery/parler-tts.yaml@master"
name: parler-tts-mini-v0.1
overrides:
@@ -6286,8 +6305,7 @@
- cpu
- text-to-speech
- python
-- &rerankers
- ### START rerankers
+- &rerankers ### START rerankers
url: "github:mudler/LocalAI/gallery/rerankers.yaml@master"
name: cross-encoder
parameters:
@@ -8540,8 +8558,7 @@
- filename: Copus-2x8B.i1-Q4_K_M.gguf
sha256: 685da1ba49e203e8f491105585143d76044286d4b4687bed37d325f6b55501e5
uri: huggingface://mradermacher/Copus-2x8B-i1-GGUF/Copus-2x8B.i1-Q4_K_M.gguf
-- &yi-chat
- ### Start Yi
+- &yi-chat ### Start Yi
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
icon: "https://github.com/01-ai/Yi/raw/main/assets/img/Yi_logo_icon_light.svg"
name: "yi-1.5-9b-chat"
@@ -8752,8 +8769,7 @@
- filename: Fimbulvetr-11B-v2-Q4_K_M-imat.gguf
sha256: 3f309b59508342536a70edd6c4be6cf4f2cb97f2e32cbc79ad2ab3f4c02933a4
uri: huggingface://Lewdiculous/Fimbulvetr-11B-v2-GGUF-IQ-Imatrix/Fimbulvetr-11B-v2-Q4_K_M-imat.gguf
-- &noromaid
- ### Start noromaid
+- &noromaid ### Start noromaid
url: "github:mudler/LocalAI/gallery/noromaid.yaml@master"
name: "noromaid-13b-0.4-DPO"
icon: https://cdn-uploads.huggingface.co/production/uploads/630dfb008df86f1e5becadc3/VKX2Z2yjZX5J8kXzgeCYO.png
@@ -8773,8 +8789,7 @@
- filename: Noromaid-13B-0.4-DPO.q4_k_m.gguf
sha256: cb28e878d034fae3d0b43326c5fc1cfb4ab583b17c56e41d6ce023caec03c1c1
uri: huggingface://NeverSleep/Noromaid-13B-0.4-DPO-GGUF/Noromaid-13B-0.4-DPO.q4_k_m.gguf
-- &wizardlm2
- ### START Vicuna based
+- &wizardlm2 ### START Vicuna based
url: "github:mudler/LocalAI/gallery/wizardlm2.yaml@master"
name: "wizardlm2-7b"
description: |
@@ -8829,8 +8844,7 @@
- filename: moondream2-mmproj-f16.gguf
sha256: 4cc1cb3660d87ff56432ebeb7884ad35d67c48c7b9f6b2856f305e39c38eed8f
uri: huggingface://moondream/moondream2-gguf/moondream2-mmproj-f16.gguf
-- &llava
- ### START LLaVa
+- &llava ### START LLaVa
url: "github:mudler/LocalAI/gallery/llava.yaml@master"
license: apache-2.0
description: |
@@ -9688,8 +9702,7 @@
- filename: Freyja-v4.95-maldv-7b-NON-FICTION.i1-Q4_K_M.gguf
sha256: cdc0f4de6df2ba120835fbd25c2a0ae2af8548f46d2c40c7a018c51c3d19e0c0
uri: huggingface://mradermacher/Freyja-v4.95-maldv-7b-NON-FICTION-i1-GGUF/Freyja-v4.95-maldv-7b-NON-FICTION.i1-Q4_K_M.gguf
-- &chatml
- ### ChatML
+- &chatml ### ChatML
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "una-thepitbull-21.4b-v2"
license: afl-3.0
@@ -9975,8 +9988,7 @@
- filename: Triangulum-10B.Q4_K_M.gguf
sha256: dd071f99edf6b166044bf229cdeec19419c4c348e3fc3d6587cfcc55e6fb85fa
uri: huggingface://mradermacher/Triangulum-10B-GGUF/Triangulum-10B.Q4_K_M.gguf
-- &command-R
- ### START Command-r
+- &command-R ### START Command-r
url: "github:mudler/LocalAI/gallery/command-r.yaml@master"
name: "command-r-v01:q1_s"
license: "cc-by-nc-4.0"
@@ -10031,8 +10043,7 @@
- filename: "aya-23-35B-Q4_K_M.gguf"
sha256: "57824768c1a945e21e028c8e9a29b39adb4838d489f5865c82601ab9ad98065d"
uri: "huggingface://bartowski/aya-23-35B-GGUF/aya-23-35B-Q4_K_M.gguf"
-- &phi-2-chat
- ### START Phi-2
+- &phi-2-chat ### START Phi-2
url: "github:mudler/LocalAI/gallery/phi-2-chat.yaml@master"
license: mit
description: |
@@ -10154,8 +10165,7 @@
- filename: internlm3-8b-instruct-Q4_K_M.gguf
uri: huggingface://bartowski/internlm3-8b-instruct-GGUF/internlm3-8b-instruct-Q4_K_M.gguf
sha256: 2a9644687318e8659c9cf9b40730d5cc2f5af06f786a50439c7c51359b23896e
-- &phi-3
- ### START Phi-3
+- &phi-3 ### START Phi-3
url: "github:mudler/LocalAI/gallery/phi-3-chat.yaml@master"
name: "phi-3-mini-4k-instruct"
icon: https://avatars.githubusercontent.com/u/6154722
@@ -10355,8 +10365,7 @@
- filename: Phi-3.5-MoE-instruct-Q4_K_M.gguf
sha256: 43e91bb720869bd8a92d8eb86bc3c74a52c49cf61642ca709b3d7bb89644df36
uri: huggingface://bartowski/Phi-3.5-MoE-instruct-GGUF/Phi-3.5-MoE-instruct-Q4_K_M.gguf
-- &hermes-2-pro-mistral
- ### START Hermes
+- &hermes-2-pro-mistral ### START Hermes
url: "github:mudler/LocalAI/gallery/hermes-2-pro-mistral.yaml@master"
name: "hermes-2-pro-mistral"
icon: https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/ggO2sBDJ8Bhc6w-zwTx5j.png
@@ -10692,8 +10701,7 @@
- filename: "galatolo-Q4_K.gguf"
sha256: "ca0cfd5a9ad40dc16416aa3a277015d0299b62c0803b67f5709580042202c172"
uri: "huggingface://galatolo/cerbero-7b-gguf/ggml-model-Q4_K.gguf"
-- &codellama
- ### START Codellama
+- &codellama ### START Codellama
url: "github:mudler/LocalAI/gallery/codellama.yaml@master"
name: "codellama-7b"
license: llama2
@@ -10824,8 +10832,7 @@
- filename: "llm-compiler-7b-ftd.Q4_K.gguf"
uri: "huggingface://legraphista/llm-compiler-7b-ftd-IMat-GGUF/llm-compiler-7b-ftd.Q4_K.gguf"
sha256: d862dd18ed335413787d0ad196522a9902a3c10a6456afdab8721822cb0ddde8
-- &openvino
- ### START OpenVINO
+- &openvino ### START OpenVINO
url: "github:mudler/LocalAI/gallery/openvino.yaml@master"
name: "openvino-llama-3-8b-instruct-ov-int8"
license: llama3
@@ -10939,8 +10946,7 @@
- gpu
- embedding
- cpu
-- &sentencentransformers
- ### START Embeddings
+- &sentencentransformers ### START Embeddings
description: |
This framework provides an easy method to compute dense vector representations for sentences, paragraphs, and images. The models are based on transformer networks like BERT / RoBERTa / XLM-RoBERTa etc. and achieve state-of-the-art performance in various tasks. Text is embedded in vector space such that similar text are closer and can efficiently be found using cosine similarity.
urls:
@@ -10955,8 +10961,7 @@
overrides:
parameters:
model: all-MiniLM-L6-v2
-- &dreamshaper
- ### START Image generation
+- &dreamshaper ### START Image generation
name: dreamshaper
icon: https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/dd9b038c-bd15-43ab-86ab-66e145ad7ff2/width=450/26072158-132340247-8k%20portrait%20of%20beautiful%20cyborg%20with%20brown%20hair,%20intricate,%20elegant,%20highly%20detailed,%20majestic,%20digital%20photography,%20art%20by%20artg_ed.jpeg
license: other
@@ -11068,8 +11073,7 @@
- filename: t5xxl_fp16.safetensors
sha256: 6e480b09fae049a72d2a8c5fbccb8d3e92febeb233bbe9dfe7256958a9167635
uri: https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors
-- &whisper
- ## Whisper
+- &whisper ## Whisper
url: "github:mudler/LocalAI/gallery/whisper-base.yaml@master"
name: "whisper-1"
license: "MIT"
@@ -11249,8 +11253,7 @@
description: |
Stable Diffusion in NCNN with c++, supported txt2img and img2img
name: stablediffusion-cpp
-- &piper
- ## Piper TTS
+- &piper ## Piper TTS
url: github:mudler/LocalAI/gallery/piper.yaml@master
name: voice-en-us-kathleen-low
icon: https://github.com/rhasspy/piper/raw/master/etc/logo.png
From 83a8d90c52816832bd3362d6455501d479ce16ab Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Mon, 20 Jan 2025 10:50:29 +0100
Subject: [PATCH 061/679] chore(model gallery): add l3.3-70b-magnum-v4-se
(#4642)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 0397bd75..d10cd32e 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -419,6 +419,22 @@
- filename: L3.3-MS-Nevoria-70b-Q4_K_M.gguf
sha256: e8b0763f263089a19d4b112b7ed5085cc5f1ed9ca49c5085baa8d51f4ded1f94
uri: huggingface://bartowski/L3.3-MS-Nevoria-70b-GGUF/L3.3-MS-Nevoria-70b-Q4_K_M.gguf
+- !!merge <<: *llama33
+ name: "l3.3-70b-magnum-v4-se"
+ urls:
+ - https://huggingface.co/Doctor-Shotgun/L3.3-70B-Magnum-v4-SE
+ - https://huggingface.co/bartowski/L3.3-70B-Magnum-v4-SE-GGUF
+ description: |
+ The Magnum v4 series is complete, but here's something a little extra I wanted to tack on as I wasn't entirely satisfied with the results of v4 72B. "SE" for Special Edition - this model is finetuned from meta-llama/Llama-3.3-70B-Instruct as an rsLoRA adapter. The dataset is a slightly revised variant of the v4 data with some elements of the v2 data re-introduced.
+
+ The objective, as with the other Magnum models, is to emulate the prose style and quality of the Claude 3 Sonnet/Opus series of models on a local scale, so don't be surprised to see "Claude-isms" in its output.
+ overrides:
+ parameters:
+ model: L3.3-70B-Magnum-v4-SE-Q4_K_M.gguf
+ files:
+ - filename: L3.3-70B-Magnum-v4-SE-Q4_K_M.gguf
+ sha256: 9724a6364a42caa3d5a1687258eb329c9af6cbb2ce01c8dd556c1a222a2e0352
+ uri: huggingface://bartowski/L3.3-70B-Magnum-v4-SE-GGUF/L3.3-70B-Magnum-v4-SE-Q4_K_M.gguf
- &rwkv
url: "github:mudler/LocalAI/gallery/rwkv.yaml@master"
name: "rwkv-6-world-7b"
From aeb1dca52ef940ec23f3ffddc7af2cc9afac69a7 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Mon, 20 Jan 2025 11:03:35 +0100
Subject: [PATCH 062/679] chore(model gallery): add l3.3-prikol-70b-v0.2
(#4643)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index d10cd32e..679ab002 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -435,6 +435,27 @@
- filename: L3.3-70B-Magnum-v4-SE-Q4_K_M.gguf
sha256: 9724a6364a42caa3d5a1687258eb329c9af6cbb2ce01c8dd556c1a222a2e0352
uri: huggingface://bartowski/L3.3-70B-Magnum-v4-SE-GGUF/L3.3-70B-Magnum-v4-SE-Q4_K_M.gguf
+- !!merge <<: *llama33
+ name: "l3.3-prikol-70b-v0.2"
+ icon: https://files.catbox.moe/x9t3zo.png
+ urls:
+ - https://huggingface.co/Nohobby/L3.3-Prikol-70B-v0.2
+ - https://huggingface.co/bartowski/L3.3-Prikol-70B-v0.2-GGUF
+ description: |
+ A merge of some Llama 3.3 models because um uh yeah
+
+ Went extra schizo on the recipe, hoping for an extra fun result, and... Well, I guess it's an overall improvement over the previous revision. It's a tiny bit smarter, has even more distinct swipes and nice dialogues, but for some reason it's damn sloppy.
+
+ I've published the second step of this merge as a separate model, and I'd say the results are more interesting, but not as usable as this one. https://huggingface.co/Nohobby/AbominationSnowPig
+
+ Prompt format: Llama3 OR Llama3 Context and ChatML Instruct. It actually works a bit better this way
+ overrides:
+ parameters:
+ model: L3.3-Prikol-70B-v0.2-Q4_K_M.gguf
+ files:
+ - filename: L3.3-Prikol-70B-v0.2-Q4_K_M.gguf
+ sha256: fc0ff514efbc0b67981c2bf1423d5a2e1b8801e4266ba0c653ea148414fe5ffc
+ uri: huggingface://bartowski/L3.3-Prikol-70B-v0.2-GGUF/L3.3-Prikol-70B-v0.2-Q4_K_M.gguf
- &rwkv
url: "github:mudler/LocalAI/gallery/rwkv.yaml@master"
name: "rwkv-6-world-7b"
From a396040886fb5e2e13dee72811605956c7506ebc Mon Sep 17 00:00:00 2001
From: Gianluca Boiano <491117+M0Rf30@users.noreply.github.com>
Date: Mon, 20 Jan 2025 16:13:19 +0100
Subject: [PATCH 063/679] chore(model gallery): remove dead icons and update
LLAVA and DeepSeek ones (#4645)
* chore(model gallery): update icons and add LLAVA ones
Signed-off-by: Gianluca Boiano
* chore(model gallery): fix all complains related to yamllint
Signed-off-by: Gianluca Boiano
---------
Signed-off-by: Gianluca Boiano
---
gallery/index.yaml | 69 +++++++++++++++++++++-------------------------
1 file changed, 31 insertions(+), 38 deletions(-)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 679ab002..30687062 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -819,7 +819,7 @@
- filename: salamandra-7b-instruct.Q4_K_M-f32.gguf
sha256: bac8e8c1d1d9d53cbdb148b8ff9ad378ddb392429207099e85b5aae3a43bff3d
uri: huggingface://cstr/salamandra-7b-instruct-GGUF/salamandra-7b-instruct.Q4_K_M-f32.gguf
-- &llama32 ## llama3.2
+- &llama32 ## llama3.2
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
icon: https://avatars.githubusercontent.com/u/153379578
license: llama3.2
@@ -1354,7 +1354,7 @@
- filename: FineMath-Llama-3B-Q4_K_M.gguf
sha256: 16c73b5cf2a417a7e1608bcc9469f1461fc3e759ce04a3a337f48df977dc158c
uri: huggingface://bartowski/FineMath-Llama-3B-GGUF/FineMath-Llama-3B-Q4_K_M.gguf
-- &qwen25 ## Qwen2.5
+- &qwen25 ## Qwen2.5
name: "qwen2.5-14b-instruct"
icon: https://avatars.githubusercontent.com/u/141221163
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
@@ -2181,7 +2181,6 @@
sha256: 42cf7a96784dc8f25c61c2404620c3e6548a024caa8dff6e435d7c86400d7ab8
uri: huggingface://mradermacher/Qwen2.5-7B-nerd-uncensored-v1.7-GGUF/Qwen2.5-7B-nerd-uncensored-v1.7.Q4_K_M.gguf
- !!merge <<: *qwen25
- icon: https://i.imgur.com/OxX2Usi.png
name: "evathene-v1.0"
urls:
- https://huggingface.co/sophosympatheia/Evathene-v1.0
@@ -2540,7 +2539,6 @@
sha256: 91907f29746625a62885793475956220b81d8a5a34b53686a1acd1d03fd403ea
uri: huggingface://bartowski/72B-Qwen2.5-Kunou-v1-GGUF/72B-Qwen2.5-Kunou-v1-Q4_K_M.gguf
- !!merge <<: *qwen25
- icon: https://i.imgur.com/OxX2Usi.png
name: "evathene-v1.3"
urls:
- https://huggingface.co/sophosympatheia/Evathene-v1.3
@@ -3276,7 +3274,7 @@
- filename: DRT-o1-14B-Q4_K_M.gguf
sha256: 9619ca984cf4ce8e4f69bcde831de17b2ce05dd89536e3130608877521e3d328
uri: huggingface://bartowski/DRT-o1-14B-GGUF/DRT-o1-14B-Q4_K_M.gguf
-- &smollm ## SmolLM
+- &smollm ## SmolLM
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "smollm-1.7b-instruct"
icon: https://huggingface.co/datasets/HuggingFaceTB/images/resolve/main/banner_smol.png
@@ -3334,7 +3332,7 @@
- filename: Vikhr-Qwen-2.5-1.5B-Instruct.Q4_K_M.gguf
sha256: eaeac314e30b461413bc1cc819cdc0cd6a79265711fd0b8268702960a082c7bd
uri: huggingface://QuantFactory/Vikhr-Qwen-2.5-1.5B-Instruct-GGUF/Vikhr-Qwen-2.5-1.5B-Instruct.Q4_K_M.gguf
-- &llama31 ## LLama3.1
+- &llama31 ## LLama3.1
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
icon: https://avatars.githubusercontent.com/u/153379578
name: "meta-llama-3.1-8b-instruct"
@@ -4485,7 +4483,6 @@
sha256: 27b10c3ca4507e8bf7d305d60e5313b54ef5fffdb43a03f36223d19d906e39f3
uri: huggingface://mradermacher/L3.1-70Blivion-v0.1-rc1-70B-i1-GGUF/L3.1-70Blivion-v0.1-rc1-70B.i1-Q4_K_M.gguf
- !!merge <<: *llama31
- icon: https://i.imgur.com/sdN0Aqg.jpeg
name: "llama-3.1-hawkish-8b"
urls:
- https://huggingface.co/mukaj/Llama-3.1-Hawkish-8B
@@ -5222,10 +5219,10 @@
- filename: Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf
sha256: 268390e07edd407ad93ea21a868b7ae995b5950e01cad0db9e1802ae5049d405
uri: huggingface://bartowski/Dolphin3.0-Llama3.1-8B-GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf
-- &deepseek ## Deepseek
+- &deepseek ## Deepseek
url: "github:mudler/LocalAI/gallery/deepseek.yaml@master"
name: "deepseek-coder-v2-lite-instruct"
- icon: "https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true"
+ icon: "https://avatars.githubusercontent.com/u/148330874"
license: deepseek
description: |
DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from DeepSeek-Coder-V2-Base with 6 trillion tokens sourced from a high-quality and multi-source corpus. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-Coder-V2-Base, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K.
@@ -5287,7 +5284,7 @@
- filename: archangel_sft_pythia2-8b.Q4_K_M.gguf
sha256: a47782c55ef2b39b19644213720a599d9849511a73c9ebb0c1de749383c0a0f8
uri: huggingface://RichardErkhov/ContextualAI_-_archangel_sft_pythia2-8b-gguf/archangel_sft_pythia2-8b.Q4_K_M.gguf
-- &qwen2 ## Start QWEN2
+- &qwen2 ## Start QWEN2
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "qwen2-7b-instruct"
icon: https://avatars.githubusercontent.com/u/141221163
@@ -5647,7 +5644,7 @@
- filename: minicpm-v-2_6-mmproj-f16.gguf
sha256: f8a805e9e62085805c69c427287acefc284932eb4abfe6e1b1ce431d27e2f4e0
uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/mmproj-model-f16.gguf
-- &mistral03 ## START Mistral
+- &mistral03 ## START Mistral
url: "github:mudler/LocalAI/gallery/mistral-0.3.yaml@master"
name: "mistral-7b-instruct-v0.3"
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/62dac1c7a8ead43d20e3e17a/wrLf5yaGC6ng4XME70w6Z.png
@@ -6155,7 +6152,6 @@
- !!merge <<: *mistral03
name: "mn-12b-mag-mell-r1-iq-arm-imatrix"
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
- icon: "https://i.imgur.com/wjyAaTO.png"
urls:
- https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1
- https://huggingface.co/Lewdiculous/MN-12B-Mag-Mell-R1-GGUF-IQ-ARM-Imatrix
@@ -6280,7 +6276,7 @@
- filename: Wayfarer-12B-Q4_K_M.gguf
sha256: 6cd9f290c820c64854fcdcfd312b066447acc2f63abe2e2e71af9bc4f1946c08
uri: huggingface://bartowski/Wayfarer-12B-GGUF/Wayfarer-12B-Q4_K_M.gguf
-- &mudler ### START mudler's LocalAI specific-models
+- &mudler ### START mudler's LocalAI specific-models
url: "github:mudler/LocalAI/gallery/mudler.yaml@master"
name: "LocalAI-llama3-8b-function-call-v0.2"
icon: "https://cdn-uploads.huggingface.co/production/uploads/647374aa7ff32a81ac6d35d4/us5JKi9z046p8K-cn_M0w.webp"
@@ -6325,7 +6321,7 @@
- filename: Mirai-Nova-Llama3-LocalAI-8B-v0.1-q4_k_m.bin
sha256: 579cbb229f9c11d0330759ff4733102d2491615a4c61289e26c09d1b3a583fec
uri: huggingface://mudler/Mirai-Nova-Llama3-LocalAI-8B-v0.1-GGUF/Mirai-Nova-Llama3-LocalAI-8B-v0.1-q4_k_m.bin
-- &parler-tts ### START parler-tts
+- &parler-tts ### START parler-tts
url: "github:mudler/LocalAI/gallery/parler-tts.yaml@master"
name: parler-tts-mini-v0.1
overrides:
@@ -6342,7 +6338,7 @@
- cpu
- text-to-speech
- python
-- &rerankers ### START rerankers
+- &rerankers ### START rerankers
url: "github:mudler/LocalAI/gallery/rerankers.yaml@master"
name: cross-encoder
parameters:
@@ -7265,10 +7261,9 @@
name: "l3-8b-stheno-v3.1"
urls:
- https://huggingface.co/Sao10K/L3-8B-Stheno-v3.1
- icon: https://w.forfun.com/fetch/cb/cba2205390e517bea1ea60ca0b491af4.jpeg
description: |
- A model made for 1-on-1 Roleplay ideally, but one that is able to handle scenarios, RPGs and storywriting fine.
- - Uncensored during actual roleplay scenarios. # I do not care for zero-shot prompting like what some people do. It is uncensored enough in actual usecases.
+ - Uncensored during actual roleplay scenarios. # I do not care for zero-shot prompting like what some people do. It is uncensored enough in actual usecases.
- I quite like the prose and style for this model.
overrides:
parameters:
@@ -8059,7 +8054,6 @@
urls:
- https://huggingface.co/bartowski/New-Dawn-Llama-3-70B-32K-v1.0-GGUF
- https://huggingface.co/sophosympatheia/New-Dawn-Llama-3-70B-32K-v1.0
- icon: https://imgur.com/tKzncGo.png
description: |
This model is a multi-level SLERP merge of several Llama 3 70B variants. See the merge recipe below for details. I extended the context window for this model out to 32K by snagging some layers from abacusai/Smaug-Llama-3-70B-Instruct-32K using a technique similar to what I used for Midnight Miqu, which was further honed by jukofyork.
This model is uncensored. You are responsible for whatever you do with it.
@@ -8411,7 +8405,8 @@
- filename: dolphin-2.9.2-Phi-3-Medium-abliterated-Q4_K_M.gguf
sha256: 566331c2efe87725310aacb709ca15088a0063fa0ddc14a345bf20d69982156b
uri: huggingface://bartowski/dolphin-2.9.2-Phi-3-Medium-abliterated-GGUF/dolphin-2.9.2-Phi-3-Medium-abliterated-Q4_K_M.gguf
-- url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
+- !!merge <<: *llama3
+ url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "llama-3-8b-instruct-dpo-v0.3-32k"
license: llama3
urls:
@@ -8595,7 +8590,7 @@
- filename: Copus-2x8B.i1-Q4_K_M.gguf
sha256: 685da1ba49e203e8f491105585143d76044286d4b4687bed37d325f6b55501e5
uri: huggingface://mradermacher/Copus-2x8B-i1-GGUF/Copus-2x8B.i1-Q4_K_M.gguf
-- &yi-chat ### Start Yi
+- &yi-chat ### Start Yi
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
icon: "https://github.com/01-ai/Yi/raw/main/assets/img/Yi_logo_icon_light.svg"
name: "yi-1.5-9b-chat"
@@ -8806,7 +8801,7 @@
- filename: Fimbulvetr-11B-v2-Q4_K_M-imat.gguf
sha256: 3f309b59508342536a70edd6c4be6cf4f2cb97f2e32cbc79ad2ab3f4c02933a4
uri: huggingface://Lewdiculous/Fimbulvetr-11B-v2-GGUF-IQ-Imatrix/Fimbulvetr-11B-v2-Q4_K_M-imat.gguf
-- &noromaid ### Start noromaid
+- &noromaid ### Start noromaid
url: "github:mudler/LocalAI/gallery/noromaid.yaml@master"
name: "noromaid-13b-0.4-DPO"
icon: https://cdn-uploads.huggingface.co/production/uploads/630dfb008df86f1e5becadc3/VKX2Z2yjZX5J8kXzgeCYO.png
@@ -8826,7 +8821,7 @@
- filename: Noromaid-13B-0.4-DPO.q4_k_m.gguf
sha256: cb28e878d034fae3d0b43326c5fc1cfb4ab583b17c56e41d6ce023caec03c1c1
uri: huggingface://NeverSleep/Noromaid-13B-0.4-DPO-GGUF/Noromaid-13B-0.4-DPO.q4_k_m.gguf
-- &wizardlm2 ### START Vicuna based
+- &wizardlm2 ### START Vicuna based
url: "github:mudler/LocalAI/gallery/wizardlm2.yaml@master"
name: "wizardlm2-7b"
description: |
@@ -8881,7 +8876,9 @@
- filename: moondream2-mmproj-f16.gguf
sha256: 4cc1cb3660d87ff56432ebeb7884ad35d67c48c7b9f6b2856f305e39c38eed8f
uri: huggingface://moondream/moondream2-gguf/moondream2-mmproj-f16.gguf
-- &llava ### START LLaVa
+- &llava ### START LLaVa
+ name: "llava-1.6-vicuna"
+ icon: https://github.com/lobehub/lobe-icons/raw/master/packages/static-png/dark/llava-color.png
url: "github:mudler/LocalAI/gallery/llava.yaml@master"
license: apache-2.0
description: |
@@ -8895,7 +8892,6 @@
- gpu
- llama2
- cpu
- name: "llava-1.6-vicuna"
overrides:
mmproj: mmproj-vicuna7b-f16.gguf
parameters:
@@ -9363,7 +9359,6 @@
June 18, 2024 Update, After extensive testing of the intermediate checkpoints, significant progress has been made.
    The model is slowly - I mean, really slowly - unlearning its alignment. By significantly lowering the learning rate, I was able to visibly observe deep behavioral changes, this process is taking longer than anticipated, but it's going to be worth it. Estimated time to completion: 4 more days.. I'm pleased to report that in several tests, the model not only maintained its intelligence but actually showed a slight improvement, especially in terms of common sense. An intermediate checkpoint of this model was used to create invisietch/EtherealRainbow-v0.3-rc7, with promising results. Currently, it seems like I'm on the right track. I hope this model will serve as a solid foundation for further merges, whether for role-playing (RP) or for uncensoring. This approach also allows us to save on actual fine-tuning, thereby reducing our carbon footprint. The merge process takes just a few minutes of CPU time, instead of days of GPU work.
June 20, 2024 Update, Unaligning was partially successful, and the results are decent, but I am not fully satisfied. I decided to bite the bullet, and do a full finetune, god have mercy on my GPUs. I am also releasing the intermediate checkpoint of this model.
- icon: https://i.imgur.com/Kpk1PgZ.png
overrides:
parameters:
model: LLAMA-3_8B_Unaligned_Alpha-Q4_K_M.gguf
@@ -9389,7 +9384,6 @@
uri: huggingface://bartowski/L3-8B-Lunaris-v1-GGUF/L3-8B-Lunaris-v1-Q4_K_M.gguf
- !!merge <<: *llama3
name: "llama-3_8b_unaligned_alpha_rp_soup-i1"
- icon: https://i.imgur.com/pXcjpoV.png
urls:
- https://huggingface.co/SicariusSicariiStuff/LLAMA-3_8B_Unaligned_Alpha_RP_Soup
- https://huggingface.co/mradermacher/LLAMA-3_8B_Unaligned_Alpha_RP_Soup-i1-GGUF
@@ -9739,7 +9733,7 @@
- filename: Freyja-v4.95-maldv-7b-NON-FICTION.i1-Q4_K_M.gguf
sha256: cdc0f4de6df2ba120835fbd25c2a0ae2af8548f46d2c40c7a018c51c3d19e0c0
uri: huggingface://mradermacher/Freyja-v4.95-maldv-7b-NON-FICTION-i1-GGUF/Freyja-v4.95-maldv-7b-NON-FICTION.i1-Q4_K_M.gguf
-- &chatml ### ChatML
+- &chatml ### ChatML
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "una-thepitbull-21.4b-v2"
license: afl-3.0
@@ -9787,7 +9781,6 @@
sha256: 9c90f3a65332a03a6cbb563eee19c7586d9544f646ff9f33f7f1904b3d415ae2
uri: huggingface://nold/HelpingAI-9B-GGUF/HelpingAI-9B_Q4_K_M.gguf
- url: "github:mudler/LocalAI/gallery/chatml-hercules.yaml@master"
- icon: "https://tse3.mm.bing.net/th/id/OIG1.vnrl3xpEcypR3McLW63q?pid=ImgGn"
urls:
- https://huggingface.co/Locutusque/Llama-3-Hercules-5.0-8B
- https://huggingface.co/bartowski/Llama-3-Hercules-5.0-8B-GGUF
@@ -10025,7 +10018,7 @@
- filename: Triangulum-10B.Q4_K_M.gguf
sha256: dd071f99edf6b166044bf229cdeec19419c4c348e3fc3d6587cfcc55e6fb85fa
uri: huggingface://mradermacher/Triangulum-10B-GGUF/Triangulum-10B.Q4_K_M.gguf
-- &command-R ### START Command-r
+- &command-R ### START Command-r
url: "github:mudler/LocalAI/gallery/command-r.yaml@master"
name: "command-r-v01:q1_s"
license: "cc-by-nc-4.0"
@@ -10080,7 +10073,7 @@
- filename: "aya-23-35B-Q4_K_M.gguf"
sha256: "57824768c1a945e21e028c8e9a29b39adb4838d489f5865c82601ab9ad98065d"
uri: "huggingface://bartowski/aya-23-35B-GGUF/aya-23-35B-Q4_K_M.gguf"
-- &phi-2-chat ### START Phi-2
+- &phi-2-chat ### START Phi-2
url: "github:mudler/LocalAI/gallery/phi-2-chat.yaml@master"
license: mit
description: |
@@ -10202,7 +10195,7 @@
- filename: internlm3-8b-instruct-Q4_K_M.gguf
uri: huggingface://bartowski/internlm3-8b-instruct-GGUF/internlm3-8b-instruct-Q4_K_M.gguf
sha256: 2a9644687318e8659c9cf9b40730d5cc2f5af06f786a50439c7c51359b23896e
-- &phi-3 ### START Phi-3
+- &phi-3 ### START Phi-3
url: "github:mudler/LocalAI/gallery/phi-3-chat.yaml@master"
name: "phi-3-mini-4k-instruct"
icon: https://avatars.githubusercontent.com/u/6154722
@@ -10402,7 +10395,7 @@
- filename: Phi-3.5-MoE-instruct-Q4_K_M.gguf
sha256: 43e91bb720869bd8a92d8eb86bc3c74a52c49cf61642ca709b3d7bb89644df36
uri: huggingface://bartowski/Phi-3.5-MoE-instruct-GGUF/Phi-3.5-MoE-instruct-Q4_K_M.gguf
-- &hermes-2-pro-mistral ### START Hermes
+- &hermes-2-pro-mistral ### START Hermes
url: "github:mudler/LocalAI/gallery/hermes-2-pro-mistral.yaml@master"
name: "hermes-2-pro-mistral"
icon: https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/ggO2sBDJ8Bhc6w-zwTx5j.png
@@ -10738,7 +10731,7 @@
- filename: "galatolo-Q4_K.gguf"
sha256: "ca0cfd5a9ad40dc16416aa3a277015d0299b62c0803b67f5709580042202c172"
uri: "huggingface://galatolo/cerbero-7b-gguf/ggml-model-Q4_K.gguf"
-- &codellama ### START Codellama
+- &codellama ### START Codellama
url: "github:mudler/LocalAI/gallery/codellama.yaml@master"
name: "codellama-7b"
license: llama2
@@ -10869,7 +10862,7 @@
- filename: "llm-compiler-7b-ftd.Q4_K.gguf"
uri: "huggingface://legraphista/llm-compiler-7b-ftd-IMat-GGUF/llm-compiler-7b-ftd.Q4_K.gguf"
sha256: d862dd18ed335413787d0ad196522a9902a3c10a6456afdab8721822cb0ddde8
-- &openvino ### START OpenVINO
+- &openvino ### START OpenVINO
url: "github:mudler/LocalAI/gallery/openvino.yaml@master"
name: "openvino-llama-3-8b-instruct-ov-int8"
license: llama3
@@ -10983,7 +10976,7 @@
- gpu
- embedding
- cpu
-- &sentencentransformers ### START Embeddings
+- &sentencentransformers ### START Embeddings
description: |
This framework provides an easy method to compute dense vector representations for sentences, paragraphs, and images. The models are based on transformer networks like BERT / RoBERTa / XLM-RoBERTa and achieve state-of-the-art performance in various tasks. Text is embedded in a vector space such that similar text is closer together and can efficiently be found using cosine similarity.
urls:
@@ -10998,7 +10991,7 @@
overrides:
parameters:
model: all-MiniLM-L6-v2
-- &dreamshaper ### START Image generation
+- &dreamshaper ### START Image generation
name: dreamshaper
icon: https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/dd9b038c-bd15-43ab-86ab-66e145ad7ff2/width=450/26072158-132340247-8k%20portrait%20of%20beautiful%20cyborg%20with%20brown%20hair,%20intricate,%20elegant,%20highly%20detailed,%20majestic,%20digital%20photography,%20art%20by%20artg_ed.jpeg
license: other
@@ -11110,7 +11103,7 @@
- filename: t5xxl_fp16.safetensors
sha256: 6e480b09fae049a72d2a8c5fbccb8d3e92febeb233bbe9dfe7256958a9167635
uri: https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors
-- &whisper ## Whisper
+- &whisper ## Whisper
url: "github:mudler/LocalAI/gallery/whisper-base.yaml@master"
name: "whisper-1"
license: "MIT"
@@ -11290,7 +11283,7 @@
description: |
Stable Diffusion in NCNN with C++, supporting txt2img and img2img
name: stablediffusion-cpp
-- &piper ## Piper TTS
+- &piper ## Piper TTS
url: github:mudler/LocalAI/gallery/piper.yaml@master
name: voice-en-us-kathleen-low
icon: https://github.com/rhasspy/piper/raw/master/etc/logo.png
From 2f09aa1b850535d2cb820a49c19c9159867c1f0b Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Mon, 20 Jan 2025 19:04:23 +0100
Subject: [PATCH 064/679] chore(model gallery): add sd-3.5-large-ggml (#4647)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 30 ++++++++++++++++++++++++++++++
gallery/sd-ggml.yaml | 12 ++++++++++++
2 files changed, 42 insertions(+)
create mode 100644 gallery/sd-ggml.yaml
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 30687062..bcb7866a 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -11028,6 +11028,36 @@
- sd-3
- gpu
url: "github:mudler/LocalAI/gallery/stablediffusion3.yaml@master"
+- name: sd-3.5-large-ggml
+ license: stabilityai-ai-community
+ url: "github:mudler/LocalAI/gallery/sd-ggml.yaml@master"
+ description: |
+ Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
+ urls:
+ - https://huggingface.co/stabilityai/stable-diffusion-3.5-large
+ - https://huggingface.co/second-state/stable-diffusion-3.5-large-GGUF
+ tags:
+ - text-to-image
+ - flux
+ - gpu
+ - cpu
+ icon: https://huggingface.co/stabilityai/stable-diffusion-3.5-large/media/main/sd3.5_large_demo.png
+ overrides:
+ parameters:
+ model: sd3.5_large-Q4_0.gguf
+ files:
+ - filename: "sd3.5_large-Q4_0.gguf"
+ sha256: "c79ed6cdaa7decaca6b05ccc636b956b37c47de9b104c56315ca8ed086347b00"
+ uri: "huggingface://second-state/stable-diffusion-3.5-large-GGUF/sd3.5_large-Q4_0.gguf"
+ - filename: clip_g.safetensors
+ sha256: ec310df2af79c318e24d20511b601a591ca8cd4f1fce1d8dff822a356bcdb1f4
+ uri: huggingface://second-state/stable-diffusion-3.5-large-GGUF/clip_g.safetensors
+ - filename: clip_l.safetensors
+ sha256: 660c6f5b1abae9dc498ac2d21e1347d2abdb0cf6c0c0c8576cd796491d9a6cdd
+ uri: huggingface://second-state/stable-diffusion-3.5-large-GGUF/clip_l.safetensors
+ - filename: t5xxl-Q5_0.gguf
+ sha256: f4df16c641a05c4a6ca717068ba3ee312875000f6fac0efbd152915553b5fc3e
+ uri: huggingface://second-state/stable-diffusion-3.5-large-GGUF/t5xxl-Q5_0.gguf
- &flux
name: flux.1-dev
license: flux-1-dev-non-commercial-license
diff --git a/gallery/sd-ggml.yaml b/gallery/sd-ggml.yaml
new file mode 100644
index 00000000..d819eba8
--- /dev/null
+++ b/gallery/sd-ggml.yaml
@@ -0,0 +1,12 @@
+---
+name: "sd-ggml"
+
+config_file: |
+ backend: stablediffusion-ggml
+ step: 25
+ cfg_scale: 4.5
+ options:
+ - "clip_l_path:clip_l.safetensors"
+ - "clip_g_path:clip_g.safetensors"
+ - "t5xxl_path:t5xxl-Q5_0.gguf"
+ - "sampler:euler"
From 14a1e02f4478cef20d723f9fa91f0645c856b7c8 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon, 20 Jan 2025 23:33:40 +0000
Subject: [PATCH 065/679] chore(deps): Bump docs/themes/hugo-theme-relearn from
`80e448e` to `8dad5ee` (#4656)
chore(deps): Bump docs/themes/hugo-theme-relearn
Bumps [docs/themes/hugo-theme-relearn](https://github.com/McShelby/hugo-theme-relearn) from `80e448e` to `8dad5ee`.
- [Release notes](https://github.com/McShelby/hugo-theme-relearn/releases)
- [Commits](https://github.com/McShelby/hugo-theme-relearn/compare/80e448e5bdaa92c87ee0d0d86f1125c8606ebf5f...8dad5ee419e5bb2a0b380aa72d7a7389af4945f6)
---
updated-dependencies:
- dependency-name: docs/themes/hugo-theme-relearn
dependency-type: direct:production
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
docs/themes/hugo-theme-relearn | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/themes/hugo-theme-relearn b/docs/themes/hugo-theme-relearn
index 80e448e5..8dad5ee4 160000
--- a/docs/themes/hugo-theme-relearn
+++ b/docs/themes/hugo-theme-relearn
@@ -1 +1 @@
-Subproject commit 80e448e5bdaa92c87ee0d0d86f1125c8606ebf5f
+Subproject commit 8dad5ee419e5bb2a0b380aa72d7a7389af4945f6
From 1a08948e63ce48dd32524cf4f7df88e6b69e639d Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Tue, 21 Jan 2025 08:37:13 +0100
Subject: [PATCH 066/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`aea8ddd5165d525a449e2fc3839db77a71f4a318` (#4657)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 7aaad492..53e5af7e 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=92bc493917d43b83e592349e138b54c90b1c3ea7
+CPPLLAMA_VERSION?=aea8ddd5165d525a449e2fc3839db77a71f4a318
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From b264a91b3f24ed8b2ec4c3161a8405be4e7019ad Mon Sep 17 00:00:00 2001
From: Gianluca Boiano <491117+M0Rf30@users.noreply.github.com>
Date: Tue, 21 Jan 2025 10:37:05 +0100
Subject: [PATCH 067/679] chore(model gallery): add Deepseek-R1-Distill models
(#4646)
* chore(model gallery): add Deepseek-R1-Distill-Llama-8b
Signed-off-by: Gianluca Boiano
* chore(model gallery): add Deepseek-R1-Distill-Qwen-1.5b
Signed-off-by: Gianluca Boiano
---------
Signed-off-by: Gianluca Boiano
---
gallery/index.yaml | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index bcb7866a..126bd14a 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -2696,6 +2696,23 @@
- filename: Qwentile2.5-32B-Instruct-Q4_K_M.gguf
sha256: e476d6e3c15c78fc3f986d7ae8fa35c16116843827f2e6243c05767cef2f3615
uri: huggingface://bartowski/Qwentile2.5-32B-Instruct-GGUF/Qwentile2.5-32B-Instruct-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "deepseek-r1-distill-qwen-1.5b"
+ icon: "https://avatars.githubusercontent.com/u/148330874"
+ urls:
+ - https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5b
+ - https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF
+ description: |
+ DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks.
+ Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing.
+ By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.
+ overrides:
+ parameters:
+ model: deepseek-r1-distill-qwen-1.5b-Q4_K_M.gguf
+ files:
+ - filename: deepseek-r1-distill-qwen-1.5b-Q4_K_M.gguf
+ sha256: c2c43b6018cf7700ce0ddee8807deb1a9a26758ef878232f3a142d16df81f0fe
+ uri: huggingface://unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf
- &archfunct
license: apache-2.0
tags:
@@ -5219,6 +5236,23 @@
- filename: Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf
sha256: 268390e07edd407ad93ea21a868b7ae995b5950e01cad0db9e1802ae5049d405
uri: huggingface://bartowski/Dolphin3.0-Llama3.1-8B-GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf
+- !!merge <<: *llama31
+ name: "deepseek-r1-distill-llama-8b"
+ icon: "https://avatars.githubusercontent.com/u/148330874"
+ urls:
+ - https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+ - https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF
+ description: |
+ DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks.
+ Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing.
+ By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.
+ overrides:
+ parameters:
+ model: deepseek-r1-distill-llama-8b-Q4_K_M.gguf
+ files:
+ - filename: deepseek-r1-distill-llama-8b-Q4_K_M.gguf
+ sha256: f8eba201522ab44b79bc54166126bfaf836111ff4cbf2d13c59c3b57da10573b
+ uri: huggingface://unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf
- &deepseek ## Deepseek
url: "github:mudler/LocalAI/gallery/deepseek.yaml@master"
name: "deepseek-coder-v2-lite-instruct"
From 6831719e1e74f5ed0f58c40999bce9a8f4066959 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Tue, 21 Jan 2025 15:09:36 +0100
Subject: [PATCH 068/679] chore(model gallery): add deepseek-r1-distill-qwen-7b
(#4660)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 126bd14a..c56e37b1 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -2713,6 +2713,22 @@
- filename: deepseek-r1-distill-qwen-1.5b-Q4_K_M.gguf
sha256: c2c43b6018cf7700ce0ddee8807deb1a9a26758ef878232f3a142d16df81f0fe
uri: huggingface://unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "deepseek-r1-distill-qwen-7b"
+ urls:
+ - https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+ - https://huggingface.co/bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF
+ description: |
+ DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks.
+ Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing.
+ By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.
+ overrides:
+ parameters:
+ model: DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
+ files:
+ - filename: DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
+ sha256: 731ece8d06dc7eda6f6572997feb9ee1258db0784827e642909d9b565641937b
+ uri: huggingface://bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
- &archfunct
license: apache-2.0
tags:
From e81ceff6812c43c401c110eafbcc140747266ea2 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Tue, 21 Jan 2025 23:04:29 +0100
Subject: [PATCH 069/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`6171c9d25820ccf676b243c172868819d882848f` (#4661)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 53e5af7e..44959fd3 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=aea8ddd5165d525a449e2fc3839db77a71f4a318
+CPPLLAMA_VERSION?=6171c9d25820ccf676b243c172868819d882848f
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From 0ec25b8b0743416a7ddd6f66f09dc1d1dd7fe07f Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 22 Jan 2025 16:37:20 +0100
Subject: [PATCH 070/679] chore(model gallery): add sd-1.5-ggml and
sd-3.5-medium-ggml (#4664)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 58 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 57 insertions(+), 1 deletion(-)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index c56e37b1..4ce19bb4 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -11078,6 +11078,62 @@
- sd-3
- gpu
url: "github:mudler/LocalAI/gallery/stablediffusion3.yaml@master"
+- name: sd-1.5-ggml
+ license: creativeml-openrail-m
+ url: "github:mudler/LocalAI/gallery/sd-ggml.yaml@master"
+ description: |
+ Stable Diffusion 1.5
+ urls:
+ - https://huggingface.co/second-state/stable-diffusion-v1-5-GGUF
+ tags:
+ - text-to-image
+ - stablediffusion
+ - gpu
+ - cpu
+ overrides:
+ options:
+ - "sampler:euler"
+ parameters:
+ model: stable-diffusion-v1-5-pruned-emaonly-Q4_0.gguf
+ files:
+ - filename: "stable-diffusion-v1-5-pruned-emaonly-Q4_0.gguf"
+ sha256: "b8944e9fe0b69b36ae1b5bb0185b3a7b8ef14347fe0fa9af6c64c4829022261f"
+ uri: "huggingface://second-state/stable-diffusion-v1-5-GGUF/stable-diffusion-v1-5-pruned-emaonly-Q4_0.gguf"
+- name: sd-3.5-medium-ggml
+ license: stabilityai-ai-community
+ url: "github:mudler/LocalAI/gallery/sd-ggml.yaml@master"
+ description: |
+ Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
+ urls:
+ - https://huggingface.co/stabilityai/stable-diffusion-3.5-medium
+ - https://huggingface.co/second-state/stable-diffusion-3.5-medium-GGUF
+ tags:
+ - text-to-image
+ - stablediffusion
+ - gpu
+ - cpu
+ icon: https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/media/main/sd3.5_medium_demo.jpg
+ overrides:
+ options:
+ - "clip_l_path:clip_l-Q4_0.gguf"
+ - "clip_g_path:clip_g-Q4_0.gguf"
+ - "t5xxl_path:t5xxl-Q4_0.gguf"
+ - "sampler:euler"
+ parameters:
+ model: sd3.5_medium-Q4_0.gguf
+ files:
+ - filename: "sd3.5_medium-Q4_0.gguf"
+ sha256: "3bb8c5e9ab0a841117089ed4ed81d885bb85161df2a766b812f829bc55b31adf"
+ uri: "huggingface://second-state/stable-diffusion-3.5-medium-GGUF/sd3.5_medium-Q4_0.gguf"
+ - filename: clip_g-Q4_0.gguf
+ sha256: c142411147e16b7c4b9cc1f5d977cbe596104435d76fde47172d3d35c5e58bb8
+ uri: huggingface://second-state/stable-diffusion-3.5-medium-GGUF/clip_g-Q4_0.gguf
+ - filename: clip_l-Q4_0.gguf
+ sha256: f5ad88ae2ac924eb4ac0298b77afa304b5e6014fc0c4128f0e3df40fdfcc0f8a
+ uri: huggingface://second-state/stable-diffusion-3.5-medium-GGUF/clip_l-Q4_0.gguf
+ - filename: t5xxl-Q4_0.gguf
+ sha256: 987ba47c158b890c274f78fd35324419f50941e846a49789f0977e9fe9d97ab7
+ uri: huggingface://second-state/stable-diffusion-3.5-medium-GGUF/t5xxl-Q4_0.gguf
- name: sd-3.5-large-ggml
license: stabilityai-ai-community
url: "github:mudler/LocalAI/gallery/sd-ggml.yaml@master"
@@ -11088,7 +11144,7 @@
- https://huggingface.co/second-state/stable-diffusion-3.5-large-GGUF
tags:
- text-to-image
- - flux
+ - stablediffusion
- gpu
- cpu
icon: https://huggingface.co/stabilityai/stable-diffusion-3.5-large/media/main/sd3.5_large_demo.png
From 10675ac28e80e990832c650174efec0e0d006838 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 22 Jan 2025 18:07:30 +0100
Subject: [PATCH 071/679] Update README.md
Signed-off-by: Ettore Di Giacinto
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 4d415d16..78267e04 100644
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@
-
+
> :bulb: Get help - [❓FAQ](https://localai.io/faq/) [💭Discussions](https://github.com/go-skynet/LocalAI/discussions) [:speech_balloon: Discord](https://discord.gg/uJAeKSAGDy) [:book: Documentation website](https://localai.io/)
From e15d29aba2982d07cb2bfec9267c076d73eab2b5 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 22 Jan 2025 19:34:16 +0100
Subject: [PATCH 072/679] chore(stablediffusion-ncn): drop in favor of ggml
implementation (#4652)
* chore(stablediffusion-ncn): drop in favor of ggml implementation
Signed-off-by: Ettore Di Giacinto
* chore(ci): drop stablediffusion build
Signed-off-by: Ettore Di Giacinto
* chore(tests): add
Signed-off-by: Ettore Di Giacinto
* chore(tests): try to fixup current tests
Signed-off-by: Ettore Di Giacinto
* Try to fix tests
Signed-off-by: Ettore Di Giacinto
* Tests improvements
Signed-off-by: Ettore Di Giacinto
* chore(tests): use quality to specify step
Signed-off-by: Ettore Di Giacinto
* chore(tests): switch to sd-1.5
also increase prep time for downloading models
Signed-off-by: Ettore Di Giacinto
---------
Signed-off-by: Ettore Di Giacinto
---
.devcontainer/docker-compose-devcontainer.yml | 2 +-
.env | 6 +-
.github/workflows/release.yaml | 35 +----------
.github/workflows/test.yml | 6 +-
.vscode/launch.json | 2 +-
Dockerfile | 38 +-----------
Makefile | 36 +----------
aio/cpu/image-gen.yaml | 59 +++---------------
backend/go/image/stablediffusion/main.go | 21 -------
.../image/stablediffusion/stablediffusion.go | 33 ----------
core/config/backend_config.go | 2 +-
core/config/config_test.go | 61 +++++++++++++++++++
core/http/app_test.go | 17 +++---
core/http/endpoints/openai/image.go | 6 +-
core/http/endpoints/openai/request.go | 9 +++
core/schema/openai.go | 5 +-
pkg/model/initializers.go | 9 +--
pkg/stablediffusion/generate.go | 35 -----------
pkg/stablediffusion/generate_unsupported.go | 10 ---
pkg/stablediffusion/stablediffusion.go | 20 ------
tests/e2e-aio/e2e_suite_test.go | 2 +-
tests/e2e-aio/e2e_test.go | 11 ++--
22 files changed, 123 insertions(+), 302 deletions(-)
delete mode 100644 backend/go/image/stablediffusion/main.go
delete mode 100644 backend/go/image/stablediffusion/stablediffusion.go
delete mode 100644 pkg/stablediffusion/generate.go
delete mode 100644 pkg/stablediffusion/generate_unsupported.go
delete mode 100644 pkg/stablediffusion/stablediffusion.go
diff --git a/.devcontainer/docker-compose-devcontainer.yml b/.devcontainer/docker-compose-devcontainer.yml
index 8795d64d..7ef22099 100644
--- a/.devcontainer/docker-compose-devcontainer.yml
+++ b/.devcontainer/docker-compose-devcontainer.yml
@@ -7,7 +7,7 @@ services:
args:
- FFMPEG=true
- IMAGE_TYPE=extras
- - GO_TAGS=stablediffusion p2p tts
+ - GO_TAGS=p2p tts
env_file:
- ../.env
ports:
diff --git a/.env b/.env
index e92f7f3b..ee8db74e 100644
--- a/.env
+++ b/.env
@@ -38,12 +38,12 @@
## Uncomment and set to true to enable rebuilding from source
# REBUILD=true
-## Enable go tags, available: stablediffusion, tts
-## stablediffusion: image generation with stablediffusion
+## Enable go tags, available: p2p, tts
+## p2p: enable distributed inferencing
## tts: enables text-to-speech with go-piper
## (requires REBUILD=true)
#
-# GO_TAGS=stablediffusion
+# GO_TAGS=p2p
## Path where to store generated images
# LOCALAI_IMAGE_PATH=/tmp/generated/images
diff --git a/.github/workflows/release.yaml b/.github/workflows/release.yaml
index 47a69b0f..e133ecb6 100644
--- a/.github/workflows/release.yaml
+++ b/.github/workflows/release.yaml
@@ -237,40 +237,7 @@ jobs:
detached: true
connect-timeout-seconds: 180
limit-access-to-actor: true
- build-stablediffusion:
- runs-on: ubuntu-latest
- steps:
- - name: Clone
- uses: actions/checkout@v4
- with:
- submodules: true
- - uses: actions/setup-go@v5
- with:
- go-version: '1.21.x'
- cache: false
- - name: Dependencies
- run: |
- sudo apt-get update
- sudo apt-get install -y --no-install-recommends libopencv-dev protobuf-compiler ccache upx-ucl
- go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
- go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
- - name: Build stablediffusion
- run: |
- export PATH=$PATH:$GOPATH/bin
- make backend-assets/grpc/stablediffusion
- mkdir -p release && cp backend-assets/grpc/stablediffusion release
- env:
- GO_TAGS: stablediffusion
- - uses: actions/upload-artifact@v4
- with:
- name: stablediffusion
- path: release/
- - name: Release
- uses: softprops/action-gh-release@v2
- if: startsWith(github.ref, 'refs/tags/')
- with:
- files: |
- release/*
+
build-macOS-x86_64:
runs-on: macos-13
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index 0ee93afa..444c89fb 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -105,9 +105,7 @@ jobs:
# Pre-build piper before we start tests in order to have shared libraries in place
make sources/go-piper && \
GO_TAGS="tts" make -C sources/go-piper piper.o && \
- sudo cp -rfv sources/go-piper/piper-phonemize/pi/lib/. /usr/lib/ && \
- # Pre-build stable diffusion before we install a newer version of abseil (not compatible with stablediffusion-ncn)
- PATH="$PATH:/root/go/bin" GO_TAGS="stablediffusion tts" GRPC_BACKENDS=backend-assets/grpc/stablediffusion make build
+ sudo cp -rfv sources/go-piper/piper-phonemize/pi/lib/. /usr/lib/
env:
CUDA_VERSION: 12-4
- name: Cache grpc
@@ -129,7 +127,7 @@ jobs:
cd grpc && cd cmake/build && sudo make --jobs 5 install
- name: Test
run: |
- PATH="$PATH:/root/go/bin" GO_TAGS="stablediffusion tts" make --jobs 5 --output-sync=target test
+ PATH="$PATH:/root/go/bin" GO_TAGS="tts" make --jobs 5 --output-sync=target test
- name: Setup tmate session if tests fail
if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.19
diff --git a/.vscode/launch.json b/.vscode/launch.json
index 50493421..f5e91508 100644
--- a/.vscode/launch.json
+++ b/.vscode/launch.json
@@ -26,7 +26,7 @@
"LOCALAI_P2P": "true",
"LOCALAI_FEDERATED": "true"
},
- "buildFlags": ["-tags", "stablediffusion p2p tts", "-v"],
+ "buildFlags": ["-tags", "p2p tts", "-v"],
"envFile": "${workspaceFolder}/.env",
"cwd": "${workspaceRoot}"
}
diff --git a/Dockerfile b/Dockerfile
index 4ddc921d..8594c2a1 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -69,14 +69,10 @@ ENV PATH=/opt/rocm/bin:${PATH}
# OpenBLAS requirements and stable diffusion
RUN apt-get update && \
apt-get install -y --no-install-recommends \
- libopenblas-dev \
- libopencv-dev && \
+ libopenblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
-# Set up OpenCV
-RUN ln -s /usr/include/opencv4/opencv2 /usr/include/opencv2
-
WORKDIR /build
###################################
@@ -251,7 +247,7 @@ RUN git clone --recurse-submodules --jobs 4 -b ${GRPC_VERSION} --depth 1 --shall
FROM requirements-drivers AS builder-base
-ARG GO_TAGS="stablediffusion tts p2p"
+ARG GO_TAGS="tts p2p"
ARG GRPC_BACKENDS
ARG MAKEFLAGS
ARG LD_FLAGS="-s -w"
@@ -285,35 +281,12 @@ RUN <
diff --git a/pkg/stablediffusion/generate.go b/pkg/stablediffusion/generate.go
deleted file mode 100644
--- a/pkg/stablediffusion/generate.go
+++ /dev/null
- if height > 512 || width > 512 {
- return stableDiffusion.GenerateImageUpscaled(
- height,
- width,
- step,
- seed,
- positive_prompt,
- negative_prompt,
- dst,
- asset_dir,
- )
- }
- return stableDiffusion.GenerateImage(
- height,
- width,
- mode,
- step,
- seed,
- positive_prompt,
- negative_prompt,
- dst,
- "",
- asset_dir,
- )
-}
diff --git a/pkg/stablediffusion/generate_unsupported.go b/pkg/stablediffusion/generate_unsupported.go
deleted file mode 100644
index 9563bae0..00000000
--- a/pkg/stablediffusion/generate_unsupported.go
+++ /dev/null
@@ -1,10 +0,0 @@
-//go:build !stablediffusion
-// +build !stablediffusion
-
-package stablediffusion
-
-import "fmt"
-
-func GenerateImage(height, width, mode, step, seed int, positive_prompt, negative_prompt, dst, asset_dir string) error {
- return fmt.Errorf("This version of LocalAI was built without the stablediffusion tag")
-}
diff --git a/pkg/stablediffusion/stablediffusion.go b/pkg/stablediffusion/stablediffusion.go
deleted file mode 100644
index e38db17f..00000000
--- a/pkg/stablediffusion/stablediffusion.go
+++ /dev/null
@@ -1,20 +0,0 @@
-package stablediffusion
-
-import "os"
-
-type StableDiffusion struct {
- assetDir string
-}
-
-func New(assetDir string) (*StableDiffusion, error) {
- if _, err := os.Stat(assetDir); err != nil {
- return nil, err
- }
- return &StableDiffusion{
- assetDir: assetDir,
- }, nil
-}
-
-func (s *StableDiffusion) GenerateImage(height, width, mode, step, seed int, positive_prompt, negative_prompt, dst string) error {
- return GenerateImage(height, width, mode, step, seed, positive_prompt, negative_prompt, dst, s.assetDir)
-}
diff --git a/tests/e2e-aio/e2e_suite_test.go b/tests/e2e-aio/e2e_suite_test.go
index 680bd3a5..4a10d41b 100644
--- a/tests/e2e-aio/e2e_suite_test.go
+++ b/tests/e2e-aio/e2e_suite_test.go
@@ -54,7 +54,7 @@ var _ = BeforeSuite(func() {
Eventually(func() error {
_, err := client.ListModels(context.TODO())
return err
- }, "20m").ShouldNot(HaveOccurred())
+ }, "50m").ShouldNot(HaveOccurred())
})
var _ = AfterSuite(func() {
diff --git a/tests/e2e-aio/e2e_test.go b/tests/e2e-aio/e2e_test.go
index a9c55497..4d9eb4d8 100644
--- a/tests/e2e-aio/e2e_test.go
+++ b/tests/e2e-aio/e2e_test.go
@@ -123,8 +123,9 @@ var _ = Describe("E2E test", func() {
It("correctly", func() {
resp, err := client.CreateImage(context.TODO(),
openai.ImageRequest{
- Prompt: "test",
- Size: openai.CreateImageSize512x512,
+ Prompt: "test",
+ Quality: "1",
+ Size: openai.CreateImageSize256x256,
},
)
Expect(err).ToNot(HaveOccurred())
@@ -135,7 +136,8 @@ var _ = Describe("E2E test", func() {
resp, err := client.CreateImage(context.TODO(),
openai.ImageRequest{
Prompt: "test",
- Size: openai.CreateImageSize512x512,
+ Size: openai.CreateImageSize256x256,
+ Quality: "1",
ResponseFormat: openai.CreateImageResponseFormatURL,
},
)
@@ -147,7 +149,8 @@ var _ = Describe("E2E test", func() {
resp, err := client.CreateImage(context.TODO(),
openai.ImageRequest{
Prompt: "test",
- Size: openai.CreateImageSize512x512,
+ Size: openai.CreateImageSize256x256,
+ Quality: "1",
ResponseFormat: openai.CreateImageResponseFormatB64JSON,
},
)
From e8eb0b2c50a7653c9d8dc3e2388eb4074705b4b7 Mon Sep 17 00:00:00 2001
From: Richard Palethorpe
Date: Wed, 22 Jan 2025 18:35:05 +0000
Subject: [PATCH 073/679] fix(stores): Stores fixes and testing (#4663)
* fix(stores): Actually check a vector is a unit vector/normalized
Instead of just summing the components to see if they equal 1.0, take
the actual magnitude/p-norm of the vector and check that it is
approximately 1.0.
Note that this shouldn't change the order of results except in edge
cases, if I am too lax with the precision of the equality
comparison. However, it should improve performance for normalized
vectors that were being misclassified.
Signed-off-by: Richard Palethorpe
* fix(stores): Add tests for known results and triangle inequality
This adds some more tests to check that the cosine similarity function has
some expected mathematical properties.
Signed-off-by: Richard Palethorpe
---------
Signed-off-by: Richard Palethorpe
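To make the fix concrete: summing components is not a norm. The unit vector {0.6, 0.8, 0.0} has component sum 1.4 and was misclassified as unnormalized by the old check, while {0.5, 0.5, 0.0} sums to exactly 1.0 despite having a magnitude of about 0.707. A standalone sketch of the corrected check, equivalent to the store.go change below:

package main

import (
    "fmt"
    "math"
)

// isNormalized reports whether k is approximately a unit vector by comparing
// its Euclidean norm against 1.0 (with tolerance), not the sum of components.
func isNormalized(k []float32) bool {
    var sum float64
    for _, v := range k {
        f := float64(v)
        sum += f * f
    }
    norm := math.Sqrt(sum)
    return norm >= 0.99 && norm <= 1.01
}

func main() {
    fmt.Println(isNormalized([]float32{0.6, 0.8, 0.0})) // true: unit vector, component sum 1.4
    fmt.Println(isNormalized([]float32{0.5, 0.5, 0.0})) // false: component sum 1.0, norm ~0.707
}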
---
backend/go/stores/store.go | 14 +--
tests/integration/stores_test.go | 143 ++++++++++++++++++++++++++++---
2 files changed, 141 insertions(+), 16 deletions(-)
diff --git a/backend/go/stores/store.go b/backend/go/stores/store.go
index a4849b57..c8788a9c 100644
--- a/backend/go/stores/store.go
+++ b/backend/go/stores/store.go
@@ -311,12 +311,16 @@ func (s *Store) StoresGet(opts *pb.StoresGetOptions) (pb.StoresGetResult, error)
}
func isNormalized(k []float32) bool {
- var sum float32
+ var sum float64
+
for _, v := range k {
- sum += v
+ v64 := float64(v)
+ sum += v64*v64
}
- return sum == 1.0
+ s := math.Sqrt(sum)
+
+ return s >= 0.99 && s <= 1.01
}
// TODO: This we could replace with handwritten SIMD code
@@ -328,7 +332,7 @@ func normalizedCosineSimilarity(k1, k2 []float32) float32 {
dot += k1[i] * k2[i]
}
- assert(dot >= -1 && dot <= 1, fmt.Sprintf("dot = %f", dot))
+ assert(dot >= -1.01 && dot <= 1.01, fmt.Sprintf("dot = %f", dot))
// 2.0 * (1.0 - dot) would be the Euclidean distance
return dot
@@ -418,7 +422,7 @@ func cosineSimilarity(k1, k2 []float32, mag1 float64) float32 {
sim := float32(dot / (mag1 * math.Sqrt(mag2)))
- assert(sim >= -1 && sim <= 1, fmt.Sprintf("sim = %f", sim))
+ assert(sim >= -1.01 && sim <= 1.01, fmt.Sprintf("sim = %f", sim))
return sim
}
diff --git a/tests/integration/stores_test.go b/tests/integration/stores_test.go
index 5ed46b19..9612bec0 100644
--- a/tests/integration/stores_test.go
+++ b/tests/integration/stores_test.go
@@ -4,6 +4,7 @@ import (
"context"
"embed"
"math"
+ "math/rand"
"os"
"path/filepath"
@@ -22,6 +23,19 @@ import (
//go:embed backend-assets/*
var backendAssets embed.FS
+func normalize(vecs [][]float32) {
+ for i, k := range vecs {
+ norm := float64(0)
+ for _, x := range k {
+ norm += float64(x * x)
+ }
+ norm = math.Sqrt(norm)
+ for j, x := range k {
+ vecs[i][j] = x / float32(norm)
+ }
+ }
+}
+
var _ = Describe("Integration tests for the stores backend(s) and internal APIs", Label("stores"), func() {
Context("Embedded Store get,set and delete", func() {
var sl *model.ModelLoader
@@ -192,17 +206,8 @@ var _ = Describe("Integration tests for the stores backend(s) and internal APIs"
// set 3 vectors that are at varying angles to {0.5, 0.5, 0.5}
keys := [][]float32{{0.1, 0.3, 0.5}, {0.5, 0.5, 0.5}, {0.6, 0.6, -0.6}, {0.7, -0.7, -0.7}}
vals := [][]byte{[]byte("test0"), []byte("test1"), []byte("test2"), []byte("test3")}
- // normalize the keys
- for i, k := range keys {
- norm := float64(0)
- for _, x := range k {
- norm += float64(x * x)
- }
- norm = math.Sqrt(norm)
- for j, x := range k {
- keys[i][j] = x / float32(norm)
- }
- }
+
+ normalize(keys)
err := store.SetCols(context.Background(), sc, keys, vals)
Expect(err).ToNot(HaveOccurred())
@@ -225,5 +230,121 @@ var _ = Describe("Integration tests for the stores backend(s) and internal APIs"
Expect(ks[1]).To(Equal(keys[1]))
Expect(vals[1]).To(Equal(vals[1]))
})
+
+ It("It produces the correct cosine similarities for orthogonal and opposite unit vectors", func() {
+ keys := [][]float32{{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0}, {-1.0, 0.0, 0.0}}
+ vals := [][]byte{[]byte("x"), []byte("y"), []byte("z"), []byte("-z")}
+
+ err := store.SetCols(context.Background(), sc, keys, vals)
+ Expect(err).ToNot(HaveOccurred())
+
+ _, _, sims, err := store.Find(context.Background(), sc, keys[0], 4)
+ Expect(err).ToNot(HaveOccurred())
+ Expect(sims).To(Equal([]float32{1.0, 0.0, 0.0, -1.0}))
+ })
+
+ It("It produces the correct cosine similarities for orthogonal and opposite vectors", func() {
+ keys := [][]float32{{1.0, 0.0, 1.0}, {0.0, 2.0, 0.0}, {0.0, 0.0, -1.0}, {-1.0, 0.0, -1.0}}
+ vals := [][]byte{[]byte("x"), []byte("y"), []byte("z"), []byte("-z")}
+
+ err := store.SetCols(context.Background(), sc, keys, vals)
+ Expect(err).ToNot(HaveOccurred())
+
+ _, _, sims, err := store.Find(context.Background(), sc, keys[0], 4)
+ Expect(err).ToNot(HaveOccurred())
+ Expect(sims[0]).To(BeNumerically("~", 1, 0.1))
+ Expect(sims[1]).To(BeNumerically("~", 0, 0.1))
+ Expect(sims[2]).To(BeNumerically("~", -0.7, 0.1))
+ Expect(sims[3]).To(BeNumerically("~", -1, 0.1))
+ })
+
+ expectTriangleEq := func(keys [][]float32, vals [][]byte) {
+ sims := map[string]map[string]float32{}
+
+ // compare every key vector pair and store the similarities in a lookup table
+ // that uses the values as keys
+ for i, k := range keys {
+ _, valsk, simsk, err := store.Find(context.Background(), sc, k, 9)
+ Expect(err).ToNot(HaveOccurred())
+
+ for j, v := range valsk {
+ p := string(vals[i])
+ q := string(v)
+
+ if sims[p] == nil {
+ sims[p] = map[string]float32{}
+ }
+
+ //log.Debug().Strs("vals", []string{p, q}).Float32("similarity", simsk[j]).Send()
+
+ sims[p][q] = simsk[j]
+ }
+ }
+
+ // Check that the triangle inequality holds for every combination of the triplet
+ // u, v and w
+ for _, simsu := range sims {
+ for w, simw := range simsu {
+ // acos(u,w) <= ...
+ uws := math.Acos(float64(simw))
+
+ // ... acos(u,v) + acos(v,w)
+ for v := range simsu {
+ uvws := math.Acos(float64(simsu[v])) + math.Acos(float64(sims[v][w]))
+
+ //log.Debug().Str("u", u).Str("v", v).Str("w", w).Send()
+ //log.Debug().Float32("uw", simw).Float32("uv", simsu[v]).Float32("vw", sims[v][w]).Send()
+ Expect(uws).To(BeNumerically("<=", uvws))
+ }
+ }
+ }
+ }
+
+ It("It obeys the triangle inequality for normalized values", func() {
+ keys := [][]float32{
+ {1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0},
+ {-1.0, 0.0, 0.0}, {0.0, -1.0, 0.0}, {0.0, 0.0, -1.0},
+ {2.0, 3.0, 4.0}, {9.0, 7.0, 1.0}, {0.0, -1.2, 2.3},
+ }
+ vals := [][]byte{
+ []byte("x"), []byte("y"), []byte("z"),
+ []byte("-x"), []byte("-y"), []byte("-z"),
+ []byte("u"), []byte("v"), []byte("w"),
+ }
+
+ normalize(keys[6:])
+
+ err := store.SetCols(context.Background(), sc, keys, vals)
+ Expect(err).ToNot(HaveOccurred())
+
+ expectTriangleEq(keys, vals)
+ })
+
+ It("It obeys the triangle inequality", func() {
+ rnd := rand.New(rand.NewSource(151))
+ keys := make([][]float32, 20)
+ vals := make([][]byte, 20)
+
+ for i := range keys {
+ k := make([]float32, 768)
+
+ for j := range k {
+ k[j] = rnd.Float32()
+ }
+
+ keys[i] = k
+ }
+
+ c := byte('a')
+ for i := range vals {
+ vals[i] = []byte{c}
+ c += 1
+ }
+
+ err := store.SetCols(context.Background(), sc, keys, vals)
+ Expect(err).ToNot(HaveOccurred())
+
+ expectTriangleEq(keys, vals)
+ })
})
})
From a05737c7e43224c66eb0b995be54834747d0dd04 Mon Sep 17 00:00:00 2001
From: Peter Cover
Date: Thu, 23 Jan 2025 02:35:53 +0800
Subject: [PATCH 074/679] chore: fix some function names in comment (#4665)
Signed-off-by: petercover
---
core/http/endpoints/localai/backend_monitor.go | 2 +-
pkg/functions/functions.go | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/core/http/endpoints/localai/backend_monitor.go b/core/http/endpoints/localai/backend_monitor.go
index fa11b5c3..a1b93ac3 100644
--- a/core/http/endpoints/localai/backend_monitor.go
+++ b/core/http/endpoints/localai/backend_monitor.go
@@ -28,7 +28,7 @@ func BackendMonitorEndpoint(bm *services.BackendMonitorService) func(c *fiber.Ct
}
}
-// BackendMonitorEndpoint shuts down the specified backend
+// BackendShutdownEndpoint shuts down the specified backend
// @Summary Backend monitor endpoint
// @Param request body schema.BackendMonitorRequest true "Backend statistics request"
// @Router /backend/shutdown [post]
diff --git a/pkg/functions/functions.go b/pkg/functions/functions.go
index 1a7e1ff1..477a43bb 100644
--- a/pkg/functions/functions.go
+++ b/pkg/functions/functions.go
@@ -34,7 +34,7 @@ type Tool struct {
}
type Tools []Tool
-// ToJSONNameStructure converts a list of functions to a JSON structure that can be parsed to a grammar
+// ToJSONStructure converts a list of functions to a JSON structure that can be parsed to a grammar
// This allows the LLM to return a response of the type: { "name": "function_name", "arguments": { "arg1": "value1", "arg2": "value2" } }
func (f Functions) ToJSONStructure(name, args string) JSONFunctionStructure {
nameKey := defaultFunctionNameKey
From 715071b68dce1ed5d9691f55ee8d9e1571cd6fe4 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Wed, 22 Jan 2025 21:51:38 +0100
Subject: [PATCH 075/679] feat(swagger): update swagger (#4667)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
swagger/docs.go | 3 +++
swagger/swagger.json | 3 +++
swagger/swagger.yaml | 2 ++
3 files changed, 8 insertions(+)
diff --git a/swagger/docs.go b/swagger/docs.go
index 13a3d3f3..43bc8822 100644
--- a/swagger/docs.go
+++ b/swagger/docs.go
@@ -1645,6 +1645,9 @@ const docTemplate = `{
"prompt": {
"description": "Prompt is read only by completion/image API calls"
},
+ "quality": {
+ "type": "string"
+ },
"repeat_last_n": {
"type": "integer"
},
diff --git a/swagger/swagger.json b/swagger/swagger.json
index 1c38e9da..7d39e5e9 100644
--- a/swagger/swagger.json
+++ b/swagger/swagger.json
@@ -1638,6 +1638,9 @@
"prompt": {
"description": "Prompt is read only by completion/image API calls"
},
+ "quality": {
+ "type": "string"
+ },
"repeat_last_n": {
"type": "integer"
},
diff --git a/swagger/swagger.yaml b/swagger/swagger.yaml
index 1692f4bb..e747464f 100644
--- a/swagger/swagger.yaml
+++ b/swagger/swagger.yaml
@@ -570,6 +570,8 @@ definitions:
type: number
prompt:
description: Prompt is read only by completion/image API calls
+ quality:
+ type: string
repeat_last_n:
type: integer
repeat_penalty:
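The quality field documented here is the one the image endpoint and e2e tests from the previous commits started using: the OpenAI-compatible image API now accepts a quality string, which the tests pass to keep the step count (and therefore CI time) low. A short sketch, reusing the go-openai client from the earlier image example; treating quality as a stringified step count is an assumption based on those tests:

// Assumption: quality carries a stringified step count, as in the e2e tests ("1").
req := openai.ImageRequest{
    Prompt:         "test",
    Size:           openai.CreateImageSize256x256,
    Quality:        "1",
    ResponseFormat: openai.CreateImageResponseFormatURL,
}
resp, err := client.CreateImage(context.Background(), req)
if err != nil {
    log.Fatal(err)
}
fmt.Println(resp.Data[0].URL)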
From e426ab7c23308ecf618e766345a0985c826423a1 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 23 Jan 2025 08:06:18 +0100
Subject: [PATCH 076/679] feat(faster-whisper): add backend (#4666)
Signed-off-by: Ettore Di Giacinto
---
Dockerfile | 5 +-
Makefile | 13 ++-
backend/python/faster-whisper/Makefile | 20 ++++
backend/python/faster-whisper/backend.py | 94 +++++++++++++++++++
backend/python/faster-whisper/install.sh | 14 +++
backend/python/faster-whisper/protogen.sh | 6 ++
.../faster-whisper/requirements-cpu.txt | 8 ++
.../faster-whisper/requirements-cublas11.txt | 9 ++
.../faster-whisper/requirements-cublas12.txt | 8 ++
.../faster-whisper/requirements-hipblas.txt | 3 +
.../faster-whisper/requirements-intel.txt | 6 ++
.../python/faster-whisper/requirements.txt | 3 +
backend/python/faster-whisper/run.sh | 4 +
backend/python/faster-whisper/test.sh | 6 ++
14 files changed, 196 insertions(+), 3 deletions(-)
create mode 100644 backend/python/faster-whisper/Makefile
create mode 100755 backend/python/faster-whisper/backend.py
create mode 100755 backend/python/faster-whisper/install.sh
create mode 100644 backend/python/faster-whisper/protogen.sh
create mode 100644 backend/python/faster-whisper/requirements-cpu.txt
create mode 100644 backend/python/faster-whisper/requirements-cublas11.txt
create mode 100644 backend/python/faster-whisper/requirements-cublas12.txt
create mode 100644 backend/python/faster-whisper/requirements-hipblas.txt
create mode 100644 backend/python/faster-whisper/requirements-intel.txt
create mode 100644 backend/python/faster-whisper/requirements.txt
create mode 100755 backend/python/faster-whisper/run.sh
create mode 100755 backend/python/faster-whisper/test.sh
diff --git a/Dockerfile b/Dockerfile
index 8594c2a1..b01f071d 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -15,7 +15,7 @@ ARG TARGETARCH
ARG TARGETVARIANT
ENV DEBIAN_FRONTEND=noninteractive
-ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,transformers:/build/backend/python/transformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,openvoice:/build/backend/python/openvoice/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,mamba:/build/backend/python/mamba/run.sh,exllama2:/build/backend/python/exllama2/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"
+ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,transformers:/build/backend/python/transformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,faster-whisper:/build/backend/python/faster-whisper/run.sh,openvoice:/build/backend/python/openvoice/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,mamba:/build/backend/python/mamba/run.sh,exllama2:/build/backend/python/exllama2/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"
RUN apt-get update && \
@@ -414,6 +414,9 @@ RUN if [[ ( "${EXTRA_BACKENDS}" =~ "coqui" || -z "${EXTRA_BACKENDS}" ) && "$IMAG
if [[ ( "${EXTRA_BACKENDS}" =~ "parler-tts" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/parler-tts \
; fi && \
+ if [[ ( "${EXTRA_BACKENDS}" =~ "faster-whisper" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
+ make -C backend/python/faster-whisper \
+ ; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "diffusers" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/diffusers \
; fi
diff --git a/Makefile b/Makefile
index 312bfcc4..efc5812b 100644
--- a/Makefile
+++ b/Makefile
@@ -533,10 +533,10 @@ protogen-go-clean:
$(RM) bin/*
.PHONY: protogen-python
-protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen mamba-protogen rerankers-protogen transformers-protogen parler-tts-protogen kokoro-protogen vllm-protogen openvoice-protogen
+protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen mamba-protogen rerankers-protogen transformers-protogen parler-tts-protogen kokoro-protogen vllm-protogen openvoice-protogen faster-whisper-protogen
.PHONY: protogen-python-clean
-protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean mamba-protogen-clean rerankers-protogen-clean transformers-protogen-clean parler-tts-protogen-clean kokoro-protogen-clean vllm-protogen-clean openvoice-protogen-clean
+protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean mamba-protogen-clean rerankers-protogen-clean transformers-protogen-clean parler-tts-protogen-clean kokoro-protogen-clean vllm-protogen-clean openvoice-protogen-clean faster-whisper-protogen-clean
.PHONY: autogptq-protogen
autogptq-protogen:
@@ -570,6 +570,14 @@ diffusers-protogen:
diffusers-protogen-clean:
$(MAKE) -C backend/python/diffusers protogen-clean
+.PHONY: faster-whisper-protogen
+faster-whisper-protogen:
+ $(MAKE) -C backend/python/faster-whisper protogen
+
+.PHONY: faster-whisper-protogen-clean
+faster-whisper-protogen-clean:
+ $(MAKE) -C backend/python/faster-whisper protogen-clean
+
.PHONY: exllama2-protogen
exllama2-protogen:
$(MAKE) -C backend/python/exllama2 protogen
@@ -641,6 +649,7 @@ prepare-extra-conda-environments: protogen-python
$(MAKE) -C backend/python/bark
$(MAKE) -C backend/python/coqui
$(MAKE) -C backend/python/diffusers
+ $(MAKE) -C backend/python/faster-whisper
$(MAKE) -C backend/python/vllm
$(MAKE) -C backend/python/mamba
$(MAKE) -C backend/python/rerankers
diff --git a/backend/python/faster-whisper/Makefile b/backend/python/faster-whisper/Makefile
new file mode 100644
index 00000000..c0e5169f
--- /dev/null
+++ b/backend/python/faster-whisper/Makefile
@@ -0,0 +1,20 @@
+.DEFAULT_GOAL := install
+
+.PHONY: install
+install:
+ bash install.sh
+ $(MAKE) protogen
+
+.PHONY: protogen
+protogen: backend_pb2_grpc.py backend_pb2.py
+
+.PHONY: protogen-clean
+protogen-clean:
+ $(RM) backend_pb2_grpc.py backend_pb2.py
+
+backend_pb2_grpc.py backend_pb2.py:
+ bash protogen.sh
+
+.PHONY: clean
+clean: protogen-clean
+ rm -rf venv __pycache__
\ No newline at end of file
diff --git a/backend/python/faster-whisper/backend.py b/backend/python/faster-whisper/backend.py
new file mode 100755
index 00000000..dbb8b3d9
--- /dev/null
+++ b/backend/python/faster-whisper/backend.py
@@ -0,0 +1,94 @@
+#!/usr/bin/env python3
+"""
+This is an extra gRPC server of LocalAI for audio transcription with faster-whisper
+"""
+from concurrent import futures
+import time
+import argparse
+import signal
+import sys
+import os
+import backend_pb2
+import backend_pb2_grpc
+
+from faster_whisper import WhisperModel
+
+import grpc
+
+
+_ONE_DAY_IN_SECONDS = 60 * 60 * 24
+
+# If MAX_WORKERS is specified in the environment, use it; otherwise default to 1
+MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
+COQUI_LANGUAGE = os.environ.get('COQUI_LANGUAGE', None)
+
+# Implement the BackendServicer class with the service methods
+class BackendServicer(backend_pb2_grpc.BackendServicer):
+ """
+ BackendServicer is the class that implements the gRPC service
+ """
+ def Health(self, request, context):
+ return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
+ def LoadModel(self, request, context):
+ device = "cpu"
+ # Get device
+ # device = "cuda" if request.CUDA else "cpu"
+ if request.CUDA:
+ device = "cuda"
+
+ try:
+ print("Preparing models, please wait", file=sys.stderr)
+ self.model = WhisperModel(request.Model, device=device, compute_type="float16")
+ except Exception as err:
+ return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
+ # Implement your logic here for the LoadModel service
+ # Replace this with your desired response
+ return backend_pb2.Result(message="Model loaded successfully", success=True)
+
+ def AudioTranscription(self, request, context):
+ resultSegments = []
+ text = ""
+ try:
+ segments, info = self.model.transcribe(request.dst, beam_size=5, condition_on_previous_text=False)
+ id = 0
+ for segment in segments:
+ print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
+ resultSegments.append(backend_pb2.TranscriptSegment(id=id, start=segment.start, end=segment.end, text=segment.text))
+ text += segment.text
+ id += 1
+ except Exception as err:
+ print(f"Unexpected {err=}, {type(err)=}", file=sys.stderr)
+
+ return backend_pb2.TranscriptResult(segments=resultSegments, text=text)
+
+def serve(address):
+ server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
+ backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
+ server.add_insecure_port(address)
+ server.start()
+ print("Server started. Listening on: " + address, file=sys.stderr)
+
+ # Define the signal handler function
+ def signal_handler(sig, frame):
+ print("Received termination signal. Shutting down...")
+ server.stop(0)
+ sys.exit(0)
+
+ # Set the signal handlers for SIGINT and SIGTERM
+ signal.signal(signal.SIGINT, signal_handler)
+ signal.signal(signal.SIGTERM, signal_handler)
+
+ try:
+ while True:
+ time.sleep(_ONE_DAY_IN_SECONDS)
+ except KeyboardInterrupt:
+ server.stop(0)
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(description="Run the gRPC server.")
+ parser.add_argument(
+ "--addr", default="localhost:50051", help="The address to bind the server to."
+ )
+ args = parser.parse_args()
+
+ serve(args.addr)
diff --git a/backend/python/faster-whisper/install.sh b/backend/python/faster-whisper/install.sh
new file mode 100755
index 00000000..36443ef1
--- /dev/null
+++ b/backend/python/faster-whisper/install.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+set -e
+
+source $(dirname $0)/../common/libbackend.sh
+
+# This is here because the Intel pip index is broken and returns 200 status codes for every package name; it just doesn't return any package links.
+# This makes uv think that the package exists in the Intel pip index, and by default it stops looking at other pip indexes once it finds a match.
+# We need uv to continue falling through to the default PyPI index to find optimum[openvino] there.
+# The --upgrade flag actually allows us to *downgrade* torch to the version provided in the Intel pip index.
+if [ "x${BUILD_PROFILE}" == "xintel" ]; then
+ EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
+fi
+
+installRequirements
diff --git a/backend/python/faster-whisper/protogen.sh b/backend/python/faster-whisper/protogen.sh
new file mode 100644
index 00000000..32f39fbb
--- /dev/null
+++ b/backend/python/faster-whisper/protogen.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+set -e
+
+source $(dirname $0)/../common/libbackend.sh
+
+python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
\ No newline at end of file
diff --git a/backend/python/faster-whisper/requirements-cpu.txt b/backend/python/faster-whisper/requirements-cpu.txt
new file mode 100644
index 00000000..3e03f3ad
--- /dev/null
+++ b/backend/python/faster-whisper/requirements-cpu.txt
@@ -0,0 +1,8 @@
+faster-whisper
+opencv-python
+accelerate
+compel
+peft
+sentencepiece
+torch==2.4.1
+optimum-quanto
\ No newline at end of file
diff --git a/backend/python/faster-whisper/requirements-cublas11.txt b/backend/python/faster-whisper/requirements-cublas11.txt
new file mode 100644
index 00000000..b7453295
--- /dev/null
+++ b/backend/python/faster-whisper/requirements-cublas11.txt
@@ -0,0 +1,9 @@
+--extra-index-url https://download.pytorch.org/whl/cu118
+torch==2.4.1+cu118
+faster-whisper
+opencv-python
+accelerate
+compel
+peft
+sentencepiece
+optimum-quanto
\ No newline at end of file
diff --git a/backend/python/faster-whisper/requirements-cublas12.txt b/backend/python/faster-whisper/requirements-cublas12.txt
new file mode 100644
index 00000000..8f46fa4a
--- /dev/null
+++ b/backend/python/faster-whisper/requirements-cublas12.txt
@@ -0,0 +1,8 @@
+torch==2.4.1
+faster-whisper
+opencv-python
+accelerate
+compel
+peft
+sentencepiece
+optimum-quanto
\ No newline at end of file
diff --git a/backend/python/faster-whisper/requirements-hipblas.txt b/backend/python/faster-whisper/requirements-hipblas.txt
new file mode 100644
index 00000000..29413f05
--- /dev/null
+++ b/backend/python/faster-whisper/requirements-hipblas.txt
@@ -0,0 +1,3 @@
+--extra-index-url https://download.pytorch.org/whl/rocm6.0
+torch
+faster-whisper
\ No newline at end of file
diff --git a/backend/python/faster-whisper/requirements-intel.txt b/backend/python/faster-whisper/requirements-intel.txt
new file mode 100644
index 00000000..417aa0b4
--- /dev/null
+++ b/backend/python/faster-whisper/requirements-intel.txt
@@ -0,0 +1,6 @@
+--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+intel-extension-for-pytorch==2.3.110+xpu
+torch==2.3.1+cxx11.abi
+oneccl_bind_pt==2.3.100+xpu
+optimum[openvino]
+faster-whisper
\ No newline at end of file
diff --git a/backend/python/faster-whisper/requirements.txt b/backend/python/faster-whisper/requirements.txt
new file mode 100644
index 00000000..0f43df10
--- /dev/null
+++ b/backend/python/faster-whisper/requirements.txt
@@ -0,0 +1,3 @@
+grpcio==1.69.0
+protobuf
+grpcio-tools
\ No newline at end of file
diff --git a/backend/python/faster-whisper/run.sh b/backend/python/faster-whisper/run.sh
new file mode 100755
index 00000000..375c07e5
--- /dev/null
+++ b/backend/python/faster-whisper/run.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+source $(dirname $0)/../common/libbackend.sh
+
+startBackend $@
\ No newline at end of file
diff --git a/backend/python/faster-whisper/test.sh b/backend/python/faster-whisper/test.sh
new file mode 100755
index 00000000..6940b066
--- /dev/null
+++ b/backend/python/faster-whisper/test.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+set -e
+
+source $(dirname $0)/../common/libbackend.sh
+
+runUnittests
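The new backend registers itself through the EXTERNAL_GRPC_BACKENDS change above and serves LocalAI's audio transcription API. A rough client-side sketch, assuming a local instance with a faster-whisper backed model configured under the hypothetical name whisper-1, again using the go-openai client:

package main

import (
    "context"
    "fmt"
    "log"

    openai "github.com/sashabaranov/go-openai"
)

func main() {
    // Assumed local LocalAI instance; the model name is illustrative.
    cfg := openai.DefaultConfig("")
    cfg.BaseURL = "http://localhost:8080/v1"
    client := openai.NewClientWithConfig(cfg)

    resp, err := client.CreateTranscription(context.Background(), openai.AudioRequest{
        Model:    "whisper-1",
        FilePath: "sample.wav", // any local audio file
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(resp.Text)
}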
From 200fe358f0c2f25a61b0b64478f10be945021f75 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Thu, 23 Jan 2025 08:06:43 +0100
Subject: [PATCH 077/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`6152129d05870cb38162c422c6ba80434e021e9f` (#4668)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index efc5812b..467b2d39 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=6171c9d25820ccf676b243c172868819d882848f
+CPPLLAMA_VERSION?=6152129d05870cb38162c422c6ba80434e021e9f
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From 89429a439b3a5c5571f8bfe9be228a56f94f7a84 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 23 Jan 2025 09:30:47 +0100
Subject: [PATCH 078/679] feat(transformers): add support to Mamba (#4669)
Signed-off-by: Ettore Di Giacinto
---
Dockerfile | 6 +-
Makefile | 13 +-
backend/python/mamba/Makefile | 29 ---
backend/python/mamba/README.md | 5 -
backend/python/mamba/backend.py | 179 ------------------
backend/python/mamba/install.sh | 9 -
backend/python/mamba/requirements-after.txt | 2 -
backend/python/mamba/requirements-cpu.txt | 2 -
.../python/mamba/requirements-cublas11.txt | 3 -
.../python/mamba/requirements-cublas12.txt | 2 -
backend/python/mamba/requirements-install.txt | 6 -
backend/python/mamba/requirements.txt | 3 -
backend/python/mamba/run.sh | 6 -
backend/python/mamba/test.py | 76 --------
backend/python/mamba/test.sh | 6 -
backend/python/transformers/backend.py | 6 +-
pkg/model/initializers.go | 2 +
17 files changed, 10 insertions(+), 345 deletions(-)
delete mode 100644 backend/python/mamba/Makefile
delete mode 100644 backend/python/mamba/README.md
delete mode 100644 backend/python/mamba/backend.py
delete mode 100755 backend/python/mamba/install.sh
delete mode 100644 backend/python/mamba/requirements-after.txt
delete mode 100644 backend/python/mamba/requirements-cpu.txt
delete mode 100644 backend/python/mamba/requirements-cublas11.txt
delete mode 100644 backend/python/mamba/requirements-cublas12.txt
delete mode 100644 backend/python/mamba/requirements-install.txt
delete mode 100644 backend/python/mamba/requirements.txt
delete mode 100755 backend/python/mamba/run.sh
delete mode 100644 backend/python/mamba/test.py
delete mode 100755 backend/python/mamba/test.sh
diff --git a/Dockerfile b/Dockerfile
index b01f071d..9f699ac9 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -15,8 +15,7 @@ ARG TARGETARCH
ARG TARGETVARIANT
ENV DEBIAN_FRONTEND=noninteractive
-ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,transformers:/build/backend/python/transformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,faster-whisper:/build/backend/python/faster-whisper/run.sh,openvoice:/build/backend/python/openvoice/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,mamba:/build/backend/python/mamba/run.sh,exllama2:/build/backend/python/exllama2/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"
-
+ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,transformers:/build/backend/python/transformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,faster-whisper:/build/backend/python/faster-whisper/run.sh,openvoice:/build/backend/python/openvoice/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,exllama2:/build/backend/python/exllama2/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
@@ -445,9 +444,6 @@ RUN if [[ ( "${EXTRA_BACKENDS}" =~ "vllm" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "rerankers" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/rerankers \
- ; fi && \
- if [[ ( "${EXTRA_BACKENDS}" =~ "mamba" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
- make -C backend/python/mamba \
; fi
# Make sure the models directory exists
diff --git a/Makefile b/Makefile
index 467b2d39..fc649c4f 100644
--- a/Makefile
+++ b/Makefile
@@ -533,10 +533,10 @@ protogen-go-clean:
$(RM) bin/*
.PHONY: protogen-python
-protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen mamba-protogen rerankers-protogen transformers-protogen parler-tts-protogen kokoro-protogen vllm-protogen openvoice-protogen faster-whisper-protogen
+protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen rerankers-protogen transformers-protogen parler-tts-protogen kokoro-protogen vllm-protogen openvoice-protogen faster-whisper-protogen
.PHONY: protogen-python-clean
-protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean mamba-protogen-clean rerankers-protogen-clean transformers-protogen-clean parler-tts-protogen-clean kokoro-protogen-clean vllm-protogen-clean openvoice-protogen-clean faster-whisper-protogen-clean
+protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean rerankers-protogen-clean transformers-protogen-clean parler-tts-protogen-clean kokoro-protogen-clean vllm-protogen-clean openvoice-protogen-clean faster-whisper-protogen-clean
.PHONY: autogptq-protogen
autogptq-protogen:
@@ -586,14 +586,6 @@ exllama2-protogen:
exllama2-protogen-clean:
$(MAKE) -C backend/python/exllama2 protogen-clean
-.PHONY: mamba-protogen
-mamba-protogen:
- $(MAKE) -C backend/python/mamba protogen
-
-.PHONY: mamba-protogen-clean
-mamba-protogen-clean:
- $(MAKE) -C backend/python/mamba protogen-clean
-
.PHONY: rerankers-protogen
rerankers-protogen:
$(MAKE) -C backend/python/rerankers protogen
@@ -651,7 +643,6 @@ prepare-extra-conda-environments: protogen-python
$(MAKE) -C backend/python/diffusers
$(MAKE) -C backend/python/faster-whisper
$(MAKE) -C backend/python/vllm
- $(MAKE) -C backend/python/mamba
$(MAKE) -C backend/python/rerankers
$(MAKE) -C backend/python/transformers
$(MAKE) -C backend/python/parler-tts
diff --git a/backend/python/mamba/Makefile b/backend/python/mamba/Makefile
deleted file mode 100644
index 52b1c53a..00000000
--- a/backend/python/mamba/Makefile
+++ /dev/null
@@ -1,29 +0,0 @@
-.PHONY: mamba
-mamba: protogen
- bash install.sh
-
-.PHONY: run
-run: protogen
- @echo "Running mamba..."
- bash run.sh
- @echo "mamba run."
-
-.PHONY: test
-test: protogen
- @echo "Testing mamba..."
- bash test.sh
- @echo "mamba tested."
-
-.PHONY: protogen
-protogen: backend_pb2_grpc.py backend_pb2.py
-
-.PHONY: protogen-clean
-protogen-clean:
- $(RM) backend_pb2_grpc.py backend_pb2.py
-
-backend_pb2_grpc.py backend_pb2.py:
- python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
-
-.PHONY: clean
-clean: protogen-clean
- $(RM) -r venv __pycache__
\ No newline at end of file
diff --git a/backend/python/mamba/README.md b/backend/python/mamba/README.md
deleted file mode 100644
index d6ead917..00000000
--- a/backend/python/mamba/README.md
+++ /dev/null
@@ -1,5 +0,0 @@
-# Creating a separate environment for the mamba project
-
-```
-make mamba
-```
\ No newline at end of file
diff --git a/backend/python/mamba/backend.py b/backend/python/mamba/backend.py
deleted file mode 100644
index 3c15fea7..00000000
--- a/backend/python/mamba/backend.py
+++ /dev/null
@@ -1,179 +0,0 @@
-#!/usr/bin/env python3
-from concurrent import futures
-import time
-import argparse
-import signal
-import sys
-import os
-
-import backend_pb2
-import backend_pb2_grpc
-
-import grpc
-
-import torch
-from transformers import AutoTokenizer, AutoModelForCausalLM
-from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
-
-_ONE_DAY_IN_SECONDS = 60 * 60 * 24
-
-# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
-MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
-MAMBA_CHAT= os.environ.get('MAMBA_CHAT', '1') == '1'
-
-# Implement the BackendServicer class with the service methods
-class BackendServicer(backend_pb2_grpc.BackendServicer):
- """
- A gRPC servicer that implements the Backend service defined in backend.proto.
- """
- def generate(self,prompt, max_new_tokens):
- """
- Generates text based on the given prompt and maximum number of new tokens.
-
- Args:
- prompt (str): The prompt to generate text from.
- max_new_tokens (int): The maximum number of new tokens to generate.
-
- Returns:
- str: The generated text.
- """
- self.generator.end_beam_search()
-
- # Tokenizing the input
- ids = self.generator.tokenizer.encode(prompt)
-
- self.generator.gen_begin_reuse(ids)
- initial_len = self.generator.sequence[0].shape[0]
- has_leading_space = False
- decoded_text = ''
- for i in range(max_new_tokens):
- token = self.generator.gen_single_token()
- if i == 0 and self.generator.tokenizer.tokenizer.IdToPiece(int(token)).startswith('ā'):
- has_leading_space = True
-
- decoded_text = self.generator.tokenizer.decode(self.generator.sequence[0][initial_len:])
- if has_leading_space:
- decoded_text = ' ' + decoded_text
-
- if token.item() == self.generator.tokenizer.eos_token_id:
- break
- return decoded_text
-
- def Health(self, request, context):
- """
- Returns a health check message.
-
- Args:
- request: The health check request.
- context: The gRPC context.
-
- Returns:
- backend_pb2.Reply: The health check reply.
- """
- return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
-
- def LoadModel(self, request, context):
- """
- Loads a language model.
-
- Args:
- request: The load model request.
- context: The gRPC context.
-
- Returns:
- backend_pb2.Result: The load model result.
- """
- try:
- tokenizerModel = request.Tokenizer
- if tokenizerModel == "":
- tokenizerModel = request.Model
-
- tokenizer = AutoTokenizer.from_pretrained(tokenizerModel)
- if MAMBA_CHAT:
- tokenizer.eos_token = "<|endoftext|>"
- tokenizer.pad_token = tokenizer.eos_token
- self.tokenizer = tokenizer
- self.model = MambaLMHeadModel.from_pretrained(request.Model, device="cuda", dtype=torch.float16)
- except Exception as err:
- return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
- return backend_pb2.Result(message="Model loaded successfully", success=True)
-
- def Predict(self, request, context):
- """
- Generates text based on the given prompt and sampling parameters.
-
- Args:
- request: The predict request.
- context: The gRPC context.
-
- Returns:
- backend_pb2.Result: The predict result.
- """
- if request.TopP == 0:
- request.TopP = 0.9
-
- max_tokens = request.Tokens
-
- if request.Tokens == 0:
- max_tokens = 2000
-
- # encoded_input = self.tokenizer(request.Prompt)
- tokens = self.tokenizer(request.Prompt, return_tensors="pt")
- input_ids = tokens.input_ids.to(device="cuda")
- out = self.model.generate(input_ids=input_ids, max_length=max_tokens, temperature=request.Temperature,
- top_p=request.TopP, eos_token_id=self.tokenizer.eos_token_id)
-
- decoded = self.tokenizer.batch_decode(out)
-
- generated_text = decoded[0]
-
- # Remove prompt from response if present
- if request.Prompt in generated_text:
- generated_text = generated_text.replace(request.Prompt, "")
-
- return backend_pb2.Reply(message=bytes(generated_text, encoding='utf-8'))
-
- def PredictStream(self, request, context):
- """
- Generates text based on the given prompt and sampling parameters, and streams the results.
-
- Args:
- request: The predict stream request.
- context: The gRPC context.
-
- Returns:
- backend_pb2.Result: The predict stream result.
- """
- yield self.Predict(request, context)
-
-def serve(address):
- server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
- backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
- server.add_insecure_port(address)
- server.start()
- print("Server started. Listening on: " + address, file=sys.stderr)
-
- # Define the signal handler function
- def signal_handler(sig, frame):
- print("Received termination signal. Shutting down...")
- server.stop(0)
- sys.exit(0)
-
- # Set the signal handlers for SIGINT and SIGTERM
- signal.signal(signal.SIGINT, signal_handler)
- signal.signal(signal.SIGTERM, signal_handler)
-
- try:
- while True:
- time.sleep(_ONE_DAY_IN_SECONDS)
- except KeyboardInterrupt:
- server.stop(0)
-
-if __name__ == "__main__":
- parser = argparse.ArgumentParser(description="Run the gRPC server.")
- parser.add_argument(
- "--addr", default="localhost:50051", help="The address to bind the server to."
- )
- args = parser.parse_args()
-
- serve(args.addr)
diff --git a/backend/python/mamba/install.sh b/backend/python/mamba/install.sh
deleted file mode 100755
index db18eefc..00000000
--- a/backend/python/mamba/install.sh
+++ /dev/null
@@ -1,9 +0,0 @@
-#!/bin/bash
-set -e
-
-LIMIT_TARGETS="cublas"
-EXTRA_PIP_INSTALL_FLAGS="--no-build-isolation"
-
-source $(dirname $0)/../common/libbackend.sh
-
-installRequirements
\ No newline at end of file
diff --git a/backend/python/mamba/requirements-after.txt b/backend/python/mamba/requirements-after.txt
deleted file mode 100644
index ea6890eb..00000000
--- a/backend/python/mamba/requirements-after.txt
+++ /dev/null
@@ -1,2 +0,0 @@
-causal-conv1d==1.4.0
-mamba-ssm==2.2.2
\ No newline at end of file
diff --git a/backend/python/mamba/requirements-cpu.txt b/backend/python/mamba/requirements-cpu.txt
deleted file mode 100644
index b4f1261f..00000000
--- a/backend/python/mamba/requirements-cpu.txt
+++ /dev/null
@@ -1,2 +0,0 @@
-torch==2.4.1
-transformers
\ No newline at end of file
diff --git a/backend/python/mamba/requirements-cublas11.txt b/backend/python/mamba/requirements-cublas11.txt
deleted file mode 100644
index ed0d4df5..00000000
--- a/backend/python/mamba/requirements-cublas11.txt
+++ /dev/null
@@ -1,3 +0,0 @@
---extra-index-url https://download.pytorch.org/whl/cu118
-torch==2.4.1+cu118
-transformers
\ No newline at end of file
diff --git a/backend/python/mamba/requirements-cublas12.txt b/backend/python/mamba/requirements-cublas12.txt
deleted file mode 100644
index b4f1261f..00000000
--- a/backend/python/mamba/requirements-cublas12.txt
+++ /dev/null
@@ -1,2 +0,0 @@
-torch==2.4.1
-transformers
\ No newline at end of file
diff --git a/backend/python/mamba/requirements-install.txt b/backend/python/mamba/requirements-install.txt
deleted file mode 100644
index 69d263f0..00000000
--- a/backend/python/mamba/requirements-install.txt
+++ /dev/null
@@ -1,6 +0,0 @@
-# mabma does not specify it's build dependencies per PEP517, so we need to disable build isolation
-# this also means that we need to install the basic build dependencies into the venv ourselves
-# https://github.com/Dao-AILab/causal-conv1d/issues/24
-packaging
-setuptools
-wheel
\ No newline at end of file
diff --git a/backend/python/mamba/requirements.txt b/backend/python/mamba/requirements.txt
deleted file mode 100644
index afc8b2a9..00000000
--- a/backend/python/mamba/requirements.txt
+++ /dev/null
@@ -1,3 +0,0 @@
-grpcio==1.69.0
-protobuf
-certifi
\ No newline at end of file
diff --git a/backend/python/mamba/run.sh b/backend/python/mamba/run.sh
deleted file mode 100755
index 1afc3984..00000000
--- a/backend/python/mamba/run.sh
+++ /dev/null
@@ -1,6 +0,0 @@
-#!/bin/bash
-LIMIT_TARGETS="cublas"
-
-source $(dirname $0)/../common/libbackend.sh
-
-startBackend $@
\ No newline at end of file
diff --git a/backend/python/mamba/test.py b/backend/python/mamba/test.py
deleted file mode 100644
index 83fb2651..00000000
--- a/backend/python/mamba/test.py
+++ /dev/null
@@ -1,76 +0,0 @@
-import unittest
-import subprocess
-import time
-import backend_pb2
-import backend_pb2_grpc
-
-import grpc
-
-import unittest
-import subprocess
-import time
-import grpc
-import backend_pb2_grpc
-import backend_pb2
-
-class TestBackendServicer(unittest.TestCase):
- """
- TestBackendServicer is the class that tests the gRPC service.
-
- This class contains methods to test the startup and shutdown of the gRPC service.
- """
- def setUp(self):
- self.service = subprocess.Popen(["python", "backend.py", "--addr", "localhost:50051"])
- time.sleep(10)
-
- def tearDown(self) -> None:
- self.service.terminate()
- self.service.wait()
-
- def test_server_startup(self):
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.Health(backend_pb2.HealthMessage())
- self.assertEqual(response.message, b'OK')
- except Exception as err:
- print(err)
- self.fail("Server failed to start")
- finally:
- self.tearDown()
- def test_load_model(self):
- """
- This method tests if the model is loaded successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.LoadModel(backend_pb2.ModelOptions(Model="facebook/opt-125m"))
- self.assertTrue(response.success)
- self.assertEqual(response.message, "Model loaded successfully")
- except Exception as err:
- print(err)
- self.fail("LoadModel service failed")
- finally:
- self.tearDown()
-
- def test_text(self):
- """
- This method tests if the embeddings are generated successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.LoadModel(backend_pb2.ModelOptions(Model="facebook/opt-125m"))
- self.assertTrue(response.success)
- req = backend_pb2.PredictOptions(Prompt="The capital of France is")
- resp = stub.Predict(req)
- self.assertIsNotNone(resp.message)
- except Exception as err:
- print(err)
- self.fail("text service failed")
- finally:
- self.tearDown()
\ No newline at end of file
diff --git a/backend/python/mamba/test.sh b/backend/python/mamba/test.sh
deleted file mode 100755
index 6940b066..00000000
--- a/backend/python/mamba/test.sh
+++ /dev/null
@@ -1,6 +0,0 @@
-#!/bin/bash
-set -e
-
-source $(dirname $0)/../common/libbackend.sh
-
-runUnittests
diff --git a/backend/python/transformers/backend.py b/backend/python/transformers/backend.py
index 9b65c6db..b0d5875b 100644
--- a/backend/python/transformers/backend.py
+++ b/backend/python/transformers/backend.py
@@ -21,7 +21,7 @@ import torch.cuda
XPU=os.environ.get("XPU", "0") == "1"
-from transformers import AutoTokenizer, AutoModel, set_seed, TextIteratorStreamer, StoppingCriteriaList, StopStringCriteria
+from transformers import AutoTokenizer, AutoModel, set_seed, TextIteratorStreamer, StoppingCriteriaList, StopStringCriteria, MambaConfig, MambaForCausalLM
from transformers import AutoProcessor, MusicgenForConditionalGeneration
from scipy.io import wavfile
import outetts
@@ -245,6 +245,10 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
autoTokenizer = False
self.model = SentenceTransformer(model_name, trust_remote_code=request.TrustRemoteCode)
self.SentenceTransformer = True
+ elif request.Type == "Mamba":
+ autoTokenizer = False
+ self.tokenizer = AutoTokenizer.from_pretrained(model_name)
+ self.model = MambaForCausalLM.from_pretrained(model_name)
else:
print("Automodel", file=sys.stderr)
self.model = AutoModel.from_pretrained(model_name,
diff --git a/pkg/model/initializers.go b/pkg/model/initializers.go
index b2a5293b..d5f1459b 100644
--- a/pkg/model/initializers.go
+++ b/pkg/model/initializers.go
@@ -29,12 +29,14 @@ var Aliases map[string]string = map[string]string{
"langchain-huggingface": LCHuggingFaceBackend,
"transformers-musicgen": TransformersBackend,
"sentencetransformers": TransformersBackend,
+ "mamba": TransformersBackend,
"stablediffusion": StableDiffusionGGMLBackend,
}
var TypeAlias map[string]string = map[string]string{
"sentencetransformers": "SentenceTransformer",
"huggingface-embeddings": "SentenceTransformer",
+ "mamba": "Mamba",
"transformers-musicgen": "MusicgenForConditionalGeneration",
}
From 318225f631189c6d8952eac5125d220ca76246f5 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 23 Jan 2025 09:46:16 +0100
Subject: [PATCH 079/679] chore(parler-tts): drop backend (#4672)
We now support more capable, state-of-the-art backends that also offer
voice cloning and many other features. This backend is superseded and
poses a significant maintenance burden: the long-standing issue
https://github.com/mudler/LocalAI/issues/3941 remains open because its
dependencies pin old versions of grpc.
Closes https://github.com/mudler/LocalAI/issues/3941
Signed-off-by: Ettore Di Giacinto
---
Dockerfile | 7 +-
Makefile | 13 +-
backend/python/parler-tts/Makefile | 44 ------
backend/python/parler-tts/backend.py | 125 ------------------
backend/python/parler-tts/install.sh | 28 ----
backend/python/parler-tts/protogen.sh | 6 -
.../python/parler-tts/requirements-after.txt | 4 -
.../python/parler-tts/requirements-cpu.txt | 3 -
.../parler-tts/requirements-cublas11.txt | 5 -
.../parler-tts/requirements-cublas12.txt | 4 -
.../parler-tts/requirements-hipblas.txt | 5 -
.../python/parler-tts/requirements-intel.txt | 8 --
backend/python/parler-tts/requirements.txt | 4 -
backend/python/parler-tts/run.sh | 4 -
backend/python/parler-tts/test.py | 81 ------------
backend/python/parler-tts/test.sh | 6 -
16 files changed, 4 insertions(+), 343 deletions(-)
delete mode 100644 backend/python/parler-tts/Makefile
delete mode 100644 backend/python/parler-tts/backend.py
delete mode 100755 backend/python/parler-tts/install.sh
delete mode 100755 backend/python/parler-tts/protogen.sh
delete mode 100644 backend/python/parler-tts/requirements-after.txt
delete mode 100644 backend/python/parler-tts/requirements-cpu.txt
delete mode 100644 backend/python/parler-tts/requirements-cublas11.txt
delete mode 100644 backend/python/parler-tts/requirements-cublas12.txt
delete mode 100644 backend/python/parler-tts/requirements-hipblas.txt
delete mode 100644 backend/python/parler-tts/requirements-intel.txt
delete mode 100644 backend/python/parler-tts/requirements.txt
delete mode 100755 backend/python/parler-tts/run.sh
delete mode 100644 backend/python/parler-tts/test.py
delete mode 100755 backend/python/parler-tts/test.sh
diff --git a/Dockerfile b/Dockerfile
index 9f699ac9..625d2869 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -15,7 +15,7 @@ ARG TARGETARCH
ARG TARGETVARIANT
ENV DEBIAN_FRONTEND=noninteractive
-ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,transformers:/build/backend/python/transformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,faster-whisper:/build/backend/python/faster-whisper/run.sh,openvoice:/build/backend/python/openvoice/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,exllama2:/build/backend/python/exllama2/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"
+ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,transformers:/build/backend/python/transformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,faster-whisper:/build/backend/python/faster-whisper/run.sh,openvoice:/build/backend/python/openvoice/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,exllama2:/build/backend/python/exllama2/run.sh"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
@@ -410,11 +410,8 @@ RUN if [[ ( "${IMAGE_TYPE}" == "extras ")]]; then \
RUN if [[ ( "${EXTRA_BACKENDS}" =~ "coqui" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/coqui \
; fi && \
- if [[ ( "${EXTRA_BACKENDS}" =~ "parler-tts" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
- make -C backend/python/parler-tts \
- ; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "faster-whisper" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
- make -C backend/python/parler-tts \
+ make -C backend/python/faster-whisper \
; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "diffusers" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/diffusers \
diff --git a/Makefile b/Makefile
index fc649c4f..04e280d8 100644
--- a/Makefile
+++ b/Makefile
@@ -533,10 +533,10 @@ protogen-go-clean:
$(RM) bin/*
.PHONY: protogen-python
-protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen rerankers-protogen transformers-protogen parler-tts-protogen kokoro-protogen vllm-protogen openvoice-protogen faster-whisper-protogen
+protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen rerankers-protogen transformers-protogen kokoro-protogen vllm-protogen openvoice-protogen faster-whisper-protogen
.PHONY: protogen-python-clean
-protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean rerankers-protogen-clean transformers-protogen-clean parler-tts-protogen-clean kokoro-protogen-clean vllm-protogen-clean openvoice-protogen-clean faster-whisper-protogen-clean
+protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean rerankers-protogen-clean transformers-protogen-clean kokoro-protogen-clean vllm-protogen-clean openvoice-protogen-clean faster-whisper-protogen-clean
.PHONY: autogptq-protogen
autogptq-protogen:
@@ -602,14 +602,6 @@ transformers-protogen:
transformers-protogen-clean:
$(MAKE) -C backend/python/transformers protogen-clean
-.PHONY: parler-tts-protogen
-parler-tts-protogen:
- $(MAKE) -C backend/python/parler-tts protogen
-
-.PHONY: parler-tts-protogen-clean
-parler-tts-protogen-clean:
- $(MAKE) -C backend/python/parler-tts protogen-clean
-
.PHONY: kokoro-protogen
kokoro-protogen:
$(MAKE) -C backend/python/kokoro protogen
@@ -645,7 +637,6 @@ prepare-extra-conda-environments: protogen-python
$(MAKE) -C backend/python/vllm
$(MAKE) -C backend/python/rerankers
$(MAKE) -C backend/python/transformers
- $(MAKE) -C backend/python/parler-tts
$(MAKE) -C backend/python/kokoro
$(MAKE) -C backend/python/openvoice
$(MAKE) -C backend/python/exllama2
diff --git a/backend/python/parler-tts/Makefile b/backend/python/parler-tts/Makefile
deleted file mode 100644
index 48da2f3f..00000000
--- a/backend/python/parler-tts/Makefile
+++ /dev/null
@@ -1,44 +0,0 @@
-export CONDA_ENV_PATH = "parler.yml"
-SKIP_CONDA?=0
-ifeq ($(BUILD_TYPE), cublas)
-export CONDA_ENV_PATH = "parler-nvidia.yml"
-endif
-
-# Intel GPU are supposed to have dependencies installed in the main python
-# environment, so we skip conda installation for SYCL builds.
-# https://github.com/intel/intel-extension-for-pytorch/issues/538
-ifneq (,$(findstring sycl,$(BUILD_TYPE)))
-export SKIP_CONDA=1
-endif
-
-.PHONY: parler-tts
-parler-tts:
- @echo "Installing $(CONDA_ENV_PATH)..."
- bash install.sh $(CONDA_ENV_PATH)
- $(MAKE) protogen
-
-.PHONY: run
-run: protogen
- @echo "Running transformers..."
- bash run.sh
- @echo "transformers run."
-
-.PHONY: test
-test: protogen
- @echo "Testing transformers..."
- bash test.sh
- @echo "transformers tested."
-
-.PHONY: protogen
-protogen: backend_pb2_grpc.py backend_pb2.py
-
-.PHONY: protogen-clean
-protogen-clean:
- $(RM) backend_pb2_grpc.py backend_pb2.py
-
-backend_pb2_grpc.py backend_pb2.py:
- bash protogen.sh
-
-.PHONY: clean
-clean: protogen-clean
- $(RM) -r venv __pycache__
\ No newline at end of file
diff --git a/backend/python/parler-tts/backend.py b/backend/python/parler-tts/backend.py
deleted file mode 100644
index 655990d7..00000000
--- a/backend/python/parler-tts/backend.py
+++ /dev/null
@@ -1,125 +0,0 @@
-#!/usr/bin/env python3
-"""
-Extra gRPC server for MusicgenForConditionalGeneration models.
-"""
-from concurrent import futures
-
-import argparse
-import signal
-import sys
-import os
-
-import time
-import backend_pb2
-import backend_pb2_grpc
-
-import grpc
-
-from scipy.io.wavfile import write as write_wav
-
-from parler_tts import ParlerTTSForConditionalGeneration
-from transformers import AutoTokenizer
-import soundfile as sf
-import torch
-
-_ONE_DAY_IN_SECONDS = 60 * 60 * 24
-
-# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
-MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
-
-# Implement the BackendServicer class with the service methods
-class BackendServicer(backend_pb2_grpc.BackendServicer):
- """
- A gRPC servicer for the backend service.
-
- This class implements the gRPC methods for the backend service, including Health, LoadModel, and Embedding.
- """
- def Health(self, request, context):
- """
- A gRPC method that returns the health status of the backend service.
-
- Args:
- request: A HealthRequest object that contains the request parameters.
- context: A grpc.ServicerContext object that provides information about the RPC.
-
- Returns:
- A Reply object that contains the health status of the backend service.
- """
- return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
-
- def LoadModel(self, request, context):
- """
- A gRPC method that loads a model into memory.
-
- Args:
- request: A LoadModelRequest object that contains the request parameters.
- context: A grpc.ServicerContext object that provides information about the RPC.
-
- Returns:
- A Result object that contains the result of the LoadModel operation.
- """
- model_name = request.Model
- device = "cuda:0" if torch.cuda.is_available() else "cpu"
- try:
- self.model = ParlerTTSForConditionalGeneration.from_pretrained(model_name).to(device)
- self.tokenizer = AutoTokenizer.from_pretrained(model_name)
- except Exception as err:
- return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
-
- return backend_pb2.Result(message="Model loaded successfully", success=True)
-
- def TTS(self, request, context):
- model_name = request.model
- voice = request.voice
- if voice == "":
- voice = "A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality. She speaks very fast."
- if model_name == "":
- return backend_pb2.Result(success=False, message="request.model is required")
- try:
- device = "cuda:0" if torch.cuda.is_available() else "cpu"
- input_ids = self.tokenizer(voice, return_tensors="pt").input_ids.to(device)
- prompt_input_ids = self.tokenizer(request.text, return_tensors="pt").input_ids.to(device)
-
- generation = self.model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
- audio_arr = generation.cpu().numpy().squeeze()
- print("[parler-tts] TTS generated!", file=sys.stderr)
- sf.write(request.dst, audio_arr, self.model.config.sampling_rate)
- print("[parler-tts] TTS saved to", request.dst, file=sys.stderr)
- print("[parler-tts] TTS for", file=sys.stderr)
- print(request, file=sys.stderr)
- except Exception as err:
- return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
- return backend_pb2.Result(success=True)
-
-
-def serve(address):
- server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
- backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
- server.add_insecure_port(address)
- server.start()
- print("[parler-tts] Server started. Listening on: " + address, file=sys.stderr)
-
- # Define the signal handler function
- def signal_handler(sig, frame):
- print("[parler-tts] Received termination signal. Shutting down...")
- server.stop(0)
- sys.exit(0)
-
- # Set the signal handlers for SIGINT and SIGTERM
- signal.signal(signal.SIGINT, signal_handler)
- signal.signal(signal.SIGTERM, signal_handler)
-
- try:
- while True:
- time.sleep(_ONE_DAY_IN_SECONDS)
- except KeyboardInterrupt:
- server.stop(0)
-
-if __name__ == "__main__":
- parser = argparse.ArgumentParser(description="Run the gRPC server.")
- parser.add_argument(
- "--addr", default="localhost:50051", help="The address to bind the server to."
- )
- args = parser.parse_args()
- print(f"[parler-tts] startup: {args}", file=sys.stderr)
- serve(args.addr)
diff --git a/backend/python/parler-tts/install.sh b/backend/python/parler-tts/install.sh
deleted file mode 100755
index 14df9b14..00000000
--- a/backend/python/parler-tts/install.sh
+++ /dev/null
@@ -1,28 +0,0 @@
-#!/bin/bash
-set -e
-
-source $(dirname $0)/../common/libbackend.sh
-
-# This is here because the Intel pip index is broken and returns 200 status codes for every package name, it just doesn't return any package links.
-# This makes uv think that the package exists in the Intel pip index, and by default it stops looking at other pip indexes once it finds a match.
-# We need uv to continue falling through to the pypi default index to find optimum[openvino] in the pypi index
-# the --upgrade actually allows us to *downgrade* torch to the version provided in the Intel pip index
-if [ "x${BUILD_PROFILE}" == "xintel" ]; then
- EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
-fi
-
-
-installRequirements
-
-
-# https://github.com/descriptinc/audiotools/issues/101
-# incompatible protobuf versions.
-PYDIR=python3.10
-pyenv="${MY_DIR}/venv/lib/${PYDIR}/site-packages/google/protobuf/internal/"
-
-if [ ! -d ${pyenv} ]; then
- echo "(parler-tts/install.sh): Error: ${pyenv} does not exist"
- exit 1
-fi
-
-curl -L https://raw.githubusercontent.com/protocolbuffers/protobuf/main/python/google/protobuf/internal/builder.py -o ${pyenv}/builder.py
diff --git a/backend/python/parler-tts/protogen.sh b/backend/python/parler-tts/protogen.sh
deleted file mode 100755
index 32f39fbb..00000000
--- a/backend/python/parler-tts/protogen.sh
+++ /dev/null
@@ -1,6 +0,0 @@
-#!/bin/bash
-set -e
-
-source $(dirname $0)/../common/libbackend.sh
-
-python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
\ No newline at end of file
diff --git a/backend/python/parler-tts/requirements-after.txt b/backend/python/parler-tts/requirements-after.txt
deleted file mode 100644
index 702074de..00000000
--- a/backend/python/parler-tts/requirements-after.txt
+++ /dev/null
@@ -1,4 +0,0 @@
-git+https://github.com/huggingface/parler-tts.git@8e465f1b5fcd223478e07175cb40494d19ffbe17
-llvmlite==0.43.0
-numba==0.60.0
-grpcio-tools==1.42.0
\ No newline at end of file
diff --git a/backend/python/parler-tts/requirements-cpu.txt b/backend/python/parler-tts/requirements-cpu.txt
deleted file mode 100644
index 2021fc20..00000000
--- a/backend/python/parler-tts/requirements-cpu.txt
+++ /dev/null
@@ -1,3 +0,0 @@
-transformers
-accelerate
-torch==2.4.1
\ No newline at end of file
diff --git a/backend/python/parler-tts/requirements-cublas11.txt b/backend/python/parler-tts/requirements-cublas11.txt
deleted file mode 100644
index 9f8fe9ff..00000000
--- a/backend/python/parler-tts/requirements-cublas11.txt
+++ /dev/null
@@ -1,5 +0,0 @@
---extra-index-url https://download.pytorch.org/whl/cu118
-torch==2.4.1+cu118
-torchaudio==2.4.1+cu118
-transformers
-accelerate
\ No newline at end of file
diff --git a/backend/python/parler-tts/requirements-cublas12.txt b/backend/python/parler-tts/requirements-cublas12.txt
deleted file mode 100644
index 53716949..00000000
--- a/backend/python/parler-tts/requirements-cublas12.txt
+++ /dev/null
@@ -1,4 +0,0 @@
-torch==2.4.1
-torchaudio==2.4.1
-transformers
-accelerate
\ No newline at end of file
diff --git a/backend/python/parler-tts/requirements-hipblas.txt b/backend/python/parler-tts/requirements-hipblas.txt
deleted file mode 100644
index b8758537..00000000
--- a/backend/python/parler-tts/requirements-hipblas.txt
+++ /dev/null
@@ -1,5 +0,0 @@
---extra-index-url https://download.pytorch.org/whl/rocm6.0
-torch==2.3.0+rocm6.0
-torchaudio==2.3.0+rocm6.0
-transformers
-accelerate
diff --git a/backend/python/parler-tts/requirements-intel.txt b/backend/python/parler-tts/requirements-intel.txt
deleted file mode 100644
index f6814bd9..00000000
--- a/backend/python/parler-tts/requirements-intel.txt
+++ /dev/null
@@ -1,8 +0,0 @@
---extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
-intel-extension-for-pytorch==2.3.110+xpu
-torch==2.3.1+cxx11.abi
-torchaudio==2.3.1+cxx11.abi
-oneccl_bind_pt==2.3.100+xpu
-optimum[openvino]
-transformers
-accelerate
\ No newline at end of file
diff --git a/backend/python/parler-tts/requirements.txt b/backend/python/parler-tts/requirements.txt
deleted file mode 100644
index e6ba016b..00000000
--- a/backend/python/parler-tts/requirements.txt
+++ /dev/null
@@ -1,4 +0,0 @@
-grpcio==1.69.0
-certifi
-llvmlite==0.43.0
-setuptools
\ No newline at end of file
diff --git a/backend/python/parler-tts/run.sh b/backend/python/parler-tts/run.sh
deleted file mode 100755
index 375c07e5..00000000
--- a/backend/python/parler-tts/run.sh
+++ /dev/null
@@ -1,4 +0,0 @@
-#!/bin/bash
-source $(dirname $0)/../common/libbackend.sh
-
-startBackend $@
\ No newline at end of file
diff --git a/backend/python/parler-tts/test.py b/backend/python/parler-tts/test.py
deleted file mode 100644
index 639d43a9..00000000
--- a/backend/python/parler-tts/test.py
+++ /dev/null
@@ -1,81 +0,0 @@
-"""
-A test script to test the gRPC service
-"""
-import unittest
-import subprocess
-import time
-import backend_pb2
-import backend_pb2_grpc
-
-import grpc
-
-
-class TestBackendServicer(unittest.TestCase):
- """
- TestBackendServicer is the class that tests the gRPC service
- """
- def setUp(self):
- """
- This method sets up the gRPC service by starting the server
- """
- self.service = subprocess.Popen(["python3", "backend.py", "--addr", "localhost:50051"])
- time.sleep(10)
-
- def tearDown(self) -> None:
- """
- This method tears down the gRPC service by terminating the server
- """
- self.service.terminate()
- self.service.wait()
-
- def test_server_startup(self):
- """
- This method tests if the server starts up successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.Health(backend_pb2.HealthMessage())
- self.assertEqual(response.message, b'OK')
- except Exception as err:
- print(err)
- self.fail("Server failed to start")
- finally:
- self.tearDown()
-
- def test_load_model(self):
- """
- This method tests if the model is loaded successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.LoadModel(backend_pb2.ModelOptions(Model="parler-tts/parler_tts_mini_v0.1"))
- self.assertTrue(response.success)
- self.assertEqual(response.message, "Model loaded successfully")
- except Exception as err:
- print(err)
- self.fail("LoadModel service failed")
- finally:
- self.tearDown()
-
- def test_tts(self):
- """
- This method tests if the embeddings are generated successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.LoadModel(backend_pb2.ModelOptions(Model="parler-tts/parler_tts_mini_v0.1"))
- self.assertTrue(response.success)
- tts_request = backend_pb2.TTSRequest(text="Hey, how are you doing today?")
- tts_response = stub.TTS(tts_request)
- self.assertIsNotNone(tts_response)
- except Exception as err:
- print(err)
- self.fail("TTS service failed")
- finally:
- self.tearDown()
\ No newline at end of file
diff --git a/backend/python/parler-tts/test.sh b/backend/python/parler-tts/test.sh
deleted file mode 100755
index 6940b066..00000000
--- a/backend/python/parler-tts/test.sh
+++ /dev/null
@@ -1,6 +0,0 @@
-#!/bin/bash
-set -e
-
-source $(dirname $0)/../common/libbackend.sh
-
-runUnittests
From 073eaec7295fe1fc5c9f2297fc6de6c0a85c36a1 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 23 Jan 2025 10:00:36 +0100
Subject: [PATCH 080/679] chore(openvoice): drop backend (#4673)
The upstream project (MeloTTS) has been quiet for a long time, and newer
backends are more performant and offer better quality overall.
Signed-off-by: Ettore Di Giacinto
---
Dockerfile | 5 +-
Makefile | 13 +-
backend/python/openvoice/Makefile | 25 ---
backend/python/openvoice/backend.py | 158 ------------------
backend/python/openvoice/install.sh | 16 --
backend/python/openvoice/requirements-cpu.txt | 7 -
.../openvoice/requirements-cublas11.txt | 8 -
.../openvoice/requirements-cublas12.txt | 7 -
.../python/openvoice/requirements-hipblas.txt | 8 -
.../python/openvoice/requirements-intel.txt | 24 ---
backend/python/openvoice/requirements.txt | 17 --
backend/python/openvoice/run.sh | 4 -
backend/python/openvoice/test.py | 82 ---------
backend/python/openvoice/test.sh | 12 --
14 files changed, 3 insertions(+), 383 deletions(-)
delete mode 100644 backend/python/openvoice/Makefile
delete mode 100755 backend/python/openvoice/backend.py
delete mode 100755 backend/python/openvoice/install.sh
delete mode 100644 backend/python/openvoice/requirements-cpu.txt
delete mode 100644 backend/python/openvoice/requirements-cublas11.txt
delete mode 100644 backend/python/openvoice/requirements-cublas12.txt
delete mode 100644 backend/python/openvoice/requirements-hipblas.txt
delete mode 100644 backend/python/openvoice/requirements-intel.txt
delete mode 100644 backend/python/openvoice/requirements.txt
delete mode 100755 backend/python/openvoice/run.sh
delete mode 100644 backend/python/openvoice/test.py
delete mode 100755 backend/python/openvoice/test.sh
diff --git a/Dockerfile b/Dockerfile
index 625d2869..566e03bc 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -15,7 +15,7 @@ ARG TARGETARCH
ARG TARGETVARIANT
ENV DEBIAN_FRONTEND=noninteractive
-ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,transformers:/build/backend/python/transformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,faster-whisper:/build/backend/python/faster-whisper/run.sh,openvoice:/build/backend/python/openvoice/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,exllama2:/build/backend/python/exllama2/run.sh"
+ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,transformers:/build/backend/python/transformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,faster-whisper:/build/backend/python/faster-whisper/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,exllama2:/build/backend/python/exllama2/run.sh"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
@@ -420,9 +420,6 @@ RUN if [[ ( "${EXTRA_BACKENDS}" =~ "coqui" || -z "${EXTRA_BACKENDS}" ) && "$IMAG
RUN if [[ ( "${EXTRA_BACKENDS}" =~ "kokoro" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/kokoro \
; fi && \
- if [[ ( "${EXTRA_BACKENDS}" =~ "openvoice" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
- make -C backend/python/openvoice \
- ; fi && \
if [[ ( "${EXTRA_BACKENDS}" =~ "exllama2" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
make -C backend/python/exllama2 \
; fi && \
diff --git a/Makefile b/Makefile
index 04e280d8..9c4f3778 100644
--- a/Makefile
+++ b/Makefile
@@ -533,10 +533,10 @@ protogen-go-clean:
$(RM) bin/*
.PHONY: protogen-python
-protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen rerankers-protogen transformers-protogen kokoro-protogen vllm-protogen openvoice-protogen faster-whisper-protogen
+protogen-python: autogptq-protogen bark-protogen coqui-protogen diffusers-protogen exllama2-protogen rerankers-protogen transformers-protogen kokoro-protogen vllm-protogen faster-whisper-protogen
.PHONY: protogen-python-clean
-protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean rerankers-protogen-clean transformers-protogen-clean kokoro-protogen-clean vllm-protogen-clean openvoice-protogen-clean faster-whisper-protogen-clean
+protogen-python-clean: autogptq-protogen-clean bark-protogen-clean coqui-protogen-clean diffusers-protogen-clean exllama2-protogen-clean rerankers-protogen-clean transformers-protogen-clean kokoro-protogen-clean vllm-protogen-clean faster-whisper-protogen-clean
.PHONY: autogptq-protogen
autogptq-protogen:
@@ -610,14 +610,6 @@ kokoro-protogen:
kokoro-protogen-clean:
$(MAKE) -C backend/python/kokoro protogen-clean
-.PHONY: openvoice-protogen
-openvoice-protogen:
- $(MAKE) -C backend/python/openvoice protogen
-
-.PHONY: openvoice-protogen-clean
-openvoice-protogen-clean:
- $(MAKE) -C backend/python/openvoice protogen-clean
-
.PHONY: vllm-protogen
vllm-protogen:
$(MAKE) -C backend/python/vllm protogen
@@ -638,7 +630,6 @@ prepare-extra-conda-environments: protogen-python
$(MAKE) -C backend/python/rerankers
$(MAKE) -C backend/python/transformers
$(MAKE) -C backend/python/kokoro
- $(MAKE) -C backend/python/openvoice
$(MAKE) -C backend/python/exllama2
prepare-test-extra: protogen-python
diff --git a/backend/python/openvoice/Makefile b/backend/python/openvoice/Makefile
deleted file mode 100644
index a187a00f..00000000
--- a/backend/python/openvoice/Makefile
+++ /dev/null
@@ -1,25 +0,0 @@
-.DEFAULT_GOAL := install
-
-.PHONY: install
-install: protogen
- bash install.sh
-
-.PHONY: protogen
-protogen: backend_pb2_grpc.py backend_pb2.py
-
-.PHONY: protogen-clean
-protogen-clean:
- $(RM) backend_pb2_grpc.py backend_pb2.py
-
-backend_pb2_grpc.py backend_pb2.py:
- python3 -m grpc_tools.protoc -I../.. --python_out=. --grpc_python_out=. backend.proto
-
-.PHONY: clean
-clean: protogen-clean
- rm -rf venv __pycache__
-
-.PHONY: test
-test: protogen
- @echo "Testing openvoice..."
- bash test.sh
- @echo "openvoice tested."
\ No newline at end of file
diff --git a/backend/python/openvoice/backend.py b/backend/python/openvoice/backend.py
deleted file mode 100755
index 7dde08cf..00000000
--- a/backend/python/openvoice/backend.py
+++ /dev/null
@@ -1,158 +0,0 @@
-#!/usr/bin/env python3
-"""
-Extra gRPC server for OpenVoice models.
-"""
-from concurrent import futures
-
-import argparse
-import signal
-import sys
-import os
-import torch
-from openvoice import se_extractor
-from openvoice.api import ToneColorConverter
-from melo.api import TTS
-
-import time
-import backend_pb2
-import backend_pb2_grpc
-
-import grpc
-
-
-_ONE_DAY_IN_SECONDS = 60 * 60 * 24
-
-# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
-MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
-
-# Implement the BackendServicer class with the service methods
-class BackendServicer(backend_pb2_grpc.BackendServicer):
- """
- A gRPC servicer for the backend service.
-
- This class implements the gRPC methods for the backend service, including Health, LoadModel, and Embedding.
- """
- def Health(self, request, context):
- """
- A gRPC method that returns the health status of the backend service.
-
- Args:
- request: A HealthRequest object that contains the request parameters.
- context: A grpc.ServicerContext object that provides information about the RPC.
-
- Returns:
- A Reply object that contains the health status of the backend service.
- """
- return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
-
- def LoadModel(self, request, context):
- """
- A gRPC method that loads a model into memory.
-
- Args:
- request: A LoadModelRequest object that contains the request parameters.
- context: A grpc.ServicerContext object that provides information about the RPC.
-
- Returns:
- A Result object that contains the result of the LoadModel operation.
- """
- model_name = request.Model
- try:
-
- self.clonedVoice = False
- # Assume directory from request.ModelFile.
- # Only if request.LoraAdapter it's not an absolute path
- if request.AudioPath and request.ModelFile != "" and not os.path.isabs(request.AudioPath):
- # get base path of modelFile
- modelFileBase = os.path.dirname(request.ModelFile)
- request.AudioPath = os.path.join(modelFileBase, request.AudioPath)
- if request.AudioPath != "":
- self.clonedVoice = True
-
- self.modelpath = request.ModelFile
- self.speaker = request.Type
- self.ClonedVoicePath = request.AudioPath
-
- ckpt_converter = request.Model+'/converter'
- device = "cuda:0" if torch.cuda.is_available() else "cpu"
- self.device = device
- self.tone_color_converter = None
- if self.clonedVoice:
- self.tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
- self.tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')
-
- except Exception as err:
- return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
-
- return backend_pb2.Result(message="Model loaded successfully", success=True)
-
- def TTS(self, request, context):
- model_name = request.model
- if model_name == "":
- return backend_pb2.Result(success=False, message="request.model is required")
- try:
- # Speed is adjustable
- speed = 1.0
- voice = "EN"
- if request.voice:
- voice = request.voice
- model = TTS(language=voice, device=self.device)
- speaker_ids = model.hps.data.spk2id
- speaker_key = self.speaker
- modelpath = self.modelpath
- for s in speaker_ids.keys():
- print(f"Speaker: {s} - ID: {speaker_ids[s]}")
- speaker_id = speaker_ids[speaker_key]
- speaker_key = speaker_key.lower().replace('_', '-')
- source_se = torch.load(f'{modelpath}/base_speakers/ses/{speaker_key}.pth', map_location=self.device)
- model.tts_to_file(request.text, speaker_id, request.dst, speed=speed)
- if self.clonedVoice:
- reference_speaker = self.ClonedVoicePath
- target_se, audio_name = se_extractor.get_se(reference_speaker, self.tone_color_converter, vad=False)
- # Run the tone color converter
- encode_message = "@MyShell"
- self.tone_color_converter.convert(
- audio_src_path=request.dst,
- src_se=source_se,
- tgt_se=target_se,
- output_path=request.dst,
- message=encode_message)
-
- print("[OpenVoice] TTS generated!", file=sys.stderr)
- print("[OpenVoice] TTS saved to", request.dst, file=sys.stderr)
- print(request, file=sys.stderr)
- except Exception as err:
- return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
- return backend_pb2.Result(success=True)
-
-def serve(address):
- server = grpc.server(futures.ThreadPoolExecutor(max_workers=MAX_WORKERS))
- backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
- server.add_insecure_port(address)
- server.start()
- print("[OpenVoice] Server started. Listening on: " + address, file=sys.stderr)
-
- # Define the signal handler function
- def signal_handler(sig, frame):
- print("[OpenVoice] Received termination signal. Shutting down...")
- server.stop(0)
- sys.exit(0)
-
- # Set the signal handlers for SIGINT and SIGTERM
- signal.signal(signal.SIGINT, signal_handler)
- signal.signal(signal.SIGTERM, signal_handler)
-
- try:
- while True:
- time.sleep(_ONE_DAY_IN_SECONDS)
- except KeyboardInterrupt:
- server.stop(0)
-
-if __name__ == "__main__":
- parser = argparse.ArgumentParser(description="Run the gRPC server.")
- parser.add_argument(
- "--addr", default="localhost:50051", help="The address to bind the server to."
- )
- args = parser.parse_args()
- print(f"[OpenVoice] startup: {args}", file=sys.stderr)
- serve(args.addr)
diff --git a/backend/python/openvoice/install.sh b/backend/python/openvoice/install.sh
deleted file mode 100755
index 24db146b..00000000
--- a/backend/python/openvoice/install.sh
+++ /dev/null
@@ -1,16 +0,0 @@
-#!/bin/bash
-set -e
-
-source $(dirname $0)/../common/libbackend.sh
-
-# This is here because the Intel pip index is broken and returns 200 status codes for every package name, it just doesn't return any package links.
-# This makes uv think that the package exists in the Intel pip index, and by default it stops looking at other pip indexes once it finds a match.
-# We need uv to continue falling through to the pypi default index to find optimum[openvino] in the pypi index
-# the --upgrade actually allows us to *downgrade* torch to the version provided in the Intel pip index
-if [ "x${BUILD_PROFILE}" == "xintel" ]; then
- EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
-fi
-
-installRequirements
-
-python -m unidic download
diff --git a/backend/python/openvoice/requirements-cpu.txt b/backend/python/openvoice/requirements-cpu.txt
deleted file mode 100644
index dd2eb221..00000000
--- a/backend/python/openvoice/requirements-cpu.txt
+++ /dev/null
@@ -1,7 +0,0 @@
-torch==2.4.1
-git+https://github.com/myshell-ai/MeloTTS.git
-git+https://github.com/myshell-ai/OpenVoice.git
-whisper-timestamped
-pydub==0.25.1
-wavmark==0.0.3
-eng_to_ipa==0.0.2
\ No newline at end of file
diff --git a/backend/python/openvoice/requirements-cublas11.txt b/backend/python/openvoice/requirements-cublas11.txt
deleted file mode 100644
index 84ecc344..00000000
--- a/backend/python/openvoice/requirements-cublas11.txt
+++ /dev/null
@@ -1,8 +0,0 @@
---extra-index-url https://download.pytorch.org/whl/cu118
-torch==2.4.1+cu118
-git+https://github.com/myshell-ai/MeloTTS.git
-git+https://github.com/myshell-ai/OpenVoice.git
-whisper-timestamped
-pydub==0.25.1
-wavmark==0.0.3
-eng_to_ipa==0.0.2
\ No newline at end of file
diff --git a/backend/python/openvoice/requirements-cublas12.txt b/backend/python/openvoice/requirements-cublas12.txt
deleted file mode 100644
index dd2eb221..00000000
--- a/backend/python/openvoice/requirements-cublas12.txt
+++ /dev/null
@@ -1,7 +0,0 @@
-torch==2.4.1
-git+https://github.com/myshell-ai/MeloTTS.git
-git+https://github.com/myshell-ai/OpenVoice.git
-whisper-timestamped
-pydub==0.25.1
-wavmark==0.0.3
-eng_to_ipa==0.0.2
\ No newline at end of file
diff --git a/backend/python/openvoice/requirements-hipblas.txt b/backend/python/openvoice/requirements-hipblas.txt
deleted file mode 100644
index 4c2d6649..00000000
--- a/backend/python/openvoice/requirements-hipblas.txt
+++ /dev/null
@@ -1,8 +0,0 @@
---extra-index-url https://download.pytorch.org/whl/rocm6.0
-torch==2.4.1+rocm6.0
-git+https://github.com/myshell-ai/MeloTTS.git
-git+https://github.com/myshell-ai/OpenVoice.git
-whisper-timestamped
-pydub==0.25.1
-wavmark==0.0.3
-eng_to_ipa==0.0.2
\ No newline at end of file
diff --git a/backend/python/openvoice/requirements-intel.txt b/backend/python/openvoice/requirements-intel.txt
deleted file mode 100644
index 39b2b8b0..00000000
--- a/backend/python/openvoice/requirements-intel.txt
+++ /dev/null
@@ -1,24 +0,0 @@
---extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
-intel-extension-for-pytorch==2.3.110+xpu
-torch==2.3.1+cxx11.abi
-torchaudio==2.3.1+cxx11.abi
-oneccl_bind_pt==2.3.100+xpu
-optimum[openvino]
-grpcio==1.69.0
-protobuf
-librosa==0.9.1
-faster-whisper==0.9.0
-pydub==0.25.1
-wavmark==0.0.3
-eng_to_ipa==0.0.2
-inflect==7.0.0
-unidecode==1.3.7
-whisper-timestamped==1.14.2
-openai
-python-dotenv
-pypinyin==0.50.0
-cn2an==0.5.22
-jieba==0.42.1
-langid==1.1.6
-git+https://github.com/myshell-ai/MeloTTS.git
-git+https://github.com/myshell-ai/OpenVoice.git
diff --git a/backend/python/openvoice/requirements.txt b/backend/python/openvoice/requirements.txt
deleted file mode 100644
index 62b886bb..00000000
--- a/backend/python/openvoice/requirements.txt
+++ /dev/null
@@ -1,17 +0,0 @@
-grpcio==1.69.0
-protobuf
-librosa
-faster-whisper
-inflect
-unidecode
-openai
-python-dotenv
-pypinyin
-cn2an==0.5.22
-numpy==1.22.0
-networkx==2.8.8
-jieba==0.42.1
-gradio==5.9.1
-langid==1.1.6
-llvmlite==0.43.0
-setuptools
\ No newline at end of file
diff --git a/backend/python/openvoice/run.sh b/backend/python/openvoice/run.sh
deleted file mode 100755
index 375c07e5..00000000
--- a/backend/python/openvoice/run.sh
+++ /dev/null
@@ -1,4 +0,0 @@
-#!/bin/bash
-source $(dirname $0)/../common/libbackend.sh
-
-startBackend $@
\ No newline at end of file
diff --git a/backend/python/openvoice/test.py b/backend/python/openvoice/test.py
deleted file mode 100644
index 82f08785..00000000
--- a/backend/python/openvoice/test.py
+++ /dev/null
@@ -1,82 +0,0 @@
-"""
-A test script to test the gRPC service
-"""
-import unittest
-import subprocess
-import time
-import backend_pb2
-import backend_pb2_grpc
-
-import grpc
-
-
-class TestBackendServicer(unittest.TestCase):
- """
- TestBackendServicer is the class that tests the gRPC service
- """
- def setUp(self):
- """
- This method sets up the gRPC service by starting the server
- """
- self.service = subprocess.Popen(["python3", "backend.py", "--addr", "localhost:50051"])
- time.sleep(30)
-
- def tearDown(self) -> None:
- """
- This method tears down the gRPC service by terminating the server
- """
- self.service.terminate()
- self.service.wait()
-
- def test_server_startup(self):
- """
- This method tests if the server starts up successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.Health(backend_pb2.HealthMessage())
- self.assertEqual(response.message, b'OK')
- except Exception as err:
- print(err)
- self.fail("Server failed to start")
- finally:
- self.tearDown()
-
- def test_load_model(self):
- """
- This method tests if the model is loaded successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.LoadModel(backend_pb2.ModelOptions(Model="checkpoints_v2",
- Type="en-us"))
- self.assertTrue(response.success)
- self.assertEqual(response.message, "Model loaded successfully")
- except Exception as err:
- print(err)
- self.fail("LoadModel service failed")
- finally:
- self.tearDown()
-
- def test_tts(self):
- """
- This method tests if the embeddings are generated successfully
- """
- try:
- self.setUp()
- with grpc.insecure_channel("localhost:50051") as channel:
- stub = backend_pb2_grpc.BackendStub(channel)
- response = stub.LoadModel(backend_pb2.ModelOptions(Model="dingzhen"))
- self.assertTrue(response.success)
- tts_request = backend_pb2.TTSRequest(text="80s TV news production music hit for tonight's biggest story", voice="EN")
- tts_response = stub.TTS(tts_request)
- self.assertIsNotNone(tts_response)
- except Exception as err:
- print(err)
- self.fail("TTS service failed")
- finally:
- self.tearDown()
\ No newline at end of file
diff --git a/backend/python/openvoice/test.sh b/backend/python/openvoice/test.sh
deleted file mode 100755
index 6c0a840f..00000000
--- a/backend/python/openvoice/test.sh
+++ /dev/null
@@ -1,12 +0,0 @@
-#!/bin/bash
-set -e
-
-source $(dirname $0)/../common/libbackend.sh
-
-# Download checkpoints if not present
-if [ ! -d "checkpoints_v2" ]; then
- wget https://myshell-public-repo-host.s3.amazonaws.com/openvoice/checkpoints_v2_0417.zip -O checkpoints_v2.zip
- unzip checkpoints_v2.zip
-fi
-
-runUnittests
From eef80b9880f6d5bc875c0a2b57d289fde7248566 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 23 Jan 2025 10:02:57 +0100
Subject: [PATCH 081/679] chore(ci): cleanup tests
Signed-off-by: Ettore Di Giacinto
---
.github/workflows/test-extra.yml | 51 --------------------------------
1 file changed, 51 deletions(-)
diff --git a/.github/workflows/test-extra.yml b/.github/workflows/test-extra.yml
index e99ea516..7f2445c8 100644
--- a/.github/workflows/test-extra.yml
+++ b/.github/workflows/test-extra.yml
@@ -78,57 +78,6 @@ jobs:
make --jobs=5 --output-sync=target -C backend/python/diffusers
make --jobs=5 --output-sync=target -C backend/python/diffusers test
- tests-parler-tts:
- runs-on: ubuntu-latest
- steps:
- - name: Clone
- uses: actions/checkout@v4
- with:
- submodules: true
- - name: Dependencies
- run: |
- sudo apt-get update
- sudo apt-get install build-essential ffmpeg
- # Install UV
- curl -LsSf https://astral.sh/uv/install.sh | sh
- sudo apt-get install -y ca-certificates cmake curl patch python3-pip
- sudo apt-get install -y libopencv-dev
- pip install --user --no-cache-dir grpcio-tools==1.64.1
-
- - name: Test parler-tts
- run: |
- make --jobs=5 --output-sync=target -C backend/python/parler-tts
- make --jobs=5 --output-sync=target -C backend/python/parler-tts test
- - name: Setup tmate session if tests fail
- if: ${{ failure() }}
- uses: mxschmitt/action-tmate@v3.19
- with:
- detached: true
- connect-timeout-seconds: 180
- limit-access-to-actor: true
-
- tests-openvoice:
- runs-on: ubuntu-latest
- steps:
- - name: Clone
- uses: actions/checkout@v4
- with:
- submodules: true
- - name: Dependencies
- run: |
- sudo apt-get update
- sudo apt-get install build-essential ffmpeg
- # Install UV
- curl -LsSf https://astral.sh/uv/install.sh | sh
- sudo apt-get install -y ca-certificates cmake curl patch python3-pip
- sudo apt-get install -y libopencv-dev
- pip install --user --no-cache-dir grpcio-tools==1.64.1
-
- - name: Test openvoice
- run: |
- make --jobs=5 --output-sync=target -C backend/python/openvoice
- make --jobs=5 --output-sync=target -C backend/python/openvoice test
-
# tests-transformers-musicgen:
# runs-on: ubuntu-latest
# steps:
From f9e368b7c4a9604dbfebeb602d08a17d322d5805 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 23 Jan 2025 16:35:44 +0100
Subject: [PATCH 082/679] chore(refactor): group cpu cap detection (#4674)
Signed-off-by: Ettore Di Giacinto
---
pkg/model/initializers.go | 49 ++++++++++++++-------------------------
1 file changed, 17 insertions(+), 32 deletions(-)
diff --git a/pkg/model/initializers.go b/pkg/model/initializers.go
index d5f1459b..9fc0c18c 100644
--- a/pkg/model/initializers.go
+++ b/pkg/model/initializers.go
@@ -66,6 +66,17 @@ const (
LocalStoreBackend = "local-store"
)
+var llamaCPPVariants = []string{
+ LLamaCPPAVX2,
+ LLamaCPPAVX,
+ LLamaCPPFallback,
+ LLamaCPPCUDA,
+ LLamaCPPHipblas,
+ LLamaCPPSycl16,
+ LLamaCPPSycl32,
+ LLamaCPPGRPC,
+}
+
func backendPath(assetDir, backend string) string {
return filepath.Join(assetDir, "backend-assets", "grpc", backend)
}
@@ -107,40 +118,14 @@ ENTRY:
if AutoDetect {
// if we find the llama.cpp variants, show them of as a single backend (llama-cpp) as later we are going to pick that up
// when starting the service
- foundLCPPAVX, foundLCPPAVX2, foundLCPPFallback, foundLCPPGRPC, foundLCPPCuda, foundLCPPHipblas, foundSycl16, foundSycl32 := false, false, false, false, false, false, false, false
+ foundVariants := map[string]bool{}
if _, ok := backends[LLamaCPP]; !ok {
for _, e := range entry {
- if strings.Contains(e.Name(), LLamaCPPAVX2) && !foundLCPPAVX2 {
- backends[LLamaCPP] = append(backends[LLamaCPP], LLamaCPPAVX2)
- foundLCPPAVX2 = true
- }
- if strings.Contains(e.Name(), LLamaCPPAVX) && !foundLCPPAVX {
- backends[LLamaCPP] = append(backends[LLamaCPP], LLamaCPPAVX)
- foundLCPPAVX = true
- }
- if strings.Contains(e.Name(), LLamaCPPFallback) && !foundLCPPFallback {
- backends[LLamaCPP] = append(backends[LLamaCPP], LLamaCPPFallback)
- foundLCPPFallback = true
- }
- if strings.Contains(e.Name(), LLamaCPPGRPC) && !foundLCPPGRPC {
- backends[LLamaCPP] = append(backends[LLamaCPP], LLamaCPPGRPC)
- foundLCPPGRPC = true
- }
- if strings.Contains(e.Name(), LLamaCPPCUDA) && !foundLCPPCuda {
- backends[LLamaCPP] = append(backends[LLamaCPP], LLamaCPPCUDA)
- foundLCPPCuda = true
- }
- if strings.Contains(e.Name(), LLamaCPPHipblas) && !foundLCPPHipblas {
- backends[LLamaCPP] = append(backends[LLamaCPP], LLamaCPPHipblas)
- foundLCPPHipblas = true
- }
- if strings.Contains(e.Name(), LLamaCPPSycl16) && !foundSycl16 {
- backends[LLamaCPP] = append(backends[LLamaCPP], LLamaCPPSycl16)
- foundSycl16 = true
- }
- if strings.Contains(e.Name(), LLamaCPPSycl32) && !foundSycl32 {
- backends[LLamaCPP] = append(backends[LLamaCPP], LLamaCPPSycl32)
- foundSycl32 = true
+ for _, v := range llamaCPPVariants {
+ if strings.Contains(e.Name(), v) && !foundVariants[v] {
+ backends[LLamaCPP] = append(backends[LLamaCPP], v)
+ foundVariants[v] = true
+ }
}
}
}
From 5177837ab045c3df2a6096baad1a01f63083b130 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 24 Jan 2025 08:26:44 +0100
Subject: [PATCH 083/679] chore: detect and enable avx512 builds (#4675)
chore(avx512): add support
Fixes https://github.com/mudler/LocalAI/issues/4662
Signed-off-by: Ettore Di Giacinto
---
Dockerfile | 2 +-
Makefile | 8 ++++++++
pkg/model/initializers.go | 8 ++++++++
3 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/Dockerfile b/Dockerfile
index 566e03bc..2f2bcafa 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -303,7 +303,7 @@ RUN make prepare
## We only leave the most CPU-optimized variant and the fallback for the cublas/hipblas build
## (both will use CUDA or hipblas for the actual computation)
RUN if [ "${BUILD_TYPE}" = "cublas" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then \
- SKIP_GRPC_BACKEND="backend-assets/grpc/llama-cpp-avx backend-assets/grpc/llama-cpp-avx2" make build; \
+ SKIP_GRPC_BACKEND="backend-assets/grpc/llama-cpp-avx512 backend-assets/grpc/llama-cpp-avx backend-assets/grpc/llama-cpp-avx2" make build; \
else \
make build; \
fi
diff --git a/Makefile b/Makefile
index 9c4f3778..e3c28039 100644
--- a/Makefile
+++ b/Makefile
@@ -186,6 +186,7 @@ endif
ALL_GRPC_BACKENDS=backend-assets/grpc/huggingface
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-avx
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-avx2
+ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-avx512
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-fallback
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-ggml
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-grpc
@@ -699,6 +700,13 @@ backend-assets/grpc/llama-cpp-avx2: backend-assets/grpc backend/cpp/llama/llama.
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on" $(MAKE) VARIANT="llama-avx2" build-llama-cpp-grpc-server
cp -rfv backend/cpp/llama-avx2/grpc-server backend-assets/grpc/llama-cpp-avx2
+backend-assets/grpc/llama-cpp-avx512: backend-assets/grpc backend/cpp/llama/llama.cpp
+ cp -rf backend/cpp/llama backend/cpp/llama-avx512
+ $(MAKE) -C backend/cpp/llama-avx512 purge
+ $(info ${GREEN}I llama-cpp build info:avx512${RESET})
+ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on" $(MAKE) VARIANT="llama-avx512" build-llama-cpp-grpc-server
+ cp -rfv backend/cpp/llama-avx512/grpc-server backend-assets/grpc/llama-cpp-avx512
+
backend-assets/grpc/llama-cpp-avx: backend-assets/grpc backend/cpp/llama/llama.cpp
cp -rf backend/cpp/llama backend/cpp/llama-avx
$(MAKE) -C backend/cpp/llama-avx purge
diff --git a/pkg/model/initializers.go b/pkg/model/initializers.go
index 9fc0c18c..ace72fa3 100644
--- a/pkg/model/initializers.go
+++ b/pkg/model/initializers.go
@@ -48,6 +48,7 @@ const (
LLamaCPP = "llama-cpp"
LLamaCPPAVX2 = "llama-cpp-avx2"
+ LLamaCPPAVX512 = "llama-cpp-avx512"
LLamaCPPAVX = "llama-cpp-avx"
LLamaCPPFallback = "llama-cpp-fallback"
LLamaCPPCUDA = "llama-cpp-cuda"
@@ -68,6 +69,7 @@ const (
var llamaCPPVariants = []string{
LLamaCPPAVX2,
+ LLamaCPPAVX512,
LLamaCPPAVX,
LLamaCPPFallback,
LLamaCPPCUDA,
@@ -268,6 +270,12 @@ func selectGRPCProcessByHostCapabilities(backend, assetDir string, f16 bool) str
log.Info().Msgf("[%s] attempting to load with AVX2 variant", backend)
selectedProcess = p
}
+ } else if xsysinfo.HasCPUCaps(cpuid.AVX512F) {
+ p := backendPath(assetDir, LLamaCPPAVX512)
+ if _, err := os.Stat(p); err == nil {
+ log.Info().Msgf("[%s] attempting to load with AVX512 variant", backend)
+ selectedProcess = p
+ }
} else if xsysinfo.HasCPUCaps(cpuid.AVX) {
p := backendPath(assetDir, LLamaCPPAVX)
if _, err := os.Stat(p); err == nil {
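
A self-contained sketch of the variant precedence after this patch. The capability probes below are hard-coded stand-ins for xsysinfo.HasCPUCaps/cpuid, so this is illustrative only: AVX2 is still preferred, the new AVX512 build is tried next, then plain AVX, and the fallback binary remains the default when no optimized build exists under the asset directory.

    package main

    import (
    	"fmt"
    	"os"
    	"path/filepath"
    )

    // Stand-ins for LocalAI's xsysinfo.HasCPUCaps / cpuid flags: hard-coded
    // here so the sketch runs on its own.
    func hasAVX2() bool    { return false }
    func hasAVX512F() bool { return false }
    func hasAVX() bool     { return true }

    // backendPath mirrors the helper shown earlier in pkg/model/initializers.go.
    func backendPath(assetDir, backend string) string {
    	return filepath.Join(assetDir, "backend-assets", "grpc", backend)
    }

    func exists(p string) bool { _, err := os.Stat(p); return err == nil }

    // selectLlamaCPPVariant reproduces the precedence of the patched
    // selectGRPCProcessByHostCapabilities: AVX2 first, then the new AVX512
    // build, then plain AVX; whatever is picked must also exist on disk.
    func selectLlamaCPPVariant(assetDir string) string {
    	selected := backendPath(assetDir, "llama-cpp-fallback")
    	if hasAVX2() {
    		if p := backendPath(assetDir, "llama-cpp-avx2"); exists(p) {
    			selected = p
    		}
    	} else if hasAVX512F() {
    		if p := backendPath(assetDir, "llama-cpp-avx512"); exists(p) {
    			selected = p
    		}
    	} else if hasAVX() {
    		if p := backendPath(assetDir, "llama-cpp-avx"); exists(p) {
    			selected = p
    		}
    	}
    	return selected
    }

    func main() {
    	fmt.Println(selectLlamaCPPVariant("/tmp/localai-assets"))
    }
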
From d1d7ce83d4195113b45d6f0d7dba79d321a86df4 Mon Sep 17 00:00:00 2001
From: Gianluca Boiano <491117+M0Rf30@users.noreply.github.com>
Date: Fri, 24 Jan 2025 08:27:02 +0100
Subject: [PATCH 084/679] chore(model gallery): add MiniCPM-o-2.6-7.6b (#4676)
Signed-off-by: Gianluca Boiano
---
gallery/index.yaml | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 4ce19bb4..d37f0ab4 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -5667,6 +5667,32 @@
- filename: marco-o1-uncensored.Q4_K_M.gguf
sha256: ad0440270a7254098f90779744d3e5b34fe49b7baf97c819909ba9c5648cc0d9
uri: huggingface://QuantFactory/marco-o1-uncensored-GGUF/marco-o1-uncensored.Q4_K_M.gguf
+- !!merge <<: *qwen2
+ name: "minicpm-o-2_6"
+ icon: https://avatars.githubusercontent.com/u/89920203
+ urls:
+ - https://huggingface.co/openbmb/MiniCPM-o-2_6-gguf
+ - https://huggingface.co/openbmb/MiniCPM-o-2_6
+ description: |
+ MiniCPM-o 2.6 is the latest and most capable model in the MiniCPM-o series. The model is built in an end-to-end fashion based on SigLip-400M, Whisper-medium-300M, ChatTTS-200M, and Qwen2.5-7B with a total of 8B parameters
+ tags:
+ - llm
+ - multimodal
+ - gguf
+ - gpu
+ - qwen2
+ - cpu
+ overrides:
+ mmproj: minicpm-o-2_6-mmproj-f16.gguf
+ parameters:
+ model: minicpm-o-2_6-Q4_K_M.gguf
+ files:
+ - filename: minicpm-o-2_6-Q4_K_M.gguf
+ sha256: 4f635fc0c0bb88d50ccd9cf1f1e5892b5cb085ff88fe0d8e1148fd9a8a836bc2
+ uri: huggingface://openbmb/MiniCPM-o-2_6-gguf/Model-7.6B-Q4_K_M.gguf
+ - filename: minicpm-o-2_6-mmproj-f16.gguf
+ sha256: efa4f7d96aa0f838f2023fc8d28e519179b16f1106777fa9280b32628191aa3e
+ uri: huggingface://openbmb/MiniCPM-o-2_6-gguf/mmproj-model-f16.gguf
- !!merge <<: *qwen2
name: "minicpm-v-2_6"
license: apache-2.0
From 82824145839bc4dd3dfad64519ac1151a03a260a Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 24 Jan 2025 08:27:22 +0100
Subject: [PATCH 085/679] chore(downloader): support hf.co and hf:// URIs
(#4677)
Signed-off-by: Ettore Di Giacinto
---
pkg/downloader/uri.go | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/pkg/downloader/uri.go b/pkg/downloader/uri.go
index 2e0363c8..54b8eb10 100644
--- a/pkg/downloader/uri.go
+++ b/pkg/downloader/uri.go
@@ -21,14 +21,16 @@ import (
)
const (
- HuggingFacePrefix = "huggingface://"
- OCIPrefix = "oci://"
- OllamaPrefix = "ollama://"
- HTTPPrefix = "http://"
- HTTPSPrefix = "https://"
- GithubURI = "github:"
- GithubURI2 = "github://"
- LocalPrefix = "file://"
+ HuggingFacePrefix = "huggingface://"
+ HuggingFacePrefix1 = "hf://"
+ HuggingFacePrefix2 = "hf.co/"
+ OCIPrefix = "oci://"
+ OllamaPrefix = "ollama://"
+ HTTPPrefix = "http://"
+ HTTPSPrefix = "https://"
+ GithubURI = "github:"
+ GithubURI2 = "github://"
+ LocalPrefix = "file://"
)
type URI string
@@ -127,6 +129,8 @@ func (u URI) LooksLikeURL() bool {
return strings.HasPrefix(string(u), HTTPPrefix) ||
strings.HasPrefix(string(u), HTTPSPrefix) ||
strings.HasPrefix(string(u), HuggingFacePrefix) ||
+ strings.HasPrefix(string(u), HuggingFacePrefix1) ||
+ strings.HasPrefix(string(u), HuggingFacePrefix2) ||
strings.HasPrefix(string(u), GithubURI) ||
strings.HasPrefix(string(u), OllamaPrefix) ||
strings.HasPrefix(string(u), OCIPrefix) ||
@@ -170,8 +174,10 @@ func (s URI) ResolveURL() string {
projectPath := strings.Join(repoPath[2:], "/")
return fmt.Sprintf("https://raw.githubusercontent.com/%s/%s/%s/%s", org, project, branch, projectPath)
- case strings.HasPrefix(string(s), HuggingFacePrefix):
+ case strings.HasPrefix(string(s), HuggingFacePrefix) || strings.HasPrefix(string(s), HuggingFacePrefix1) || strings.HasPrefix(string(s), HuggingFacePrefix2):
repository := strings.Replace(string(s), HuggingFacePrefix, "", 1)
+ repository = strings.Replace(repository, HuggingFacePrefix1, "", 1)
+ repository = strings.Replace(repository, HuggingFacePrefix2, "", 1)
// convert repository to a full URL.
// e.g. TheBloke/Mixtral-8x7B-v0.1-GGUF/mixtral-8x7b-v0.1.Q2_K.gguf@main -> https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF/resolve/main/mixtral-8x7b-v0.1.Q2_K.gguf
owner := strings.Split(repository, "/")[0]
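
For reference, the conversion this patch extends to the hf:// and hf.co/ spellings can be sketched as follows. This is a trimmed-down illustration, not the exact code in pkg/downloader/uri.go: the "@branch" handling is simplified here, and the repository path is the one quoted in the comment above.

    package main

    import (
    	"fmt"
    	"strings"
    )

    // resolveHuggingFace strips any of the supported Hugging Face prefixes and
    // rewrites the repository path into a direct "resolve" download URL.
    func resolveHuggingFace(uri string) string {
    	repository := uri
    	for _, prefix := range []string{"huggingface://", "hf://", "hf.co/"} {
    		repository = strings.Replace(repository, prefix, "", 1)
    	}

    	// Optional "@branch" suffix, defaulting to main.
    	branch := "main"
    	if idx := strings.Index(repository, "@"); idx != -1 {
    		branch = repository[idx+1:]
    		repository = repository[:idx]
    	}

    	parts := strings.Split(repository, "/")
    	owner, repo := parts[0], parts[1]
    	file := strings.Join(parts[2:], "/")
    	return fmt.Sprintf("https://huggingface.co/%s/%s/resolve/%s/%s", owner, repo, branch, file)
    }

    func main() {
    	// All three spellings now resolve to the same download URL.
    	for _, u := range []string{
    		"huggingface://TheBloke/Mixtral-8x7B-v0.1-GGUF/mixtral-8x7b-v0.1.Q2_K.gguf@main",
    		"hf://TheBloke/Mixtral-8x7B-v0.1-GGUF/mixtral-8x7b-v0.1.Q2_K.gguf",
    		"hf.co/TheBloke/Mixtral-8x7B-v0.1-GGUF/mixtral-8x7b-v0.1.Q2_K.gguf",
    	} {
    		fmt.Println(resolveHuggingFace(u))
    	}
    }
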
From 66e9ef3f33b35b7c4879ddfe76f9223061b3a7f9 Mon Sep 17 00:00:00 2001
From: Gianluca Boiano <491117+M0Rf30@users.noreply.github.com>
Date: Fri, 24 Jan 2025 08:28:44 +0100
Subject: [PATCH 086/679] chore(model gallery): add DeepSeek R1 14b, 32b and
70b (#4679)
Signed-off-by: Gianluca Boiano
---
gallery/index.yaml | 113 ++++++++++++++++++++++++++++++++-------------
1 file changed, 80 insertions(+), 33 deletions(-)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index d37f0ab4..619f43b6 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -2696,39 +2696,6 @@
- filename: Qwentile2.5-32B-Instruct-Q4_K_M.gguf
sha256: e476d6e3c15c78fc3f986d7ae8fa35c16116843827f2e6243c05767cef2f3615
uri: huggingface://bartowski/Qwentile2.5-32B-Instruct-GGUF/Qwentile2.5-32B-Instruct-Q4_K_M.gguf
-- !!merge <<: *qwen25
- name: "deepseek-r1-distill-qwen-1.5b"
- icon: "https://avatars.githubusercontent.com/u/148330874"
- urls:
- - https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5b
- - https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF
- description: |
- DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks.
- Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing.
- By incorporating cold-start data prior to the RL phase,DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.
- overrides:
- parameters:
- model: deepseek-r1-distill-qwen-1.5b-Q4_K_M.gguf
- files:
- - filename: deepseek-r1-distill-qwen-1.5b-Q4_K_M.gguf
- sha256: c2c43b6018cf7700ce0ddee8807deb1a9a26758ef878232f3a142d16df81f0fe
- uri: huggingface://unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf
-- !!merge <<: *qwen25
- name: "deepseek-r1-distill-qwen-7b"
- urls:
- - https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- - https://huggingface.co/bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF
- description: |
- DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks.
- Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing.
- By incorporating cold-start data prior to the RL phase,DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.
- overrides:
- parameters:
- model: DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
- files:
- - filename: DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
- sha256: 731ece8d06dc7eda6f6572997feb9ee1258db0784827e642909d9b565641937b
- uri: huggingface://bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
- &archfunct
license: apache-2.0
tags:
@@ -5334,6 +5301,86 @@
- filename: archangel_sft_pythia2-8b.Q4_K_M.gguf
sha256: a47782c55ef2b39b19644213720a599d9849511a73c9ebb0c1de749383c0a0f8
uri: huggingface://RichardErkhov/ContextualAI_-_archangel_sft_pythia2-8b-gguf/archangel_sft_pythia2-8b.Q4_K_M.gguf
+- &deepseek-r1 ## Start DeepSeek-R1
+ url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
+ name: "deepseek-r1-distill-qwen-1.5b"
+ icon: "https://avatars.githubusercontent.com/u/148330874"
+ urls:
+ - https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5b
+ - https://huggingface.co/bartowski/DeepSeek-R1-Distill-Qwen-1.5B-GGUF
+ description: |
+ DeepSeek-R1 is our advanced first-generation reasoning model designed to enhance performance in reasoning tasks.
+ Building on the foundation laid by its predecessor, DeepSeek-R1-Zero, which was trained using large-scale reinforcement learning (RL) without supervised fine-tuning, DeepSeek-R1 addresses the challenges faced by R1-Zero, such as endless repetition, poor readability, and language mixing.
+ By incorporating cold-start data prior to the RL phase, DeepSeek-R1 significantly improves reasoning capabilities and achieves performance levels comparable to OpenAI-o1 across a variety of domains, including mathematics, coding, and complex reasoning tasks.
+ overrides:
+ parameters:
+ model: DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf
+ files:
+ - filename: DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf
+ sha256: 1741e5b2d062b07acf048bf0d2c514dadf2a48f94e2b4aa0cfe069af3838ee2f
+ uri: huggingface://bartowski/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf
+- !!merge <<: *deepseek-r1
+ name: "deepseek-r1-distill-qwen-7b"
+ urls:
+ - https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+ - https://huggingface.co/bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF
+ overrides:
+ parameters:
+ model: DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
+ files:
+ - filename: DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
+ sha256: 731ece8d06dc7eda6f6572997feb9ee1258db0784827e642909d9b565641937b
+ uri: huggingface://bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
+- !!merge <<: *deepseek-r1
+ name: "deepseek-r1-distill-qwen-14b"
+ urls:
+ - https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
+ - https://huggingface.co/bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF
+ overrides:
+ parameters:
+ model: DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf
+ files:
+ - filename: DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf
+ sha256: 0b319bd0572f2730bfe11cc751defe82045fad5085b4e60591ac2cd2d9633181
+ uri: huggingface://bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf
+- !!merge <<: *deepseek-r1
+ name: "deepseek-r1-distill-qwen-32b"
+ urls:
+ - https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
+ - https://huggingface.co/bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF
+ overrides:
+ parameters:
+ model: DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf
+ files:
+ - filename: DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf
+ sha256: bed9b0f551f5b95bf9da5888a48f0f87c37ad6b72519c4cbd775f54ac0b9fc62
+ uri: huggingface://bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf
+- !!merge <<: *deepseek-r1
+ name: "deepseek-r1-distill-llama-8b"
+ icon: "https://avatars.githubusercontent.com/u/148330874"
+ urls:
+ - https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+ - https://huggingface.co/bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF
+ overrides:
+ parameters:
+ model: DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf
+ files:
+ - filename: DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf
+ sha256: 87bcba20b4846d8dadf753d3ff48f9285d131fc95e3e0e7e934d4f20bc896f5d
+ uri: huggingface://bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf
+- !!merge <<: *deepseek-r1
+ name: "deepseek-r1-distill-llama-70b"
+ icon: "https://avatars.githubusercontent.com/u/148330874"
+ urls:
+ - https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B
+ - https://huggingface.co/bartowski/DeepSeek-R1-Distill-Llama-70B-GGUF
+ overrides:
+ parameters:
+ model: DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf
+ files:
+ - filename: DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf
+ sha256: 181a82a1d6d2fa24fe4db83a68eee030384986bdbdd4773ba76424e3a6eb9fd8
+ uri: huggingface://bartowski/DeepSeek-R1-Distill-Llama-70B-GGUF/DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf
- &qwen2 ## Start QWEN2
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "qwen2-7b-instruct"
From 9a1182fa01f8efcbf4193cf1edaabb908f864dd1 Mon Sep 17 00:00:00 2001
From: Gianluca Boiano <491117+M0Rf30@users.noreply.github.com>
Date: Fri, 24 Jan 2025 08:29:02 +0100
Subject: [PATCH 087/679] chore(model gallery): add flux.1, stablediffusion and
whisper icons (#4680)
Signed-off-by: Gianluca Boiano
---
gallery/index.yaml | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 619f43b6..15dbf1e2 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -11137,7 +11137,7 @@
uri: huggingface://Lykon/DreamShaper/DreamShaper_8_pruned.safetensors
sha256: 879db523c30d3b9017143d56705015e15a2cb5628762c11d086fed9538abd7fd
- name: stable-diffusion-3-medium
- icon: https://huggingface.co/leo009/stable-diffusion-3-medium/resolve/main/sd3demo.jpg
+ icon: https://avatars.githubusercontent.com/u/100950301
license: other
description: |
Stable Diffusion 3 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
@@ -11152,6 +11152,7 @@
- gpu
url: "github:mudler/LocalAI/gallery/stablediffusion3.yaml@master"
- name: sd-1.5-ggml
+ icon: https://avatars.githubusercontent.com/u/37351293
license: creativeml-openrail-m
url: "github:mudler/LocalAI/gallery/sd-ggml.yaml@master"
description: |
@@ -11185,7 +11186,7 @@
- stablediffusion
- gpu
- cpu
- icon: https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/media/main/sd3.5_medium_demo.jpg
+ icon: https://avatars.githubusercontent.com/u/100950301
overrides:
options:
- "clip_l_path:clip_l-Q4_0.gguf"
@@ -11220,7 +11221,7 @@
- stablediffusion
- gpu
- cpu
- icon: https://huggingface.co/stabilityai/stable-diffusion-3.5-large/media/main/sd3.5_large_demo.png
+ icon: https://avatars.githubusercontent.com/u/100950301
overrides:
parameters:
model: sd3.5_large-Q4_0.gguf
@@ -11239,6 +11240,7 @@
uri: huggingface://second-state/stable-diffusion-3.5-large-GGUF/t5xxl-Q5_0.gguf
- &flux
name: flux.1-dev
+ icon: https://avatars.githubusercontent.com/u/164064024
license: flux-1-dev-non-commercial-license
description: |
FLUX.1 [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. For more information, please read our blog post.
@@ -11262,7 +11264,6 @@
- !!merge <<: *flux
name: flux.1-schnell
license: apache-2
- icon: https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/schnell_grid.jpeg
description: |
FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. For more information, please read our blog post.
Key Features
@@ -11295,7 +11296,6 @@
- flux
- gpu
- cpu
- icon: https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/schnell_grid.jpeg
overrides:
parameters:
model: flux1-dev-Q2_K.gguf
@@ -11315,6 +11315,7 @@
- &whisper ## Whisper
url: "github:mudler/LocalAI/gallery/whisper-base.yaml@master"
name: "whisper-1"
+ icon: https://avatars.githubusercontent.com/u/14957082
license: "MIT"
urls:
- https://github.com/ggerganov/whisper.cpp
@@ -11492,6 +11493,7 @@
description: |
Stable Diffusion in NCNN with c++, supported txt2img and img2img
name: stablediffusion-cpp
+ icon: https://avatars.githubusercontent.com/u/100950301
- &piper ## Piper TTS
url: github:mudler/LocalAI/gallery/piper.yaml@master
name: voice-en-us-kathleen-low
@@ -12072,6 +12074,7 @@
uri: https://github.com/rhasspy/piper/releases/download/v0.0.2/voice-zh_CN-huayan-medium.tar.gz
sha256: 0299a5e7f481ba853404e9f0e1515a94d5409585d76963fa4d30c64bd630aa99
- name: "silero-vad"
+ icon: https://github.com/snakers4/silero-models/raw/master/files/silero_logo.jpg
url: github:mudler/LocalAI/gallery/virtual.yaml@master
urls:
- https://github.com/snakers4/silero-vad
@@ -12091,6 +12094,7 @@
uri: https://huggingface.co/onnx-community/silero-vad/resolve/main/onnx/model.onnx
sha256: a4a068cd6cf1ea8355b84327595838ca748ec29a25bc91fc82e6c299ccdc5808
- name: "bark-cpp-small"
+ icon: https://avatars.githubusercontent.com/u/99442120
url: github:mudler/LocalAI/gallery/virtual.yaml@master
license: mit
urls:
From 4d44ebc2f2f261deaf20699d68c22a1ba18e7054 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 24 Jan 2025 10:18:22 +0100
Subject: [PATCH 088/679] chore(deps): bump grpcio to 1.70.0 (#4682)
Signed-off-by: Ettore Di Giacinto
---
backend/python/autogptq/requirements.txt | 2 +-
backend/python/bark/requirements.txt | 2 +-
backend/python/common/template/requirements.txt | 2 +-
backend/python/coqui/requirements.txt | 2 +-
backend/python/diffusers/requirements.txt | 2 +-
backend/python/exllama2/requirements.txt | 2 +-
backend/python/faster-whisper/requirements.txt | 2 +-
backend/python/kokoro/requirements.txt | 2 +-
backend/python/rerankers/requirements.txt | 2 +-
backend/python/transformers/requirements.txt | 2 +-
backend/python/vllm/requirements.txt | 2 +-
11 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/backend/python/autogptq/requirements.txt b/backend/python/autogptq/requirements.txt
index c857a867..af596d9e 100644
--- a/backend/python/autogptq/requirements.txt
+++ b/backend/python/autogptq/requirements.txt
@@ -1,6 +1,6 @@
accelerate
auto-gptq==0.7.1
-grpcio==1.69.0
+grpcio==1.70.0
protobuf
certifi
transformers
\ No newline at end of file
diff --git a/backend/python/bark/requirements.txt b/backend/python/bark/requirements.txt
index 81c1273d..f4beaec1 100644
--- a/backend/python/bark/requirements.txt
+++ b/backend/python/bark/requirements.txt
@@ -1,4 +1,4 @@
bark==0.1.5
-grpcio==1.69.0
+grpcio==1.70.0
protobuf
certifi
\ No newline at end of file
diff --git a/backend/python/common/template/requirements.txt b/backend/python/common/template/requirements.txt
index 0f43df10..125b18dd 100644
--- a/backend/python/common/template/requirements.txt
+++ b/backend/python/common/template/requirements.txt
@@ -1,3 +1,3 @@
-grpcio==1.69.0
+grpcio==1.70.0
protobuf
grpcio-tools
\ No newline at end of file
diff --git a/backend/python/coqui/requirements.txt b/backend/python/coqui/requirements.txt
index 76c9ba4b..5ec13b5f 100644
--- a/backend/python/coqui/requirements.txt
+++ b/backend/python/coqui/requirements.txt
@@ -1,4 +1,4 @@
-grpcio==1.69.0
+grpcio==1.70.0
protobuf
certifi
packaging==24.1
\ No newline at end of file
diff --git a/backend/python/diffusers/requirements.txt b/backend/python/diffusers/requirements.txt
index d49155ed..8c450dca 100644
--- a/backend/python/diffusers/requirements.txt
+++ b/backend/python/diffusers/requirements.txt
@@ -1,5 +1,5 @@
setuptools
-grpcio==1.69.0
+grpcio==1.70.0
pillow
protobuf
certifi
diff --git a/backend/python/exllama2/requirements.txt b/backend/python/exllama2/requirements.txt
index 77464406..cb622d0c 100644
--- a/backend/python/exllama2/requirements.txt
+++ b/backend/python/exllama2/requirements.txt
@@ -1,4 +1,4 @@
-grpcio==1.69.0
+grpcio==1.70.0
protobuf
certifi
wheel
diff --git a/backend/python/faster-whisper/requirements.txt b/backend/python/faster-whisper/requirements.txt
index 0f43df10..125b18dd 100644
--- a/backend/python/faster-whisper/requirements.txt
+++ b/backend/python/faster-whisper/requirements.txt
@@ -1,3 +1,3 @@
-grpcio==1.69.0
+grpcio==1.70.0
protobuf
grpcio-tools
\ No newline at end of file
diff --git a/backend/python/kokoro/requirements.txt b/backend/python/kokoro/requirements.txt
index 75d65ba1..06e60389 100644
--- a/backend/python/kokoro/requirements.txt
+++ b/backend/python/kokoro/requirements.txt
@@ -1,4 +1,4 @@
-grpcio==1.69.0
+grpcio==1.70.0
protobuf
phonemizer
scipy
diff --git a/backend/python/rerankers/requirements.txt b/backend/python/rerankers/requirements.txt
index afc8b2a9..566fdae0 100644
--- a/backend/python/rerankers/requirements.txt
+++ b/backend/python/rerankers/requirements.txt
@@ -1,3 +1,3 @@
-grpcio==1.69.0
+grpcio==1.70.0
protobuf
certifi
\ No newline at end of file
diff --git a/backend/python/transformers/requirements.txt b/backend/python/transformers/requirements.txt
index db41b928..c0fa0c0b 100644
--- a/backend/python/transformers/requirements.txt
+++ b/backend/python/transformers/requirements.txt
@@ -1,4 +1,4 @@
-grpcio==1.69.0
+grpcio==1.70.0
protobuf
certifi
setuptools
diff --git a/backend/python/vllm/requirements.txt b/backend/python/vllm/requirements.txt
index a1eea776..1f92add8 100644
--- a/backend/python/vllm/requirements.txt
+++ b/backend/python/vllm/requirements.txt
@@ -1,4 +1,4 @@
-grpcio==1.69.0
+grpcio==1.70.0
protobuf
certifi
setuptools
\ No newline at end of file
From 9409c99738f32921255878af7c7b98db6e427b11 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Fri, 24 Jan 2025 22:45:54 +0100
Subject: [PATCH 089/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`c5d9effb49649db80a52caf5c0626de6f342f526` (#4685)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index e3c28039..0e4dd391 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=6152129d05870cb38162c422c6ba80434e021e9f
+CPPLLAMA_VERSION?=c5d9effb49649db80a52caf5c0626de6f342f526
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From e9cace137b52b37b32c6284ce842b432bd4e21c3 Mon Sep 17 00:00:00 2001
From: Gianluca Boiano <491117+M0Rf30@users.noreply.github.com>
Date: Sat, 25 Jan 2025 09:04:38 +0100
Subject: [PATCH 090/679] chore(model gallery): update deepseek-r1 prompt
template (#4686)
Signed-off-by: Gianluca Boiano
---
gallery/deepseek-r1.yaml | 23 +++++++++++++++++++++++
gallery/index.yaml | 2 +-
2 files changed, 24 insertions(+), 1 deletion(-)
create mode 100644 gallery/deepseek-r1.yaml
diff --git a/gallery/deepseek-r1.yaml b/gallery/deepseek-r1.yaml
new file mode 100644
index 00000000..29ca9db1
--- /dev/null
+++ b/gallery/deepseek-r1.yaml
@@ -0,0 +1,23 @@
+---
+name: "deepseek-r1"
+
+config_file: |
+ context_size: 131072
+ mmap: true
+ f16: true
+ stopwords:
+ - <｜begin▁of▁sentence｜>
+ - <｜end▁of▁sentence｜>
+ - <｜User｜>
+ - <｜Assistant｜>
+ template:
+ chat_message: |
+ {{if eq .RoleName "system" -}}{{.Content }}
+ {{ end -}}
+ {{if eq .RoleName "user" -}}<｜User｜>{{.Content}}
+ {{end -}}
+ {{if eq .RoleName "assistant" -}}<｜Assistant｜>{{.Content}}<｜end▁of▁sentence｜>{{end}}
+ completion: |
+ {{.Input}}
+ chat: |
+ {{.Input -}}<｜Assistant｜>
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 15dbf1e2..11e48fa5 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -5302,7 +5302,7 @@
sha256: a47782c55ef2b39b19644213720a599d9849511a73c9ebb0c1de749383c0a0f8
uri: huggingface://RichardErkhov/ContextualAI_-_archangel_sft_pythia2-8b-gguf/archangel_sft_pythia2-8b.Q4_K_M.gguf
- &deepseek-r1 ## Start DeepSeek-R1
- url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
+ url: "github:mudler/LocalAI/gallery/deepseek-r1.yaml@master"
name: "deepseek-r1-distill-qwen-1.5b"
icon: "https://avatars.githubusercontent.com/u/148330874"
urls:
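
To see what the new gallery/deepseek-r1.yaml template produces, here is a small, self-contained Go text/template sketch. The message struct is only a stand-in for LocalAI's real templating data (which exposes .RoleName, .Content and .Input); the template body is taken from the chat_message and chat sections above.

    package main

    import (
    	"os"
    	"text/template"
    )

    // message is a minimal stand-in for LocalAI's chat message view.
    type message struct {
    	RoleName string
    	Content  string
    }

    const chatMessage = `{{if eq .RoleName "system" -}}{{.Content }}
    {{ end -}}
    {{if eq .RoleName "user" -}}<｜User｜>{{.Content}}
    {{end -}}
    {{if eq .RoleName "assistant" -}}<｜Assistant｜>{{.Content}}<｜end▁of▁sentence｜>{{end}}`

    func main() {
    	tmpl := template.Must(template.New("chat_message").Parse(chatMessage))
    	msgs := []message{
    		{RoleName: "system", Content: "You are a helpful assistant."},
    		{RoleName: "user", Content: "Why is the sky blue?"},
    	}
    	for _, m := range msgs {
    		if err := tmpl.Execute(os.Stdout, m); err != nil {
    			panic(err)
    		}
    	}
    	// The chat template then appends "<｜Assistant｜>" so the model
    	// continues with its reply.
    	os.Stdout.WriteString("<｜Assistant｜>")
    }
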
From 8eef5a2c5ef85a045a10b7255520a7ca4fd9df81 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sat, 25 Jan 2025 11:04:12 +0100
Subject: [PATCH 091/679] chore(model gallery): add lamarck-14b-v0.7 (#4687)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 11e48fa5..80a60dee 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3274,6 +3274,21 @@
- filename: DRT-o1-14B-Q4_K_M.gguf
sha256: 9619ca984cf4ce8e4f69bcde831de17b2ce05dd89536e3130608877521e3d328
uri: huggingface://bartowski/DRT-o1-14B-GGUF/DRT-o1-14B-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "lamarck-14b-v0.7"
+ icon: https://huggingface.co/sometimesanotion/Lamarck-14B-v0.7/resolve/main/LamarckShades.webp
+ urls:
+ - https://huggingface.co/sometimesanotion/Lamarck-14B-v0.7
+ - https://huggingface.co/bartowski/Lamarck-14B-v0.7-GGUF
+ description: |
+ Lamarck 14B v0.7: A generalist merge with emphasis on multi-step reasoning, prose, and multi-language ability. The 14B parameter model class has a lot of strong performers, and Lamarck strives to be well-rounded and solid.
+ overrides:
+ parameters:
+ model: Lamarck-14B-v0.7-Q4_K_M.gguf
+ files:
+ - filename: Lamarck-14B-v0.7-Q4_K_M.gguf
+ sha256: ff8eba82b77a4c6b6d556b85629414655d881f8af4601bcf891c6a7b0345b442
+ uri: huggingface://bartowski/Lamarck-14B-v0.7-GGUF/Lamarck-14B-v0.7-Q4_K_M.gguf
- &smollm ## SmolLM
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "smollm-1.7b-instruct"
From 901b06284adaddddbf2cbbc58fd490080950b6a0 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sat, 25 Jan 2025 11:06:05 +0100
Subject: [PATCH 092/679] chore(model gallery): add art-v0-3b (#4688)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 80a60dee..cc96f770 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3289,6 +3289,22 @@
- filename: Lamarck-14B-v0.7-Q4_K_M.gguf
sha256: ff8eba82b77a4c6b6d556b85629414655d881f8af4601bcf891c6a7b0345b442
uri: huggingface://bartowski/Lamarck-14B-v0.7-GGUF/Lamarck-14B-v0.7-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "art-v0-3b"
+ icon: https://blog.agi-0.com/_next/image?url=%2Fabout_img2.jpeg&w=1920&q=75
+ urls:
+ - https://huggingface.co/AGI-0/Art-v0-3B
+ - https://huggingface.co/bartowski/Art-v0-3B-GGUF
+ - https://blog.agi-0.com/posts/art-series
+ description: |
+ Art v0 3B is our inaugural model in the Art series, fine-tuned from Qwen/Qwen2.5-3B-Instruct using a specialized dataset generated with Gemini 2.0 Flash Thinking. Read more about the Art series
+ overrides:
+ parameters:
+ model: Art-v0-3B-Q4_K_M.gguf
+ files:
+ - filename: Art-v0-3B-Q4_K_M.gguf
+ sha256: 551acd326ce9a743b6e06e094865eb2f06c23c81c812ce221d757bf27ceec9f7
+ uri: huggingface://bartowski/Art-v0-3B-GGUF/Art-v0-3B-Q4_K_M.gguf
- &smollm ## SmolLM
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "smollm-1.7b-instruct"
From 4c3710a5319269fa159c8521dd74a13fe3be11c7 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sat, 25 Jan 2025 11:07:31 +0100
Subject: [PATCH 093/679] chore(model gallery): add chuluun-qwen2.5-72b-v0.08
(#4689)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index cc96f770..12f1bc2e 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3305,6 +3305,24 @@
- filename: Art-v0-3B-Q4_K_M.gguf
sha256: 551acd326ce9a743b6e06e094865eb2f06c23c81c812ce221d757bf27ceec9f7
uri: huggingface://bartowski/Art-v0-3B-GGUF/Art-v0-3B-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "chuluun-qwen2.5-72b-v0.08"
+ icon: https://huggingface.co/DatToad/Chuluun-Qwen2.5-72B-v0.08/resolve/main/Chuluun8-2.png
+ urls:
+ - https://huggingface.co/DatToad/Chuluun-Qwen2.5-72B-v0.08
+ - https://huggingface.co/bartowski/Chuluun-Qwen2.5-72B-v0.08-GGUF
+ description: |
+ This is a merge of pre-trained language models created using mergekit.
+ I re-ran the original Chuluun formula including the newly released Ink from Allura-Org. I've found the addition gives the model a lot more variability, likely because of aggressive de-slop applied to its dataset. Sometimes this means a word choice will be strange and you'll want to manually edit when needed, but it means you'll see less ministrations sparkling with mischief.
+ Because of this the best way to approach the model is to run multiple regens and choose the one you like, edit mercilessly, and continue. Like the original Chuluun this variant is very steerable for complex storywriting and RP. It's probably also a little spicier than v0.01 with both Magnum and whatever the heck Fizz threw into the data for Ink.
+ I've also been hearing praise for a level of character intelligence not seen in other models, including Largestral finetunes and merges. I'm not about to say any model of mine is smarter because it was a dumb idea to use Tess as the base and it somehow worked.
+ overrides:
+ parameters:
+ model: Chuluun-Qwen2.5-72B-v0.08-Q4_K_M.gguf
+ files:
+ - filename: Chuluun-Qwen2.5-72B-v0.08-Q4_K_M.gguf
+ sha256: 0fec82625f74a9a340837de7af287b1d9042e5aeb70cda2621426db99958b0af
+ uri: huggingface://bartowski/Chuluun-Qwen2.5-72B-v0.08-GGUF/Chuluun-Qwen2.5-72B-v0.08-Q4_K_M.gguf
- &smollm ## SmolLM
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "smollm-1.7b-instruct"
From 4ab107bc1ae1323f80dcad8b13fbefd943a067cb Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Sat, 25 Jan 2025 22:44:14 +0100
Subject: [PATCH 094/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`26771a1491f3a4c3d5b99c4c267b81aca9a7dfa0` (#4690)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 0e4dd391..f6ee9a08 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=c5d9effb49649db80a52caf5c0626de6f342f526
+CPPLLAMA_VERSION?=26771a1491f3a4c3d5b99c4c267b81aca9a7dfa0
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From a6bc8aa7c7583a989b0e86ea113e7d66900ee760 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sun, 26 Jan 2025 10:01:37 +0100
Subject: [PATCH 095/679] chore(model gallery): add l3.3-nevoria-r1-70b (#4691)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 12f1bc2e..51f36da9 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -456,6 +456,25 @@
- filename: L3.3-Prikol-70B-v0.2-Q4_K_M.gguf
sha256: fc0ff514efbc0b67981c2bf1423d5a2e1b8801e4266ba0c653ea148414fe5ffc
uri: huggingface://bartowski/L3.3-Prikol-70B-v0.2-GGUF/L3.3-Prikol-70B-v0.2-Q4_K_M.gguf
+- !!merge <<: *llama33
+ name: "l3.3-nevoria-r1-70b"
+ icon: https://cdn-uploads.huggingface.co/production/uploads/64545af5ec40bbbd01242ca6/_oWpsvCZ-graNKzJBBjGo.jpeg
+ urls:
+ - https://huggingface.co/Steelskull/L3.3-Nevoria-R1-70b
+ - https://huggingface.co/bartowski/L3.3-Nevoria-R1-70b-GGUF
+ description: |
+ This model builds upon the original Nevoria foundation, incorporating the Deepseek-R1 reasoning architecture to enhance dialogue interaction and scene comprehension. While maintaining Nevoria's core strengths in storytelling and scene description (derived from EVA, EURYALE, and Anubis), this iteration aims to improve prompt adherence and creative reasoning capabilities. The model also retains the balanced perspective introduced by Negative_LLAMA and Nemotron elements. Also, the model plays the card to almost a fault, It'll pick up on minor issues and attempt to run with them. Users had it call them out for misspelling a word while playing in character.
+
+ Note: While Nevoria-R1 represents a significant architectural change, rather than a direct successor to Nevoria, it operates as a distinct model with its own characteristics.
+
+ The lorablated model base choice was intentional, creating unique weight interactions similar to the original Astoria model and Astoria V2 model. This "weight twisting" effect, achieved by subtracting the lorablated base model during merging, creates an interesting balance in the model's behavior. While unconventional compared to sequential component application, this approach was chosen for its unique response characteristics.
+ overrides:
+ parameters:
+ model: L3.3-Nevoria-R1-70b-Q4_K_M.gguf
+ files:
+ - filename: L3.3-Nevoria-R1-70b-Q4_K_M.gguf
+ sha256: 9f32f202fb5b1465c942693bb11eea9e8a1c5686b00602715b495c068eaf1c58
+ uri: huggingface://bartowski/L3.3-Nevoria-R1-70b-GGUF/L3.3-Nevoria-R1-70b-Q4_K_M.gguf
- &rwkv
url: "github:mudler/LocalAI/gallery/rwkv.yaml@master"
name: "rwkv-6-world-7b"
From 8f5aa2d9deeb4817e950c753c90bdf38738cf681 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sun, 26 Jan 2025 10:03:46 +0100
Subject: [PATCH 096/679] chore(model gallery): add dumpling-qwen2.5-32b
(#4692)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 51f36da9..f4ce6f6d 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3400,6 +3400,33 @@
- filename: Vikhr-Qwen-2.5-1.5B-Instruct.Q4_K_M.gguf
sha256: eaeac314e30b461413bc1cc819cdc0cd6a79265711fd0b8268702960a082c7bd
uri: huggingface://QuantFactory/Vikhr-Qwen-2.5-1.5B-Instruct-GGUF/Vikhr-Qwen-2.5-1.5B-Instruct.Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "dumpling-qwen2.5-32b"
+ icon: https://huggingface.co/nbeerbower/Dumpling-Qwen2.5-32B/resolve/main/dumpling_cover.png?download=true
+ urls:
+ - https://huggingface.co/nbeerbower/Dumpling-Qwen2.5-32B
+ - https://huggingface.co/bartowski/Dumpling-Qwen2.5-32B-GGUF
+ description: |
+ nbeerbower/Rombos-EVAGutenberg-TIES-Qwen2.5-32B finetuned on:
+ nbeerbower/GreatFirewall-DPO
+ nbeerbower/Schule-DPO
+ nbeerbower/Purpura-DPO
+ nbeerbower/Arkhaios-DPO
+ jondurbin/truthy-dpo-v0.1
+ antiven0m/physical-reasoning-dpo
+ flammenai/Date-DPO-NoAsterisks
+ flammenai/Prude-Phi3-DPO
+ Atsunori/HelpSteer2-DPO
+ jondurbin/gutenberg-dpo-v0.1
+ nbeerbower/gutenberg2-dpo
+ nbeerbower/gutenberg-moderne-dpo.
+ overrides:
+ parameters:
+ model: Dumpling-Qwen2.5-32B-Q4_K_M.gguf
+ files:
+ - filename: Dumpling-Qwen2.5-32B-Q4_K_M.gguf
+ sha256: c5b7d773cc614650ad3956008e30d0607df6106c28e381870a9b950bd4ee1d17
+ uri: huggingface://bartowski/Dumpling-Qwen2.5-32B-GGUF/Dumpling-Qwen2.5-32B-Q4_K_M.gguf
- &llama31 ## LLama3.1
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
icon: https://avatars.githubusercontent.com/u/153379578
From 3b6b37a81bb6224edd77276efbd661a3b2dc337e Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sun, 26 Jan 2025 10:06:06 +0100
Subject: [PATCH 097/679] chore(model gallery): add
deepseek-r1-qwen-2.5-32b-ablated (#4693)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index f4ce6f6d..da601b35 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -5476,6 +5476,25 @@
- filename: DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf
sha256: 181a82a1d6d2fa24fe4db83a68eee030384986bdbdd4773ba76424e3a6eb9fd8
uri: huggingface://bartowski/DeepSeek-R1-Distill-Llama-70B-GGUF/DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf
+- !!merge <<: *deepseek-r1
+ name: "deepseek-r1-qwen-2.5-32b-ablated"
+ icon: https://cdn-uploads.huggingface.co/production/uploads/6587d8dd1b44d0e694104fbf/0dkt6EhZYwXVBxvSWXdaM.png
+ urls:
+ - https://huggingface.co/NaniDAO/deepseek-r1-qwen-2.5-32B-ablated
+ - https://huggingface.co/bartowski/deepseek-r1-qwen-2.5-32B-ablated-GGUF
+ description: |
+ DeepSeek-R1-Distill-Qwen-32B with ablation technique applied for a more helpful (and based) reasoning model.
+
+ This means it will refuse less of your valid requests for an uncensored UX. Use responsibly and use common sense.
+
+ We do not take any responsibility for how you apply this intelligence, just as we do not for how you apply your own.
+ overrides:
+ parameters:
+ model: deepseek-r1-qwen-2.5-32B-ablated-Q4_K_M.gguf
+ files:
+ - filename: deepseek-r1-qwen-2.5-32B-ablated-Q4_K_M.gguf
+ sha256: 7f33898641ebe58fe178c3517efc129f4fe37c6ca2d8b91353c4539b0c3411ec
+ uri: huggingface://bartowski/deepseek-r1-qwen-2.5-32B-ablated-GGUF/deepseek-r1-qwen-2.5-32B-ablated-Q4_K_M.gguf
- &qwen2 ## Start QWEN2
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "qwen2-7b-instruct"
From 4db8f5cbced8031aa1536b5c4fb906429899477c Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Sun, 26 Jan 2025 22:44:54 +0100
Subject: [PATCH 098/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`178a7eb952d211b8d4232d5e50ae1b64519172a9` (#4694)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index f6ee9a08..f960194c 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=26771a1491f3a4c3d5b99c4c267b81aca9a7dfa0
+CPPLLAMA_VERSION?=178a7eb952d211b8d4232d5e50ae1b64519172a9
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From 5cf838c08d304844f78f26098956249c1d132c49 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Mon, 27 Jan 2025 09:26:00 +0100
Subject: [PATCH 099/679] chore(model gallery): add confucius-o1-14b (#4696)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index da601b35..d736ec35 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3427,6 +3427,20 @@
- filename: Dumpling-Qwen2.5-32B-Q4_K_M.gguf
sha256: c5b7d773cc614650ad3956008e30d0607df6106c28e381870a9b950bd4ee1d17
uri: huggingface://bartowski/Dumpling-Qwen2.5-32B-GGUF/Dumpling-Qwen2.5-32B-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "confucius-o1-14b"
+ urls:
+ - https://huggingface.co/netease-youdao/Confucius-o1-14B
+ - https://huggingface.co/bartowski/Confucius-o1-14B-GGUF
+ description: |
Confucius-o1-14B is an o1-like reasoning model developed by the NetEase Youdao Team; it can be easily deployed on a single GPU without quantization. This model is based on the Qwen2.5-14B-Instruct model and adopts a two-stage learning strategy, enabling the lightweight 14B model to possess thinking abilities similar to those of o1. What sets it apart is that after generating the chain of thought, it can summarize a step-by-step problem-solving process from the chain of thought on its own. This can prevent users from getting bogged down in the complex chain of thought and allows them to easily obtain the correct problem-solving ideas and answers.
+ overrides:
+ parameters:
+ model: Confucius-o1-14B-Q4_K_M.gguf
+ files:
+ - filename: Confucius-o1-14B-Q4_K_M.gguf
+ sha256: 03182920edd8667db7d2a362ca2d25e88f4b615b383b5a55c764f4715fb22dd9
+ uri: huggingface://bartowski/Confucius-o1-14B-GGUF/Confucius-o1-14B-Q4_K_M.gguf
- &llama31 ## LLama3.1
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
icon: https://avatars.githubusercontent.com/u/153379578
From 26d790a2b6f1ee7ef276238f2475c282444d2e80 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Mon, 27 Jan 2025 09:28:29 +0100
Subject: [PATCH 100/679] chore(model gallery): add
fuseo1-deepseekr1-qwen2.5-coder-32b-preview-v0.1 (#4697)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index d736ec35..d1b5b822 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -5509,6 +5509,20 @@
- filename: deepseek-r1-qwen-2.5-32B-ablated-Q4_K_M.gguf
sha256: 7f33898641ebe58fe178c3517efc129f4fe37c6ca2d8b91353c4539b0c3411ec
uri: huggingface://bartowski/deepseek-r1-qwen-2.5-32B-ablated-GGUF/deepseek-r1-qwen-2.5-32B-ablated-Q4_K_M.gguf
+- !!merge <<: *deepseek-r1
+ name: "fuseo1-deepseekr1-qwen2.5-coder-32b-preview-v0.1"
+ urls:
+ - https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview
+ - https://huggingface.co/bartowski/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview-v0.1-GGUF
+ description: |
+ FuseO1-Preview is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. By employing our advanced SCE merging methodologies, we integrate multiple open-source o1-like LLMs into a unified model. Our goal is to incorporate the distinct knowledge and strengths from different reasoning LLMs into a single, unified model with strong System-II reasoning abilities, particularly in mathematics, coding, and science domains.
+ overrides:
+ parameters:
+ model: FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview-v0.1-Q4_K_M.gguf
+ files:
+ - filename: FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview-v0.1-Q4_K_M.gguf
+ sha256: d7753547046cd6e3d45a2cfbd5557aa20dd0b9f0330931d3fd5b3d4a0b468b24
+ uri: huggingface://bartowski/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview-v0.1-GGUF/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview-v0.1-Q4_K_M.gguf
- &qwen2 ## Start QWEN2
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "qwen2-7b-instruct"
From e7cffd7afafdf46a3995019bdb8c587881796e68 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Mon, 27 Jan 2025 09:31:47 +0100
Subject: [PATCH 101/679] chore(model gallery): add
fuseo1-deepseekr1-qwen2.5-instruct-32b-preview (#4698)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index d1b5b822..5cf627f5 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -5523,6 +5523,20 @@
- filename: FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview-v0.1-Q4_K_M.gguf
sha256: d7753547046cd6e3d45a2cfbd5557aa20dd0b9f0330931d3fd5b3d4a0b468b24
uri: huggingface://bartowski/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview-v0.1-GGUF/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview-v0.1-Q4_K_M.gguf
+- !!merge <<: *deepseek-r1
+ name: "fuseo1-deepseekr1-qwen2.5-instruct-32b-preview"
+ urls:
+ - https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview
+ - https://huggingface.co/bartowski/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview-GGUF
+ description: |
+ FuseO1-Preview is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. By employing our advanced SCE merging methodologies, we integrate multiple open-source o1-like LLMs into a unified model. Our goal is to incorporate the distinct knowledge and strengths from different reasoning LLMs into a single, unified model with strong System-II reasoning abilities, particularly in mathematics, coding, and science domains.
+ overrides:
+ parameters:
+ model: FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview-Q4_K_M.gguf
+ files:
+ - filename: FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview-Q4_K_M.gguf
+ sha256: 3b06a004a6bb827f809a7326b30ee73f96a1a86742d8c2dd335d75874fa17aa4
+ uri: huggingface://bartowski/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview-GGUF/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview-Q4_K_M.gguf
- &qwen2 ## Start QWEN2
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "qwen2-7b-instruct"
From 0f4f62cf3cdbc34c99a69a83f97a74f9913b64f2 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Mon, 27 Jan 2025 09:51:06 +0100
Subject: [PATCH 102/679] chore(model gallery): add
fuseo1-deepseekr1-qwq-32b-preview (#4699)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 5cf627f5..5e081b98 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -5537,6 +5537,20 @@
- filename: FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview-Q4_K_M.gguf
sha256: 3b06a004a6bb827f809a7326b30ee73f96a1a86742d8c2dd335d75874fa17aa4
uri: huggingface://bartowski/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview-GGUF/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview-Q4_K_M.gguf
+- !!merge <<: *deepseek-r1
+ name: "fuseo1-deepseekr1-qwq-32b-preview"
+ urls:
+ - https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview
+ - https://huggingface.co/bartowski/FuseO1-DeepSeekR1-QwQ-32B-Preview-GGUF
+ description: |
+ FuseO1-Preview is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. By employing our advanced SCE merging methodologies, we integrate multiple open-source o1-like LLMs into a unified model. Our goal is to incorporate the distinct knowledge and strengths from different reasoning LLMs into a single, unified model with strong System-II reasoning abilities, particularly in mathematics, coding, and science domains.
+ overrides:
+ parameters:
+ model: FuseO1-DeepSeekR1-QwQ-32B-Preview-Q4_K_M.gguf
+ files:
+ - filename: FuseO1-DeepSeekR1-QwQ-32B-Preview-Q4_K_M.gguf
+ sha256: 16f1fb6bf76bb971a7a63e1a68cddd09421f4a767b86eec55eed1e08178f78f2
+ uri: huggingface://bartowski/FuseO1-DeepSeekR1-QwQ-32B-Preview-GGUF/FuseO1-DeepSeekR1-QwQ-32B-Preview-Q4_K_M.gguf
- &qwen2 ## Start QWEN2
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
name: "qwen2-7b-instruct"
From 539e94db731badf8878c23a71b00d3b02dacaf7e Mon Sep 17 00:00:00 2001
From: Maximilian Kenfenheuer
Date: Mon, 27 Jan 2025 16:53:05 +0100
Subject: [PATCH 103/679] feat: function argument parsing using named regex
(#4700)
Signed-off-by: Maximilian Kenfenheuer
---
pkg/functions/parse.go | 45 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 44 insertions(+), 1 deletion(-)
diff --git a/pkg/functions/parse.go b/pkg/functions/parse.go
index f5593690..7b8df91e 100644
--- a/pkg/functions/parse.go
+++ b/pkg/functions/parse.go
@@ -5,6 +5,7 @@ import (
"errors"
"io"
"regexp"
+ "slices"
"strings"
"github.com/mudler/LocalAI/pkg/functions/grammars"
@@ -71,6 +72,12 @@ type FunctionsConfig struct {
// JSONRegexMatch is a regex to extract the JSON object from the response
JSONRegexMatch []string `yaml:"json_regex_match"`
+ // ArgumentRegex is a named regex to extract the arguments from the response. Use ArgumentRegexKey and ArgumentRegexValue to set the names of the named regex for key and value of the arguments.
+ ArgumentRegex []string `yaml:"argument_regex"`
+ // ArgumentRegex named regex names for key and value extractions. default: key and value
+ ArgumentRegexKey string `yaml:"argument_regex_key_name"` // default: key
+ ArgumentRegexValue string `yaml:"argument_regex_value_name"` // default: value
+
// ReplaceFunctionResults allow to replace strings in the results before parsing them
ReplaceFunctionResults []ReplaceResult `yaml:"replace_function_results"`
@@ -310,7 +317,7 @@ func ParseFunctionCall(llmresult string, functionConfig FunctionsConfig) []FuncC
if functionName == "" {
return results
}
- results = append(results, FuncCallResults{Name: result[functionNameKey], Arguments: result[functionArgumentsKey]})
+ results = append(results, FuncCallResults{Name: result[functionNameKey], Arguments: ParseFunctionCallArgs(result[functionArgumentsKey], functionConfig)})
}
}
} else {
@@ -322,3 +329,39 @@ func ParseFunctionCall(llmresult string, functionConfig FunctionsConfig) []FuncC
return results
}
+
+func ParseFunctionCallArgs(functionArguments string, functionConfig FunctionsConfig) string {
+ if len(functionConfig.ArgumentRegex) > 0 {
+ // We use named regexes here to extract the function argument key value pairs and convert this to valid json.
+ // TODO: there might be responses where an object as a value is expected/required. This is currently not handled.
+ args := make(map[string]string)
+
+ agrsRegexKeyName := "key"
+ agrsRegexValueName := "value"
+
+ if functionConfig.ArgumentRegexKey != "" {
+ agrsRegexKeyName = functionConfig.ArgumentRegexKey
+ }
+ if functionConfig.ArgumentRegexValue != "" {
+ agrsRegexValueName = functionConfig.ArgumentRegexValue
+ }
+
+ for _, r := range functionConfig.ArgumentRegex {
+ var respRegex = regexp.MustCompile(r)
+ var nameRange []string = respRegex.SubexpNames()
+ var keyIndex = slices.Index(nameRange, agrsRegexKeyName)
+ var valueIndex = slices.Index(nameRange, agrsRegexValueName)
+ matches := respRegex.FindAllStringSubmatch(functionArguments, -1)
+ for _, match := range matches {
+ args[match[keyIndex]] = match[valueIndex]
+ }
+ }
+
+ jsonBytes, _ := json.Marshal(args)
+ jsonString := string(jsonBytes)
+
+ return jsonString
+ } else {
+ return functionArguments
+ }
+}
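
A standalone sketch of the extraction pattern ParseFunctionCallArgs implements. The regex and the raw model output below are made-up examples, not values LocalAI ships; in practice the pattern comes from the new argument_regex option, with argument_regex_key_name / argument_regex_value_name overriding the default group names key and value.

    package main

    import (
    	"encoding/json"
    	"fmt"
    	"regexp"
    	"slices"
    )

    func main() {
    	// Hypothetical model output and regex for illustration only.
    	raw := `location="Paris" unit="celsius"`
    	pattern := `(?P<key>[a-zA-Z_]+)\s*=\s*"(?P<value>[^"]*)"`

    	re := regexp.MustCompile(pattern)
    	names := re.SubexpNames()
    	keyIdx := slices.Index(names, "key")
    	valueIdx := slices.Index(names, "value")

    	args := map[string]string{}
    	for _, m := range re.FindAllStringSubmatch(raw, -1) {
    		args[m[keyIdx]] = m[valueIdx]
    	}

    	// As in ParseFunctionCallArgs, the key/value pairs are re-emitted as
    	// JSON so downstream function-calling code sees a normal arguments object.
    	out, _ := json.Marshal(args)
    	fmt.Println(string(out)) // {"location":"Paris","unit":"celsius"}
    }
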
From fff35d5528a573935cad76489974d39b8cebfff3 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon, 27 Jan 2025 21:09:50 +0000
Subject: [PATCH 104/679] chore(deps): Bump sentence-transformers from 3.3.1 to
3.4.0 in /backend/python/transformers (#4702)
chore(deps): Bump sentence-transformers in /backend/python/transformers
Bumps [sentence-transformers](https://github.com/UKPLab/sentence-transformers) from 3.3.1 to 3.4.0.
- [Release notes](https://github.com/UKPLab/sentence-transformers/releases)
- [Commits](https://github.com/UKPLab/sentence-transformers/compare/v3.3.1...v3.4.0)
---
updated-dependencies:
- dependency-name: sentence-transformers
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
backend/python/transformers/requirements-cpu.txt | 2 +-
backend/python/transformers/requirements-cublas11.txt | 2 +-
backend/python/transformers/requirements-cublas12.txt | 2 +-
backend/python/transformers/requirements-hipblas.txt | 2 +-
backend/python/transformers/requirements-intel.txt | 2 +-
5 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/backend/python/transformers/requirements-cpu.txt b/backend/python/transformers/requirements-cpu.txt
index c88508e3..36dc973a 100644
--- a/backend/python/transformers/requirements-cpu.txt
+++ b/backend/python/transformers/requirements-cpu.txt
@@ -5,4 +5,4 @@ accelerate
transformers
bitsandbytes
outetts
-sentence-transformers==3.3.1
\ No newline at end of file
+sentence-transformers==3.4.0
\ No newline at end of file
diff --git a/backend/python/transformers/requirements-cublas11.txt b/backend/python/transformers/requirements-cublas11.txt
index 0faa9cec..a8b1c0c0 100644
--- a/backend/python/transformers/requirements-cublas11.txt
+++ b/backend/python/transformers/requirements-cublas11.txt
@@ -6,4 +6,4 @@ accelerate
transformers
bitsandbytes
outetts
-sentence-transformers==3.3.1
+sentence-transformers==3.4.0
diff --git a/backend/python/transformers/requirements-cublas12.txt b/backend/python/transformers/requirements-cublas12.txt
index 1e22312f..a54c4c88 100644
--- a/backend/python/transformers/requirements-cublas12.txt
+++ b/backend/python/transformers/requirements-cublas12.txt
@@ -5,4 +5,4 @@ numba==0.60.0
transformers
bitsandbytes
outetts
-sentence-transformers==3.3.1
+sentence-transformers==3.4.0
diff --git a/backend/python/transformers/requirements-hipblas.txt b/backend/python/transformers/requirements-hipblas.txt
index 47aa88db..73b7d85b 100644
--- a/backend/python/transformers/requirements-hipblas.txt
+++ b/backend/python/transformers/requirements-hipblas.txt
@@ -7,4 +7,4 @@ numba==0.60.0
bitsandbytes
outetts
bitsandbytes
-sentence-transformers==3.3.1
+sentence-transformers==3.4.0
diff --git a/backend/python/transformers/requirements-intel.txt b/backend/python/transformers/requirements-intel.txt
index 708b0516..5b677199 100644
--- a/backend/python/transformers/requirements-intel.txt
+++ b/backend/python/transformers/requirements-intel.txt
@@ -8,4 +8,4 @@ numba==0.60.0
intel-extension-for-transformers
bitsandbytes
outetts
-sentence-transformers==3.3.1
+sentence-transformers==3.4.0
From 03f3df9a82dd8452abc9bae93f3b7cfb3063e322 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Tue, 28 Jan 2025 09:13:00 +0100
Subject: [PATCH 105/679] chore(deps): Bump docs/themes/hugo-theme-relearn from
`8dad5ee` to `5bcb9fe` (#4704)
chore(deps): Bump docs/themes/hugo-theme-relearn
Bumps [docs/themes/hugo-theme-relearn](https://github.com/McShelby/hugo-theme-relearn) from `8dad5ee` to `5bcb9fe`.
- [Release notes](https://github.com/McShelby/hugo-theme-relearn/releases)
- [Commits](https://github.com/McShelby/hugo-theme-relearn/compare/8dad5ee419e5bb2a0b380aa72d7a7389af4945f6...5bcb9fe5e61d2fbe702034a24425992fd2455b0a)
---
updated-dependencies:
- dependency-name: docs/themes/hugo-theme-relearn
dependency-type: direct:production
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
docs/themes/hugo-theme-relearn | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/themes/hugo-theme-relearn b/docs/themes/hugo-theme-relearn
index 8dad5ee4..5bcb9fe5 160000
--- a/docs/themes/hugo-theme-relearn
+++ b/docs/themes/hugo-theme-relearn
@@ -1 +1 @@
-Subproject commit 8dad5ee419e5bb2a0b380aa72d7a7389af4945f6
+Subproject commit 5bcb9fe5e61d2fbe702034a24425992fd2455b0a
From 3d0fbcb4f7331d1d7abca50d9719ea7e232cbdb3 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Tue, 28 Jan 2025 09:13:43 +0100
Subject: [PATCH 106/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`a4417ddda98fd0558fb4d802253e68a933704b59` (#4705)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index f960194c..08c334a3 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=178a7eb952d211b8d4232d5e50ae1b64519172a9
+CPPLLAMA_VERSION?=a4417ddda98fd0558fb4d802253e68a933704b59
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From d9204ea3b5b0edbfb1e980fa559a7fa79ac8f1ff Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Tue, 28 Jan 2025 11:50:09 +0100
Subject: [PATCH 107/679] chore(deps): Bump dependabot/fetch-metadata from
2.2.0 to 2.3.0 (#4701)
Bumps [dependabot/fetch-metadata](https://github.com/dependabot/fetch-metadata) from 2.2.0 to 2.3.0.
- [Release notes](https://github.com/dependabot/fetch-metadata/releases)
- [Commits](https://github.com/dependabot/fetch-metadata/compare/v2.2.0...v2.3.0)
---
updated-dependencies:
- dependency-name: dependabot/fetch-metadata
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
.github/workflows/dependabot_auto.yml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/.github/workflows/dependabot_auto.yml b/.github/workflows/dependabot_auto.yml
index 951e65e1..5bcd84f6 100644
--- a/.github/workflows/dependabot_auto.yml
+++ b/.github/workflows/dependabot_auto.yml
@@ -14,7 +14,7 @@ jobs:
steps:
- name: Dependabot metadata
id: metadata
- uses: dependabot/fetch-metadata@v2.2.0
+ uses: dependabot/fetch-metadata@v2.3.0
with:
github-token: "${{ secrets.GITHUB_TOKEN }}"
skip-commit-verification: true
From 91e1ff5a95d60fa3a8df250d953640f522adb251 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Tue, 28 Jan 2025 22:45:14 +0100
Subject: [PATCH 108/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`cae9fb4361138b937464524eed907328731b81f6` (#4711)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 08c334a3..6cbc7326 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=a4417ddda98fd0558fb4d802253e68a933704b59
+CPPLLAMA_VERSION?=cae9fb4361138b937464524eed907328731b81f6
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From b4b67e00bd7b705b1f6497f953ae562d0ea3af64 Mon Sep 17 00:00:00 2001
From: Maximilian Kenfenheuer
Date: Tue, 28 Jan 2025 22:58:02 +0100
Subject: [PATCH 109/679] refactor: function argument parsing using named regex
(#4708)
Signed-off-by: Maximilian Kenfenheuer
---
pkg/functions/parse.go | 61 +++++++++++++++++++++---------------------
1 file changed, 30 insertions(+), 31 deletions(-)
diff --git a/pkg/functions/parse.go b/pkg/functions/parse.go
index 7b8df91e..50cbb27b 100644
--- a/pkg/functions/parse.go
+++ b/pkg/functions/parse.go
@@ -331,37 +331,36 @@ func ParseFunctionCall(llmresult string, functionConfig FunctionsConfig) []FuncC
}
func ParseFunctionCallArgs(functionArguments string, functionConfig FunctionsConfig) string {
- if len(functionConfig.ArgumentRegex) > 0 {
- // We use named regexes here to extract the function argument key value pairs and convert this to valid json.
- // TODO: there might be responses where an object as a value is expected/required. This is currently not handled.
- args := make(map[string]string)
-
- agrsRegexKeyName := "key"
- agrsRegexValueName := "value"
-
- if functionConfig.ArgumentRegexKey != "" {
- agrsRegexKeyName = functionConfig.ArgumentRegexKey
- }
- if functionConfig.ArgumentRegexValue != "" {
- agrsRegexValueName = functionConfig.ArgumentRegexValue
- }
-
- for _, r := range functionConfig.ArgumentRegex {
- var respRegex = regexp.MustCompile(r)
- var nameRange []string = respRegex.SubexpNames()
- var keyIndex = slices.Index(nameRange, agrsRegexKeyName)
- var valueIndex = slices.Index(nameRange, agrsRegexValueName)
- matches := respRegex.FindAllStringSubmatch(functionArguments, -1)
- for _, match := range matches {
- args[match[keyIndex]] = match[valueIndex]
- }
- }
-
- jsonBytes, _ := json.Marshal(args)
- jsonString := string(jsonBytes)
-
- return jsonString
- } else {
+ if len(functionConfig.ArgumentRegex) == 0 {
return functionArguments
}
+
+ // We use named regexes here to extract the function argument key value pairs and convert this to valid json.
+ // TODO: there might be responses where an object as a value is expected/required. This is currently not handled.
+ args := make(map[string]string)
+
+ agrsRegexKeyName := "key"
+ agrsRegexValueName := "value"
+
+ if functionConfig.ArgumentRegexKey != "" {
+ agrsRegexKeyName = functionConfig.ArgumentRegexKey
+ }
+ if functionConfig.ArgumentRegexValue != "" {
+ agrsRegexValueName = functionConfig.ArgumentRegexValue
+ }
+
+ for _, r := range functionConfig.ArgumentRegex {
+ var respRegex = regexp.MustCompile(r)
+ var nameRange []string = respRegex.SubexpNames()
+ var keyIndex = slices.Index(nameRange, agrsRegexKeyName)
+ var valueIndex = slices.Index(nameRange, agrsRegexValueName)
+ matches := respRegex.FindAllStringSubmatch(functionArguments, -1)
+ for _, match := range matches {
+ args[match[keyIndex]] = match[valueIndex]
+ }
+ }
+
+ jsonBytes, _ := json.Marshal(args)
+
+ return string(jsonBytes)
}
From a37b2c765c2085c5d89cdfffda63e5ca671b4465 Mon Sep 17 00:00:00 2001
From: Maximilian Kenfenheuer
Date: Tue, 28 Jan 2025 22:58:35 +0100
Subject: [PATCH 110/679] docs: update advanced-usage.md to reflect changes in
#4700 (#4709)
Signed-off-by: Maximilian Kenfenheuer
---
docs/content/docs/advanced/advanced-usage.md | 3 +++
1 file changed, 3 insertions(+)
diff --git a/docs/content/docs/advanced/advanced-usage.md b/docs/content/docs/advanced/advanced-usage.md
index dd9894ef..62c19aba 100644
--- a/docs/content/docs/advanced/advanced-usage.md
+++ b/docs/content/docs/advanced/advanced-usage.md
@@ -148,6 +148,9 @@ function:
no_action_function_name: "" # Function name to call when no action is determined.
no_action_description_name: "" # Description name for no-action functions.
response_regex: [] # Regular expressions to match response from
+ argument_regex: [] # Named regular expressions to extract function arguments from the response.
+ argument_regex_key_name: "key" # Name of the named capture group used for the argument key
+ argument_regex_value_name: "value" # Name of the named capture group used for the argument value
json_regex_match: [] # Regular expressions to match JSON data when in tool mode
replace_function_results: [] # Placeholder to replace function call results with arbitrary strings or patterns.
replace_llm_results: [] # Replace language model results with arbitrary strings or patterns.
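As a rough illustration of how `argument_regex_key_name` and `argument_regex_value_name` interact with `argument_regex` (the pattern and values below are hypothetical, not taken from a shipped configuration), renaming the capture groups only requires that the pattern and the two options agree:

```go
package main

import (
	"fmt"
	"regexp"
	"slices"
)

func main() {
	// Hypothetical settings:
	//   argument_regex_key_name: "arg"
	//   argument_regex_value_name: "val"
	// The pattern must then define (?P<arg>...) and (?P<val>...) groups.
	re := regexp.MustCompile(`(?P<arg>\w+)='(?P<val>[^']*)'`)
	names := re.SubexpNames()
	argIdx, valIdx := slices.Index(names, "arg"), slices.Index(names, "val")

	for _, m := range re.FindAllStringSubmatch(`city='Rome' unit='celsius'`, -1) {
		fmt.Printf("%s -> %s\n", m[argIdx], m[valIdx])
	}
	// Output:
	// city -> Rome
	// unit -> celsius
}
```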
From 1f4e66d63816efe9ed2c917f8c119a5289b8d01d Mon Sep 17 00:00:00 2001
From: Maximilian Kenfenheuer
Date: Wed, 29 Jan 2025 10:19:48 +0100
Subject: [PATCH 111/679] chore(model gallery): add specific message templates
for llama3.2 based models (#4707)
* chore(model gallery): add specific message templates for llama3.2 based models
Signed-off-by: Maximilian Kenfenheuer
* fix: yaml lint in llama3.2-quantized.yaml
Signed-off-by: Maximilian Kenfenheuer
* fix: yaml lint in llama3.2-quantized.yaml
Signed-off-by: Maximilian Kenfenheuer
---------
Signed-off-by: Maximilian Kenfenheuer
---
gallery/index.yaml | 2 +-
gallery/llama3.2-quantized.yaml | 55 +++++++++++++++++++++++++++++++++
2 files changed, 56 insertions(+), 1 deletion(-)
create mode 100644 gallery/llama3.2-quantized.yaml
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 5e081b98..1716f2b1 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -839,7 +839,7 @@
sha256: bac8e8c1d1d9d53cbdb148b8ff9ad378ddb392429207099e85b5aae3a43bff3d
uri: huggingface://cstr/salamandra-7b-instruct-GGUF/salamandra-7b-instruct.Q4_K_M-f32.gguf
- &llama32 ## llama3.2
- url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
+ url: "github:mudler/LocalAI/gallery/llama3.2-quantized.yaml@master"
icon: https://avatars.githubusercontent.com/u/153379578
license: llama3.2
description: |
diff --git a/gallery/llama3.2-quantized.yaml b/gallery/llama3.2-quantized.yaml
new file mode 100644
index 00000000..7e1d2630
--- /dev/null
+++ b/gallery/llama3.2-quantized.yaml
@@ -0,0 +1,55 @@
+---
+name: "llama3.2-quantized"
+
+config_file: |
+ mmap: true
+ function:
+ disable_no_action: true
+ grammar:
+ disable: true
+ response_regex:
+ - \[(?P<name>\w+)\((?P<arguments>.*)\)\]
+ argument_regex:
+ - (?P<key>[^ '\(=,]+)[='"]+(?P<value>[^=,"']+)['"]?
+ template:
+ chat: |
+ <|begin_of_text|><|start_header_id|>system<|end_header_id|>
+ You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>
+ {{.Input }}
+ <|start_header_id|>assistant<|end_header_id|>
+ chat_message: |
+ <|start_header_id|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}<|end_header_id|>
+ {{ if .FunctionCall -}}
+ {{ else if eq .RoleName "tool" -}}
+ The Function was executed and the response was:
+ {{ end -}}
+ {{ if .Content -}}
+ {{.Content -}}
+ {{ else if .FunctionCall -}}
+ {{ range .FunctionCall }}
+ [{{.FunctionCall.Name}}({{.FunctionCall.Arguments}})]
+ {{ end }}
+ {{ end -}}
+ <|eot_id|>
+ completion: |
+ {{.Input}}
+ function: |
+ <|start_header_id|>system<|end_header_id|>
+ You are an expert in composing functions. You are given a question and a set of possible functions.
+ Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
+ If none of the functions can be used, point it out. If the given question lacks the parameters required by the function, also point it out. You should only return the function call in tools call sections.
+ If you decide to invoke any of the function(s), you MUST put it in the format as follows:
+ [func_name1(params_name1=params_value1,params_name2=params_value2,...),func_name2(params_name1=params_value1,params_name2=params_value2,...)]
+ You SHOULD NOT include any other text in the response.
+ Here is a list of functions in JSON format that you can invoke.
+ {{toJson .Functions}}
+ <|eot_id|><|start_header_id|>user<|end_header_id|>
+ {{.Input}}
+ <|eot_id|><|start_header_id|>assistant<|end_header_id|>
+ context_size: 8192
+ f16: true
+ stopwords:
+ - <|im_end|>
+ -
+ - "<|eot_id|>"
+ - <|end_of_text|>
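To see how the two regex stages in this template are meant to cooperate, here is a self-contained sketch that applies the `response_regex` and `argument_regex` defined above to a hypothetical completion in the bracketed call format the function prompt asks for (the function name and arguments are invented for the example):

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"slices"
)

func main() {
	// Hypothetical completion in the bracketed format the function prompt asks for.
	llmResult := `[get_current_weather(location="Rome", unit="celsius")]`

	// Stage 1: response_regex splits the call into a function name and an argument string.
	callRe := regexp.MustCompile(`\[(?P<name>\w+)\((?P<arguments>.*)\)\]`)
	m := callRe.FindStringSubmatch(llmResult)
	if m == nil {
		return // no tool call detected
	}
	names := callRe.SubexpNames()
	fnName := m[slices.Index(names, "name")]
	rawArgs := m[slices.Index(names, "arguments")]

	// Stage 2: argument_regex converts the key="value" pairs into a JSON object.
	argRe := regexp.MustCompile(`(?P<key>[^ '\(=,]+)[='"]+(?P<value>[^=,"']+)['"]?`)
	argNames := argRe.SubexpNames()
	args := map[string]string{}
	for _, am := range argRe.FindAllStringSubmatch(rawArgs, -1) {
		args[am[slices.Index(argNames, "key")]] = am[slices.Index(argNames, "value")]
	}

	b, _ := json.Marshal(args)
	fmt.Println(fnName, string(b)) // get_current_weather {"location":"Rome","unit":"celsius"}
}
```

Note that the TODO in `ParseFunctionCallArgs` still applies: nested objects as argument values are not handled by this key/value approach.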
From 7f62b418a4c605257183c3bbf1f7b98f0904fe5f Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 29 Jan 2025 15:16:07 +0100
Subject: [PATCH 112/679] chore(docs): add documentation for l4t images
Signed-off-by: Ettore Di Giacinto
---
.../docs/getting-started/container-images.md | 14 +++++++++++++-
docs/content/docs/reference/nvidia-l4t.md | 10 ++++++++--
2 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/docs/content/docs/getting-started/container-images.md b/docs/content/docs/getting-started/container-images.md
index 64f6dbc9..a6a955ad 100644
--- a/docs/content/docs/getting-started/container-images.md
+++ b/docs/content/docs/getting-started/container-images.md
@@ -154,7 +154,7 @@ Images are available with and without python dependencies. Note that images with
Images with `core` in the tag are smaller and do not contain any python dependencies.
-{{< tabs tabTotal="7" >}}
+{{< tabs tabTotal="8" >}}
{{% tab tabName="Vanilla / CPU Images" %}}
| Description | Quay | Docker Hub |
@@ -236,6 +236,18 @@ Images with `core` in the tag are smaller and do not contain any python dependen
| Versioned image including FFMpeg, no python | `quay.io/go-skynet/local-ai:{{< version >}}-vulkan-fmpeg-core` | `localai/localai:{{< version >}}-vulkan-fmpeg-core` |
{{% /tab %}}
+{{% tab tabName="Nvidia Linux for Tegra" %}}
+
+These images are compatible with Nvidia ARM64 devices, such as the Jetson Nano, Jetson Xavier NX, and Jetson AGX Xavier. For more information, see the [Nvidia L4T guide]({{%relref "docs/reference/nvidia-l4t" %}}).
+
+| Description | Quay | Docker Hub |
+| --- | --- |-------------------------------------------------------------|
+| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64-core` | `localai/localai:master-nvidia-l4t-arm64-core` |
+| Latest tag | `quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64-core` | `localai/localai:latest-nvidia-l4t-arm64-core` |
+| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-nvidia-l4t-arm64-core` | `localai/localai:{{< version >}}-nvidia-l4t-arm64-core` |
+
+{{% /tab %}}
+
{{< /tabs >}}
## See Also
diff --git a/docs/content/docs/reference/nvidia-l4t.md b/docs/content/docs/reference/nvidia-l4t.md
index 028ee531..ce0fd5e9 100644
--- a/docs/content/docs/reference/nvidia-l4t.md
+++ b/docs/content/docs/reference/nvidia-l4t.md
@@ -21,7 +21,13 @@ git clone https://github.com/mudler/LocalAI
cd LocalAI
-docker build --build-arg SKIP_DRIVERS=true --build-arg BUILD_TYPE=cublas --build-arg BASE_IMAGE=nvcr.io/nvidia/l4t-jetpack:r36.4.0 --build-arg IMAGE_TYPE=core -t localai-orin .
+docker build --build-arg SKIP_DRIVERS=true --build-arg BUILD_TYPE=cublas --build-arg BASE_IMAGE=nvcr.io/nvidia/l4t-jetpack:r36.4.0 --build-arg IMAGE_TYPE=core -t quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64-core .
+```
+
+Otherwise, images are available on quay.io and Docker Hub:
+
+```bash
+docker pull quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64-core
```
## Usage
@@ -29,7 +35,7 @@ docker build --build-arg SKIP_DRIVERS=true --build-arg BUILD_TYPE=cublas --build
Run the LocalAI container on Nvidia ARM64 devices using the following command, where `/data/models` is the directory containing the models:
```bash
-docker run -e DEBUG=true -p 8080:8080 -v /data/models:/build/models -ti --restart=always --name local-ai --runtime nvidia --gpus all localai-orin
+docker run -e DEBUG=true -p 8080:8080 -v /data/models:/build/models -ti --restart=always --name local-ai --runtime nvidia --gpus all quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64-core
```
Note: `/data/models` is the directory containing the models. You can replace it with the directory containing your models.
From 1656e1a88e3f5fad247a47d4d9e5c54f2d606550 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Wed, 29 Jan 2025 22:45:38 +0100
Subject: [PATCH 113/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`eb7cf15a808d4d7a71eef89cc6a9b96fe82989dc` (#4717)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 6cbc7326..20ef7199 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=cae9fb4361138b937464524eed907328731b81f6
+CPPLLAMA_VERSION?=eb7cf15a808d4d7a71eef89cc6a9b96fe82989dc
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From 72e52c4f6a9fb29bfa2d85006245fc3e05ae8082 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 30 Jan 2025 00:03:01 +0100
Subject: [PATCH 114/679] chore: drop embedded models (#4715)
Since the remote gallery was introduced this is now completely
superseded by it. In order to keep the code clean and remove redundant
parts, let's simplify the usage.
Signed-off-by: Ettore Di Giacinto
---
Makefile | 2 +-
core/application/startup.go | 2 +-
core/cli/models.go | 2 +-
core/cli/run.go | 2 -
core/config/application_config.go | 8 --
core/services/gallery.go | 2 +-
.../content/docs/advanced/run-other-models.md | 126 ------------------
.../docs/getting-started/container-images.md | 2 +-
embedded/embedded.go | 72 ----------
embedded/model_library.yaml | 9 --
embedded/models/all-minilm-l6-v2.yaml | 13 --
embedded/models/animagine-xl.yaml | 17 ---
embedded/models/bakllava.yaml | 40 ------
embedded/models/bark.yaml | 8 --
embedded/models/cerbero.yaml | 24 ----
embedded/models/codellama-7b-gguf.yaml | 20 ---
embedded/models/codellama-7b.yaml | 14 --
embedded/models/coqui.yaml | 9 --
embedded/models/dolphin-2.5-mixtral-8x7b.yaml | 31 -----
embedded/models/hermes-2-pro-mistral.yaml | 59 --------
embedded/models/llama3-instruct.yaml | 48 -------
embedded/models/llava-1.5.yaml | 33 -----
embedded/models/llava-1.6-mistral.yaml | 33 -----
embedded/models/llava-1.6-vicuna.yaml | 37 -----
embedded/models/llava.yaml | 40 ------
embedded/models/mamba-bagel.yaml | 21 ---
embedded/models/mamba-chat.yaml | 28 ----
embedded/models/mistral-openorca.yaml | 32 -----
embedded/models/mixtral-instruct.yaml | 25 ----
embedded/models/phi-2-chat.yaml | 25 ----
embedded/models/phi-2-orange.yaml | 30 -----
embedded/models/rhasspy-voice-en-us-amy.yaml | 13 --
embedded/models/tinyllama-chat.yaml | 29 ----
embedded/models/transformers-tinyllama.yaml | 31 -----
embedded/models/vall-e-x.yaml | 8 --
embedded/models/whisper-base.yaml | 18 ---
pkg/startup/model_preload.go | 28 +---
pkg/startup/model_preload_test.go | 53 +-------
.../webui_static.yaml => webui_static.yaml | 0
39 files changed, 8 insertions(+), 986 deletions(-)
delete mode 100644 docs/content/docs/advanced/run-other-models.md
delete mode 100644 embedded/embedded.go
delete mode 100644 embedded/model_library.yaml
delete mode 100644 embedded/models/all-minilm-l6-v2.yaml
delete mode 100644 embedded/models/animagine-xl.yaml
delete mode 100644 embedded/models/bakllava.yaml
delete mode 100644 embedded/models/bark.yaml
delete mode 100644 embedded/models/cerbero.yaml
delete mode 100644 embedded/models/codellama-7b-gguf.yaml
delete mode 100644 embedded/models/codellama-7b.yaml
delete mode 100644 embedded/models/coqui.yaml
delete mode 100644 embedded/models/dolphin-2.5-mixtral-8x7b.yaml
delete mode 100644 embedded/models/hermes-2-pro-mistral.yaml
delete mode 100644 embedded/models/llama3-instruct.yaml
delete mode 100644 embedded/models/llava-1.5.yaml
delete mode 100644 embedded/models/llava-1.6-mistral.yaml
delete mode 100644 embedded/models/llava-1.6-vicuna.yaml
delete mode 100644 embedded/models/llava.yaml
delete mode 100644 embedded/models/mamba-bagel.yaml
delete mode 100644 embedded/models/mamba-chat.yaml
delete mode 100644 embedded/models/mistral-openorca.yaml
delete mode 100644 embedded/models/mixtral-instruct.yaml
delete mode 100644 embedded/models/phi-2-chat.yaml
delete mode 100644 embedded/models/phi-2-orange.yaml
delete mode 100644 embedded/models/rhasspy-voice-en-us-amy.yaml
delete mode 100644 embedded/models/tinyllama-chat.yaml
delete mode 100644 embedded/models/transformers-tinyllama.yaml
delete mode 100644 embedded/models/vall-e-x.yaml
delete mode 100644 embedded/models/whisper-base.yaml
rename embedded/webui_static.yaml => webui_static.yaml (100%)
diff --git a/Makefile b/Makefile
index 20ef7199..5b903d7d 100644
--- a/Makefile
+++ b/Makefile
@@ -861,7 +861,7 @@ swagger:
.PHONY: gen-assets
gen-assets:
- $(GOCMD) run core/dependencies_manager/manager.go embedded/webui_static.yaml core/http/static/assets
+ $(GOCMD) run core/dependencies_manager/manager.go webui_static.yaml core/http/static/assets
## Documentation
docs/layouts/_default:
diff --git a/core/application/startup.go b/core/application/startup.go
index cd52d37a..fffcd8bb 100644
--- a/core/application/startup.go
+++ b/core/application/startup.go
@@ -62,7 +62,7 @@ func New(opts ...config.AppOption) (*Application, error) {
}
}
- if err := pkgStartup.InstallModels(options.Galleries, options.ModelLibraryURL, options.ModelPath, options.EnforcePredownloadScans, nil, options.ModelsURL...); err != nil {
+ if err := pkgStartup.InstallModels(options.Galleries, options.ModelPath, options.EnforcePredownloadScans, nil, options.ModelsURL...); err != nil {
log.Error().Err(err).Msg("error installing models")
}
diff --git a/core/cli/models.go b/core/cli/models.go
index 56d13fc7..28b2944f 100644
--- a/core/cli/models.go
+++ b/core/cli/models.go
@@ -100,7 +100,7 @@ func (mi *ModelsInstall) Run(ctx *cliContext.Context) error {
log.Info().Str("model", modelName).Str("license", model.License).Msg("installing model")
}
- err = startup.InstallModels(galleries, "", mi.ModelsPath, !mi.DisablePredownloadScan, progressCallback, modelName)
+ err = startup.InstallModels(galleries, mi.ModelsPath, !mi.DisablePredownloadScan, progressCallback, modelName)
if err != nil {
return err
}
diff --git a/core/cli/run.go b/core/cli/run.go
index 279ff94b..3162ef14 100644
--- a/core/cli/run.go
+++ b/core/cli/run.go
@@ -32,7 +32,6 @@ type RunCMD struct {
Galleries string `env:"LOCALAI_GALLERIES,GALLERIES" help:"JSON list of galleries" group:"models" default:"${galleries}"`
AutoloadGalleries bool `env:"LOCALAI_AUTOLOAD_GALLERIES,AUTOLOAD_GALLERIES" group:"models"`
- RemoteLibrary string `env:"LOCALAI_REMOTE_LIBRARY,REMOTE_LIBRARY" default:"${remoteLibraryURL}" help:"A LocalAI remote library URL" group:"models"`
PreloadModels string `env:"LOCALAI_PRELOAD_MODELS,PRELOAD_MODELS" help:"A List of models to apply in JSON at start" group:"models"`
Models []string `env:"LOCALAI_MODELS,MODELS" help:"A List of model configuration URLs to load" group:"models"`
PreloadModelsConfig string `env:"LOCALAI_PRELOAD_MODELS_CONFIG,PRELOAD_MODELS_CONFIG" help:"A List of models to apply at startup. Path to a YAML config file" group:"models"`
@@ -90,7 +89,6 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
config.WithDynamicConfigDirPollInterval(r.LocalaiConfigDirPollInterval),
config.WithF16(r.F16),
config.WithStringGalleries(r.Galleries),
- config.WithModelLibraryURL(r.RemoteLibrary),
config.WithCors(r.CORS),
config.WithCorsAllowOrigins(r.CORSAllowOrigins),
config.WithCsrf(r.CSRF),
diff --git a/core/config/application_config.go b/core/config/application_config.go
index 1ffcb297..2cc9b01b 100644
--- a/core/config/application_config.go
+++ b/core/config/application_config.go
@@ -44,8 +44,6 @@ type ApplicationConfig struct {
DisableGalleryEndpoint bool
LoadToMemory []string
- ModelLibraryURL string
-
Galleries []Gallery
BackendAssets embed.FS
@@ -126,12 +124,6 @@ func WithP2PToken(s string) AppOption {
}
}
-func WithModelLibraryURL(url string) AppOption {
- return func(o *ApplicationConfig) {
- o.ModelLibraryURL = url
- }
-}
-
func WithLibPath(path string) AppOption {
return func(o *ApplicationConfig) {
o.LibPath = path
diff --git a/core/services/gallery.go b/core/services/gallery.go
index 45bebd4f..f499d381 100644
--- a/core/services/gallery.go
+++ b/core/services/gallery.go
@@ -129,7 +129,7 @@ func (g *GalleryService) Start(c context.Context, cl *config.BackendConfigLoader
if op.GalleryModelName != "" {
err = gallery.InstallModelFromGallery(op.Galleries, op.GalleryModelName, g.appConfig.ModelPath, op.Req, progressCallback, g.appConfig.EnforcePredownloadScans)
} else if op.ConfigURL != "" {
- err = startup.InstallModels(op.Galleries, op.ConfigURL, g.appConfig.ModelPath, g.appConfig.EnforcePredownloadScans, progressCallback, op.ConfigURL)
+ err = startup.InstallModels(op.Galleries, g.appConfig.ModelPath, g.appConfig.EnforcePredownloadScans, progressCallback, op.ConfigURL)
if err != nil {
updateError(err)
continue
diff --git a/docs/content/docs/advanced/run-other-models.md b/docs/content/docs/advanced/run-other-models.md
deleted file mode 100644
index f9bdc22d..00000000
--- a/docs/content/docs/advanced/run-other-models.md
+++ /dev/null
@@ -1,126 +0,0 @@
-+++
-disableToc = false
-title = "Run other Models"
-weight = 23
-icon = "rocket_launch"
-
-+++
-
-## Running other models
-
-> _Do you have already a model file? Skip to [Run models manually]({{%relref "docs/getting-started/models" %}})_.
-
-To load models into LocalAI, you can either [use models manually]({{%relref "docs/getting-started/models" %}}) or configure LocalAI to pull the models from external sources, like Huggingface and configure the model.
-
-To do that, you can point LocalAI to an URL to a YAML configuration file - however - LocalAI does also have some popular model configuration embedded in the binary as well. Below you can find a list of the models configuration that LocalAI has pre-built, see [Model customization]({{%relref "docs/getting-started/customize-model" %}}) on how to configure models from URLs.
-
-There are different categories of models: [LLMs]({{%relref "docs/features/text-generation" %}}), [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) , [Embeddings]({{%relref "docs/features/embeddings" %}}), [Audio to Text]({{%relref "docs/features/audio-to-text" %}}), and [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) depending on the backend being used and the model architecture.
-
-{{% alert icon="š”" %}}
-
-To customize the models, see [Model customization]({{%relref "docs/getting-started/customize-model" %}}). For more model configurations, visit the [Examples Section](https://github.com/mudler/LocalAI-examples/tree/main/configurations) and the configurations for the models below is available [here](https://github.com/mudler/LocalAI/tree/master/embedded/models).
-{{% /alert %}}
-
-{{< tabs tabTotal="3" >}}
-{{% tab tabName="CPU-only" %}}
-
-> š”Don't need GPU acceleration? use the CPU images which are lighter and do not have Nvidia dependencies
-
-| Model | Category | Docker command |
-| --- | --- | --- |
-| [phi-2](https://huggingface.co/microsoft/phi-2) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core phi-2``` |
-| š [bakllava](https://github.com/SkunkworksAI/BakLLaVA) | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core bakllava``` |
-| š [llava-1.5](https://llava-vl.github.io/) | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core llava-1.5``` |
-| š [llava-1.6-mistral](https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf) | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core llava-1.6-mistral``` |
-| š [llava-1.6-vicuna](https://huggingface.co/cmp-nct/llava-1.6-gguf) | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core llava-1.6-vicuna``` |
-| [mistral-openorca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core mistral-openorca``` |
-| [bert-cpp](https://github.com/skeskinen/bert.cpp) | [Embeddings]({{%relref "docs/features/embeddings" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core bert-cpp``` |
-| [all-minilm-l6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | [Embeddings]({{%relref "docs/features/embeddings" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg all-minilm-l6-v2``` |
-| whisper-base | [Audio to Text]({{%relref "docs/features/audio-to-text" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core whisper-base``` |
-| rhasspy-voice-en-us-amy | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core rhasspy-voice-en-us-amy``` |
-| šø [coqui](https://github.com/coqui-ai/TTS) | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg coqui``` |
-| š¶ [bark](https://github.com/suno-ai/bark) | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg bark``` |
-| š [vall-e-x](https://github.com/Plachtaa/VALL-E-X) | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg vall-e-x``` |
-| mixtral-instruct Mixtral-8x7B-Instruct-v0.1 | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core mixtral-instruct``` |
-| [tinyllama-chat](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF) [original model](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.3) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core tinyllama-chat``` |
-| [dolphin-2.5-mixtral-8x7b](https://huggingface.co/TheBloke/dolphin-2.5-mixtral-8x7b-GGUF) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core dolphin-2.5-mixtral-8x7b``` |
-| š [mamba](https://github.com/state-spaces/mamba) | [LLM]({{%relref "docs/features/text-generation" %}}) | GPU-only |
-| animagine-xl | [Text to Image]({{%relref "docs/features/image-generation" %}}) | GPU-only |
-| transformers-tinyllama | [LLM]({{%relref "docs/features/text-generation" %}}) | GPU-only |
-| [codellama-7b](https://huggingface.co/codellama/CodeLlama-7b-hf) (with transformers) | [LLM]({{%relref "docs/features/text-generation" %}}) | GPU-only |
-| [codellama-7b-gguf](https://huggingface.co/TheBloke/CodeLlama-7B-GGUF) (with llama.cpp) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core codellama-7b-gguf``` |
-| [hermes-2-pro-mistral](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core hermes-2-pro-mistral``` |
-{{% /tab %}}
-
-{{% tab tabName="GPU (CUDA 11)" %}}
-
-
-> To know which version of CUDA do you have available, you can check with `nvidia-smi` or `nvcc --version` see also [GPU acceleration]({{%relref "docs/features/gpu-acceleration" %}}).
-
-| Model | Category | Docker command |
-| --- | --- | --- |
-| [phi-2](https://huggingface.co/microsoft/phi-2) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core phi-2``` |
-| š [bakllava](https://github.com/SkunkworksAI/BakLLaVA) | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core bakllava``` |
-| š [llava-1.5](https://llava-vl.github.io/) | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-cublas-cuda11-core llava-1.5``` |
-| š [llava-1.6-mistral](https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf) | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-cublas-cuda11-core llava-1.6-mistral``` |
-| š [llava-1.6-vicuna](https://huggingface.co/cmp-nct/llava-1.6-gguf) | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-cublas-cuda11-core llava-1.6-vicuna``` |
-| [mistral-openorca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core mistral-openorca``` |
-| [bert-cpp](https://github.com/skeskinen/bert.cpp) | [Embeddings]({{%relref "docs/features/embeddings" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core bert-cpp``` |
-| [all-minilm-l6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | [Embeddings]({{%relref "docs/features/embeddings" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 all-minilm-l6-v2``` |
-| whisper-base | [Audio to Text]({{%relref "docs/features/audio-to-text" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core whisper-base``` |
-| rhasspy-voice-en-us-amy | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core rhasspy-voice-en-us-amy``` |
-| šø [coqui](https://github.com/coqui-ai/TTS) | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 coqui``` |
-| š¶ [bark](https://github.com/suno-ai/bark) | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 bark``` |
-| š [vall-e-x](https://github.com/Plachtaa/VALL-E-X) | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 vall-e-x``` |
-| mixtral-instruct Mixtral-8x7B-Instruct-v0.1 | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core mixtral-instruct``` |
-| [tinyllama-chat](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF) [original model](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.3) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core tinyllama-chat``` |
-| [dolphin-2.5-mixtral-8x7b](https://huggingface.co/TheBloke/dolphin-2.5-mixtral-8x7b-GGUF) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core dolphin-2.5-mixtral-8x7b``` |
-| š [mamba](https://github.com/state-spaces/mamba) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 mamba-chat``` |
-| animagine-xl | [Text to Image]({{%relref "docs/features/image-generation" %}}) | ```docker run -ti -p 8080:8080 -e COMPEL=0 --gpus all localai/localai:{{< version >}}-cublas-cuda11 animagine-xl``` |
-| transformers-tinyllama | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 transformers-tinyllama``` |
-| [codellama-7b](https://huggingface.co/codellama/CodeLlama-7b-hf) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 codellama-7b``` |
-| [codellama-7b-gguf](https://huggingface.co/TheBloke/CodeLlama-7B-GGUF) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core codellama-7b-gguf``` |
-| [hermes-2-pro-mistral](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core hermes-2-pro-mistral``` |
-{{% /tab %}}
-
-
-{{% tab tabName="GPU (CUDA 12)" %}}
-
-> To know which version of CUDA do you have available, you can check with `nvidia-smi` or `nvcc --version` see also [GPU acceleration]({{%relref "docs/features/gpu-acceleration" %}}).
-
-| Model | Category | Docker command |
-| --- | --- | --- |
-| [phi-2](https://huggingface.co/microsoft/phi-2) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core phi-2``` |
-| š [bakllava](https://github.com/SkunkworksAI/BakLLaVA) | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core bakllava``` |
-| š [llava-1.5](https://llava-vl.github.io/) | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-cublas-cuda12-core llava-1.5``` |
-| š [llava-1.6-mistral](https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf) | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-cublas-cuda12-core llava-1.6-mistral``` |
-| š [llava-1.6-vicuna](https://huggingface.co/cmp-nct/llava-1.6-gguf) | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-cublas-cuda12-core llava-1.6-vicuna``` |
-| [mistral-openorca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core mistral-openorca``` |
-| [bert-cpp](https://github.com/skeskinen/bert.cpp) | [Embeddings]({{%relref "docs/features/embeddings" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core bert-cpp``` |
-| [all-minilm-l6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | [Embeddings]({{%relref "docs/features/embeddings" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 all-minilm-l6-v2``` |
-| whisper-base | [Audio to Text]({{%relref "docs/features/audio-to-text" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core whisper-base``` |
-| rhasspy-voice-en-us-amy | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core rhasspy-voice-en-us-amy``` |
-| šø [coqui](https://github.com/coqui-ai/TTS) | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 coqui``` |
-| š¶ [bark](https://github.com/suno-ai/bark) | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 bark``` |
-| š [vall-e-x](https://github.com/Plachtaa/VALL-E-X) | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 vall-e-x``` |
-| mixtral-instruct Mixtral-8x7B-Instruct-v0.1 | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core mixtral-instruct``` |
-| [tinyllama-chat](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF) [original model](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.3) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core tinyllama-chat``` |
-| [dolphin-2.5-mixtral-8x7b](https://huggingface.co/TheBloke/dolphin-2.5-mixtral-8x7b-GGUF) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core dolphin-2.5-mixtral-8x7b``` |
-| š [mamba](https://github.com/state-spaces/mamba) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 mamba-chat``` |
-| animagine-xl | [Text to Image]({{%relref "docs/features/image-generation" %}}) | ```docker run -ti -p 8080:8080 -e COMPEL=0 --gpus all localai/localai:{{< version >}}-cublas-cuda12 animagine-xl``` |
-| transformers-tinyllama | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 transformers-tinyllama``` |
-| [codellama-7b](https://huggingface.co/codellama/CodeLlama-7b-hf) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 codellama-7b``` |
-| [codellama-7b-gguf](https://huggingface.co/TheBloke/CodeLlama-7B-GGUF) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core codellama-7b-gguf``` |
-| [hermes-2-pro-mistral](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF) | [LLM]({{%relref "docs/features/text-generation" %}}) | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core hermes-2-pro-mistral``` |
-{{% /tab %}}
-
-{{< /tabs >}}
-
-{{% alert icon="š”" %}}
-**Tip** You can actually specify multiple models to start an instance with the models loaded, for example to have both llava and phi-2 configured:
-
-```bash
-docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core llava phi-2
-```
-
-{{% /alert %}}
diff --git a/docs/content/docs/getting-started/container-images.md b/docs/content/docs/getting-started/container-images.md
index a6a955ad..d1930805 100644
--- a/docs/content/docs/getting-started/container-images.md
+++ b/docs/content/docs/getting-started/container-images.md
@@ -143,7 +143,7 @@ The AIO Images are inheriting the same environment variables as the base images
| Variable | Default | Description |
| ---------------------| ------- | ----------- |
| `PROFILE` | Auto-detected | The size of the model to use. Available: `cpu`, `gpu-8g` |
-| `MODELS` | Auto-detected | A list of models YAML Configuration file URI/URL (see also [running models]({{%relref "docs/advanced/run-other-models" %}})) |
+| `MODELS` | Auto-detected | A list of models YAML Configuration file URI/URL (see also [running models]({{%relref "docs/getting-started/models" %}})) |
## Standard container images
diff --git a/embedded/embedded.go b/embedded/embedded.go
deleted file mode 100644
index 3a4ea262..00000000
--- a/embedded/embedded.go
+++ /dev/null
@@ -1,72 +0,0 @@
-package embedded
-
-import (
- "embed"
- "fmt"
- "slices"
- "strings"
-
- "github.com/mudler/LocalAI/pkg/downloader"
- "github.com/rs/zerolog/log"
-
- "github.com/mudler/LocalAI/pkg/assets"
- "gopkg.in/yaml.v3"
-)
-
-var modelShorteners map[string]string
-
-//go:embed model_library.yaml
-var modelLibrary []byte
-
-//go:embed models/*
-var embeddedModels embed.FS
-
-func ModelShortURL(s string) string {
- if _, ok := modelShorteners[s]; ok {
- s = modelShorteners[s]
- }
-
- return s
-}
-
-func init() {
- err := yaml.Unmarshal(modelLibrary, &modelShorteners)
- if err != nil {
- log.Error().Err(err).Msg("error while unmarshalling embedded modelLibrary")
- }
-}
-
-func GetRemoteLibraryShorteners(url string, basePath string) (map[string]string, error) {
- remoteLibrary := map[string]string{}
- uri := downloader.URI(url)
- err := uri.DownloadWithCallback(basePath, func(_ string, i []byte) error {
- return yaml.Unmarshal(i, &remoteLibrary)
- })
- if err != nil {
- return nil, fmt.Errorf("error downloading remote library: %s", err.Error())
- }
-
- return remoteLibrary, err
-}
-
-// ExistsInModelsLibrary checks if a model exists in the embedded models library
-func ExistsInModelsLibrary(s string) bool {
- f := fmt.Sprintf("%s.yaml", s)
-
- a := []string{}
-
- for _, j := range assets.ListFiles(embeddedModels) {
- a = append(a, strings.TrimPrefix(j, "models/"))
- }
-
- return slices.Contains(a, f)
-}
-
-// ResolveContent returns the content in the embedded model library
-func ResolveContent(s string) ([]byte, error) {
- if ExistsInModelsLibrary(s) {
- return embeddedModels.ReadFile(fmt.Sprintf("models/%s.yaml", s))
- }
-
- return nil, fmt.Errorf("cannot find model %s", s)
-}
diff --git a/embedded/model_library.yaml b/embedded/model_library.yaml
deleted file mode 100644
index 281941a5..00000000
--- a/embedded/model_library.yaml
+++ /dev/null
@@ -1,9 +0,0 @@
-###
-###
-### This file contains the list of models that are available in the library
-### The URLs are automatically expanded when local-ai is being called with the key as argument
-###
-### For models with an entire YAML file to be embededd, put the file inside the `models`
-### directory, it will be automatically available with the file name as key (without the .yaml extension)
-
-phi-2: "github://mudler/LocalAI-examples/configurations/phi-2.yaml@main"
diff --git a/embedded/models/all-minilm-l6-v2.yaml b/embedded/models/all-minilm-l6-v2.yaml
deleted file mode 100644
index 512d63a4..00000000
--- a/embedded/models/all-minilm-l6-v2.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-name: all-minilm-l6-v2
-backend: sentencetransformers
-embeddings: true
-parameters:
- model: all-MiniLM-L6-v2
-
-usage: |
- You can test this model with curl like this:
-
- curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
- "input": "Your text string goes here",
- "model": "all-minilm-l6-v2"
- }'
\ No newline at end of file
diff --git a/embedded/models/animagine-xl.yaml b/embedded/models/animagine-xl.yaml
deleted file mode 100644
index d492c080..00000000
--- a/embedded/models/animagine-xl.yaml
+++ /dev/null
@@ -1,17 +0,0 @@
-name: animagine-xl
-parameters:
- model: Linaqruf/animagine-xl
-backend: diffusers
-f16: true
-diffusers:
- scheduler_type: euler_a
-
-usage: |
- curl http://localhost:8080/v1/images/generations \
- -H "Content-Type: application/json" \
- -d '{
- "prompt": "|",
- "model": "animagine-xl",
- "step": 51,
- "size": "1024x1024"
- }'
\ No newline at end of file
diff --git a/embedded/models/bakllava.yaml b/embedded/models/bakllava.yaml
deleted file mode 100644
index 52fd9466..00000000
--- a/embedded/models/bakllava.yaml
+++ /dev/null
@@ -1,40 +0,0 @@
-backend: llama-cpp
-context_size: 4096
-f16: true
-
-gpu_layers: 90
-mmap: true
-name: bakllava
-
-roles:
- user: "USER:"
- assistant: "ASSISTANT:"
- system: "SYSTEM:"
-
-mmproj: bakllava-mmproj.gguf
-parameters:
- model: bakllava.gguf
- temperature: 0.2
- top_k: 40
- top_p: 0.95
- seed: -1
-mirostat: 2
-mirostat_eta: 1.0
-mirostat_tau: 1.0
-
-template:
- chat: |
- A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
- {{.Input}}
- ASSISTANT:
-
-download_files:
-- filename: bakllava.gguf
- uri: huggingface://mys/ggml_bakllava-1/ggml-model-q4_k.gguf
-- filename: bakllava-mmproj.gguf
- uri: huggingface://mys/ggml_bakllava-1/mmproj-model-f16.gguf
-
-usage: |
- curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
- "model": "bakllava",
- "messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}'
diff --git a/embedded/models/bark.yaml b/embedded/models/bark.yaml
deleted file mode 100644
index da1b1db4..00000000
--- a/embedded/models/bark.yaml
+++ /dev/null
@@ -1,8 +0,0 @@
-usage: |
- bark works without any configuration, to test it, you can run the following curl command:
-
- curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
- "backend": "bark",
- "input":"Hello, this is a test!"
- }' | aplay
-# TODO: This is a placeholder until we manage to pre-load HF/Transformers models
\ No newline at end of file
diff --git a/embedded/models/cerbero.yaml b/embedded/models/cerbero.yaml
deleted file mode 100644
index 8ace4e35..00000000
--- a/embedded/models/cerbero.yaml
+++ /dev/null
@@ -1,24 +0,0 @@
-backend: llama
-context_size: 8192
-f16: false
-gpu_layers: 90
-name: cerbero
-mmap: false
-parameters:
- model: huggingface://galatolo/cerbero-7b-gguf/ggml-model-Q8_0.gguf
- top_k: 80
- temperature: 0.2
- top_p: 0.7
-template:
- completion: "{{.Input}}"
- chat: "Questa ĆØ una conversazione tra un umano ed un assistente AI.\n{{.Input}}\n[|Assistente|] "
-roles:
- user: "[|Umano|] "
- system: "[|Umano|] "
- assistant: "[|Assistente|] "
-
-stopwords:
-- "[|Umano|]"
-
-trimsuffix:
-- "\n"
\ No newline at end of file
diff --git a/embedded/models/codellama-7b-gguf.yaml b/embedded/models/codellama-7b-gguf.yaml
deleted file mode 100644
index 413c838b..00000000
--- a/embedded/models/codellama-7b-gguf.yaml
+++ /dev/null
@@ -1,20 +0,0 @@
-name: codellama-7b-gguf
-backend: transformers
-parameters:
- model: huggingface://TheBloke/CodeLlama-7B-GGUF/codellama-7b.Q4_K_M.gguf
- temperature: 0.5
- top_k: 40
- seed: -1
- top_p: 0.95
-mirostat: 2
-mirostat_eta: 1.0
-mirostat_tau: 1.0
-
-context_size: 4096
-f16: true
-gpu_layers: 90
-usage: |
- curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
- "model": "codellama-7b-gguf",
- "prompt": "import socket\n\ndef ping_exponential_backoff(host: str):"
- }'
\ No newline at end of file
diff --git a/embedded/models/codellama-7b.yaml b/embedded/models/codellama-7b.yaml
deleted file mode 100644
index d9b5c62c..00000000
--- a/embedded/models/codellama-7b.yaml
+++ /dev/null
@@ -1,14 +0,0 @@
-name: codellama-7b
-backend: transformers
-type: AutoModelForCausalLM
-parameters:
- model: codellama/CodeLlama-7b-hf
- temperature: 0.2
- top_k: 40
- top_p: 0.95
-
-usage: |
- curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
- "model": "codellama-7b",
- "prompt": "import socket\n\ndef ping_exponential_backoff(host: str):"
- }'
diff --git a/embedded/models/coqui.yaml b/embedded/models/coqui.yaml
deleted file mode 100644
index 5d67f241..00000000
--- a/embedded/models/coqui.yaml
+++ /dev/null
@@ -1,9 +0,0 @@
-usage: |
- coqui works without any configuration, to test it, you can run the following curl command:
-
- curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
- "backend": "coqui",
- "model": "tts_models/en/ljspeech/glow-tts",
- "input":"Hello, this is a test!"
- }'
-# TODO: This is a placeholder until we manage to pre-load HF/Transformers models
\ No newline at end of file
diff --git a/embedded/models/dolphin-2.5-mixtral-8x7b.yaml b/embedded/models/dolphin-2.5-mixtral-8x7b.yaml
deleted file mode 100644
index 12ee1efc..00000000
--- a/embedded/models/dolphin-2.5-mixtral-8x7b.yaml
+++ /dev/null
@@ -1,31 +0,0 @@
-name: dolphin-mixtral-8x7b
-mmap: true
-parameters:
- model: huggingface://TheBloke/dolphin-2.5-mixtral-8x7b-GGUF/dolphin-2.5-mixtral-8x7b.Q2_K.gguf
- temperature: 0.5
- top_k: 40
- top_p: 0.95
- seed: -1
-mirostat: 2
-mirostat_eta: 1.0
-mirostat_tau: 1.0
-template:
- chat_message: |
- <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
- {{if .Content}}{{.Content}}{{end}}<|im_end|>
- chat: |
- {{.Input}}
- <|im_start|>assistant
- completion: |
- {{.Input}}
-context_size: 4096
-f16: true
-stopwords:
-- <|im_end|>
-gpu_layers: 90
-
-usage: |
- curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
- "model": "dolphin-mixtral-8x7b",
- "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
- }'
\ No newline at end of file
diff --git a/embedded/models/hermes-2-pro-mistral.yaml b/embedded/models/hermes-2-pro-mistral.yaml
deleted file mode 100644
index 74d98eeb..00000000
--- a/embedded/models/hermes-2-pro-mistral.yaml
+++ /dev/null
@@ -1,59 +0,0 @@
-name: hermes-2-pro-mistral
-mmap: true
-parameters:
- model: huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q6_K.gguf
-
-template:
- chat_message: |
- <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
- {{- if .FunctionCall }}
-
- {{- else if eq .RoleName "tool" }}
-
- {{- end }}
- {{- if .Content}}
- {{.Content }}
- {{- end }}
- {{- if .FunctionCall}}
- {{toJson .FunctionCall}}
- {{- end }}
- {{- if .FunctionCall }}
-
- {{- else if eq .RoleName "tool" }}
-
- {{- end }}<|im_end|>
- # https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF#prompt-format-for-function-calling
- function: |
- <|im_start|>system
- You are a function calling AI model. You are provided with function signatures within XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
-
- {{range .Functions}}
- {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
- {{end}}
-
- Use the following pydantic model json schema for each tool call you will make:
- {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}
- For each function call return a json object with function name and arguments within XML tags as follows:
-
- {'arguments': , 'name': }
- <|im_end|>
- {{.Input -}}
- <|im_start|>assistant
-
- chat: |
- {{.Input -}}
- <|im_start|>assistant
- completion: |
- {{.Input}}
-context_size: 4096
-f16: true
-stopwords:
-- <|im_end|>
--
-- "\n"
-- "\n\n\n"
-usage: |
- curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
- "model": "hermes-2-pro-mistral",
- "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
- }'
diff --git a/embedded/models/llama3-instruct.yaml b/embedded/models/llama3-instruct.yaml
deleted file mode 100644
index d483d2b2..00000000
--- a/embedded/models/llama3-instruct.yaml
+++ /dev/null
@@ -1,48 +0,0 @@
-name: llama3-8b-instruct
-mmap: true
-parameters:
- model: huggingface://second-state/Llama-3-8B-Instruct-GGUF/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf
-
-template:
- chat_message: |
- <|start_header_id|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}<|end_header_id|>
-
- {{ if .FunctionCall -}}
- Function call:
- {{ else if eq .RoleName "tool" -}}
- Function response:
- {{ end -}}
- {{ if .Content -}}
- {{.Content -}}
- {{ else if .FunctionCall -}}
- {{ toJson .FunctionCall -}}
- {{ end -}}
- <|eot_id|>
- function: |
- <|start_header_id|>system<|end_header_id|>
-
- You are a function calling AI model. You are provided with function signatures within XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
-
- {{range .Functions}}
- {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
- {{end}}
-
- Use the following pydantic model json schema for each tool call you will make:
- {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
- Function call:
- chat: |
- <|begin_of_text|>{{.Input }}
- <|start_header_id|>assistant<|end_header_id|>
- completion: |
- {{.Input}}
-context_size: 8192
-f16: true
-stopwords:
-- <|im_end|>
--
-- "<|eot_id|>"
-usage: |
- curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
- "model": "llama3-8b-instruct",
- "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
- }'
diff --git a/embedded/models/llava-1.5.yaml b/embedded/models/llava-1.5.yaml
deleted file mode 100644
index 3db48524..00000000
--- a/embedded/models/llava-1.5.yaml
+++ /dev/null
@@ -1,33 +0,0 @@
-backend: llama-cpp
-context_size: 4096
-f16: true
-
-gpu_layers: 90
-mmap: true
-name: llava-1.5
-
-roles:
- user: "USER:"
- assistant: "ASSISTANT:"
- system: "SYSTEM:"
-
-mmproj: llava-v1.5-7b-mmproj-Q8_0.gguf
-parameters:
- model: llava-v1.5-7b-Q4_K.gguf
-
-template:
- chat: |
- A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
- {{.Input}}
- ASSISTANT:
-
-download_files:
-- filename: llava-v1.5-7b-Q4_K.gguf
- uri: huggingface://jartine/llava-v1.5-7B-GGUF/llava-v1.5-7b-Q4_K.gguf
-- filename: llava-v1.5-7b-mmproj-Q8_0.gguf
- uri: huggingface://jartine/llava-v1.5-7B-GGUF/llava-v1.5-7b-mmproj-Q8_0.gguf
-
-usage: |
- curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
- "model": "llava-1.5",
- "messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}'
diff --git a/embedded/models/llava-1.6-mistral.yaml b/embedded/models/llava-1.6-mistral.yaml
deleted file mode 100644
index 602ceb62..00000000
--- a/embedded/models/llava-1.6-mistral.yaml
+++ /dev/null
@@ -1,33 +0,0 @@
-backend: llama-cpp
-context_size: 4096
-f16: true
-
-gpu_layers: 90
-mmap: true
-name: llava-1.6-mistral
-
-roles:
- user: "USER:"
- assistant: "ASSISTANT:"
- system: "SYSTEM:"
-
-mmproj: llava-v1.6-7b-mmproj-f16.gguf
-parameters:
- model: llava-v1.6-mistral-7b.gguf
-
-template:
- chat: |
- A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
- {{.Input}}
- ASSISTANT:
-
-download_files:
-- filename: llava-v1.6-mistral-7b.gguf
- uri: huggingface://cjpais/llava-1.6-mistral-7b-gguf/llava-v1.6-mistral-7b.Q6_K.gguf
-- filename: llava-v1.6-7b-mmproj-f16.gguf
- uri: huggingface://cjpais/llava-1.6-mistral-7b-gguf/mmproj-model-f16.gguf
-
-usage: |
- curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
- "model": "llava-1.6-mistral",
- "messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}'
diff --git a/embedded/models/llava-1.6-vicuna.yaml b/embedded/models/llava-1.6-vicuna.yaml
deleted file mode 100644
index cea33e7f..00000000
--- a/embedded/models/llava-1.6-vicuna.yaml
+++ /dev/null
@@ -1,37 +0,0 @@
-backend: llama-cpp
-context_size: 4096
-f16: true
-
-gpu_layers: 90
-mmap: true
-name: llava-1.6-vicuna
-
-roles:
- user: "USER:"
- assistant: "ASSISTANT:"
- system: "SYSTEM:"
-
-mmproj: mmproj-vicuna7b-f16.gguf
-parameters:
- model: vicuna-7b-q5_k.gguf
- temperature: 0.2
- top_k: 40
- top_p: 0.95
- seed: -1
-
-template:
- chat: |
- A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
- {{.Input}}
- ASSISTANT:
-
-download_files:
-- filename: vicuna-7b-q5_k.gguf
- uri: https://huggingface.co/cmp-nct/llava-1.6-gguf/resolve/main/vicuna-7b-q5_k.gguf
-- filename: mmproj-vicuna7b-f16.gguf
- uri: https://huggingface.co/cmp-nct/llava-1.6-gguf/resolve/main/mmproj-vicuna7b-f16.gguf
-
-usage: |
- curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
- "model": "llava-1.6-vicuna",
- "messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}'
diff --git a/embedded/models/llava.yaml b/embedded/models/llava.yaml
deleted file mode 100644
index 3d240681..00000000
--- a/embedded/models/llava.yaml
+++ /dev/null
@@ -1,40 +0,0 @@
-backend: llama-cpp
-context_size: 4096
-f16: true
-
-gpu_layers: 90
-mmap: true
-name: llava
-
-roles:
- user: "USER:"
- assistant: "ASSISTANT:"
- system: "SYSTEM:"
-
-mmproj: bakllava-mmproj.gguf
-parameters:
- model: bakllava.gguf
- temperature: 0.2
- top_k: 40
- top_p: 0.95
- seed: -1
-mirostat: 2
-mirostat_eta: 1.0
-mirostat_tau: 1.0
-
-template:
- chat: |
- A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
- {{.Input}}
- ASSISTANT:
-
-download_files:
-- filename: bakllava.gguf
- uri: huggingface://mys/ggml_bakllava-1/ggml-model-q4_k.gguf
-- filename: bakllava-mmproj.gguf
- uri: huggingface://mys/ggml_bakllava-1/mmproj-model-f16.gguf
-
-usage: |
- curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
- "model": "llava",
- "messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}'
diff --git a/embedded/models/mamba-bagel.yaml b/embedded/models/mamba-bagel.yaml
deleted file mode 100644
index aed7142f..00000000
--- a/embedded/models/mamba-bagel.yaml
+++ /dev/null
@@ -1,21 +0,0 @@
-name: bagel
-backend: mamba
-parameters:
- model: "jondurbin/bagel-dpo-2.8b-v0.2"
-systemPrompt: "You are a helpful, unbiased, uncensored assistant."
-template:
- chat_message: |
- {{if eq .RoleName "assistant"}}{{.Content}}{{else}}
- [INST]
- {{if .SystemPrompt}}{{.SystemPrompt}}{{else if eq .RoleName "system"}}<>{{.Content}}<>
-
- {{else if .Content}}{{.Content}}{{end}}
- [/INST]
- {{end}}
- completion: |
- {{.Input}}
-usage: |
- curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
- "model": "bagel",
- "messages": [{"role": "user", "content": "how are you doing"}],
- }'
diff --git a/embedded/models/mamba-chat.yaml b/embedded/models/mamba-chat.yaml
deleted file mode 100644
index b0d7fc62..00000000
--- a/embedded/models/mamba-chat.yaml
+++ /dev/null
@@ -1,28 +0,0 @@
-name: mamba-chat
-backend: mamba
-parameters:
- model: "havenhq/mamba-chat"
-
-trimsuffix:
-- <|endoftext|>
-
-# https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/tokenizer_config.json
-# "chat_template": "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}",
-template:
- chat_message: |
- {{if eq .RoleName "assistant"}}<|assistant|>{{else if eq .RoleName "system"}}<|system|>{{else if eq .RoleName "user"}}<|user|>{{end}}
- {{if .Content}}{{.Content}}{{end}}
-
-
- chat: |
- {{.Input}}
- <|assistant|>
-
- completion: |
- {{.Input}}
-usage: |
- curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
- "model": "mamba-chat",
- "messages": [{"role": "user", "content": "how are you doing"}],
- "temperature": 0.7
- }'
\ No newline at end of file
diff --git a/embedded/models/mistral-openorca.yaml b/embedded/models/mistral-openorca.yaml
deleted file mode 100644
index 0794a69b..00000000
--- a/embedded/models/mistral-openorca.yaml
+++ /dev/null
@@ -1,32 +0,0 @@
-name: mistral-openorca
-mmap: true
-parameters:
- model: huggingface://TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q6_K.gguf
- temperature: 0.2
- top_k: 40
- top_p: 0.95
- seed: -1
-mirostat: 2
-mirostat_eta: 1.0
-mirostat_tau: 1.0
-
-template:
- chat_message: |
- <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
- {{if .Content}}{{.Content}}{{end}}
- <|im_end|>
- chat: |
- {{.Input}}
- <|im_start|>assistant
- completion: |
- {{.Input}}
-context_size: 4096
-f16: true
-stopwords:
-- <|im_end|>
--
-usage: |
- curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
- "model": "mistral-openorca",
- "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
- }'
diff --git a/embedded/models/mixtral-instruct.yaml b/embedded/models/mixtral-instruct.yaml
deleted file mode 100644
index 246b2324..00000000
--- a/embedded/models/mixtral-instruct.yaml
+++ /dev/null
@@ -1,25 +0,0 @@
-name: mixtral-instruct
-mmap: true
-parameters:
- model: huggingface://TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/mixtral-8x7b-instruct-v0.1.Q2_K.gguf
- temperature: 0.2
- top_k: 40
- seed: -1
- top_p: 0.95
-mirostat: 2
-mirostat_eta: 1.0
-mirostat_tau: 1.0
-
-template:
- chat: &chat |
- [INST] {{.Input}} [/INST]
- completion: *chat
-context_size: 4096
-f16: true
-gpu_layers: 90
-
-usage: |
- curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
- "model": "mixtral-instruct",
- "prompt": "How are you doing?"
- }'
\ No newline at end of file
diff --git a/embedded/models/phi-2-chat.yaml b/embedded/models/phi-2-chat.yaml
deleted file mode 100644
index 4a3ca7aa..00000000
--- a/embedded/models/phi-2-chat.yaml
+++ /dev/null
@@ -1,25 +0,0 @@
-name: phi-2-chat
-mmap: true
-parameters:
- model: huggingface://l3utterfly/phi-2-layla-v1-chatml-gguf/phi-2-layla-v1-chatml-Q8_0.gguf
-
-template:
- chat_message: |
- <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
- {{if .Content}}{{.Content}}{{end}}
- <|im_end|>
- chat: |
- {{.Input}}
- <|im_start|>assistant
- completion: |
- {{.Input}}
-context_size: 4096
-f16: true
-stopwords:
-- <|im_end|>
--
-usage: |
- curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
- "model": "phi-2-chat",
- "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
- }'
diff --git a/embedded/models/phi-2-orange.yaml b/embedded/models/phi-2-orange.yaml
deleted file mode 100644
index 838909c9..00000000
--- a/embedded/models/phi-2-orange.yaml
+++ /dev/null
@@ -1,30 +0,0 @@
-name: phi-2-orange
-mmap: true
-parameters:
- model: huggingface://l3utterfly/phi-2-orange-GGUF/phi-2-orange.Q6_K.gguf
-
-template:
- chat_message: |
- <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
- {{if .Content}}{{.Content}}{{end}}
- <|im_end|>
- chat: |
- {{.Input}}
- <|im_start|>assistant
- completion: |
- {{.Input}}
-context_size: 4096
-f16: true
-stopwords:
-- <|im_end|>
--
-
-description: |
- This model is a chatbot that can be used for general conversation.
- [Model card](https://huggingface.co/TheBloke/phi-2-orange-GGUF)
-
-usage: |
- curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
- "model": "phi-2-orange",
- "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
- }'
diff --git a/embedded/models/rhasspy-voice-en-us-amy.yaml b/embedded/models/rhasspy-voice-en-us-amy.yaml
deleted file mode 100644
index 911293ca..00000000
--- a/embedded/models/rhasspy-voice-en-us-amy.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-name: voice-en-us-amy-low
-download_files:
- - filename: voice-en-us-amy-low.tar.gz
- uri: https://github.com/rhasspy/piper/releases/download/v0.0.2/voice-en-us-amy-low.tar.gz
-
-
-usage: |
- To test if this model works as expected, you can use the following curl command:
-
- curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
- "model":"en-us-amy-low.onnx",
- "input": "Hi, this is a test."
- }'
\ No newline at end of file
diff --git a/embedded/models/tinyllama-chat.yaml b/embedded/models/tinyllama-chat.yaml
deleted file mode 100644
index 48c44f9f..00000000
--- a/embedded/models/tinyllama-chat.yaml
+++ /dev/null
@@ -1,29 +0,0 @@
-name: tinyllama-chat
-mmap: true
-parameters:
- model: huggingface://TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF/tinyllama-1.1b-chat-v0.3.Q8_0.gguf
- temperature: 0.2
- top_k: 40
- seed: -1
- top_p: 0.95
-template:
- chat_message: |
- <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
- {{if .Content}}{{.Content}}{{end}}<|im_end|>
- chat: |
- {{.Input}}
- <|im_start|>assistant
-
- completion: |
- {{.Input}}
-context_size: 4096
-f16: true
-stopwords:
-- <|im_end|>
-gpu_layers: 90
-
-usage: |
- curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
- "model": "tinyllama-chat",
- "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
- }'
\ No newline at end of file
diff --git a/embedded/models/transformers-tinyllama.yaml b/embedded/models/transformers-tinyllama.yaml
deleted file mode 100644
index ee6e7889..00000000
--- a/embedded/models/transformers-tinyllama.yaml
+++ /dev/null
@@ -1,31 +0,0 @@
-name: tinyllama-chat
-backend: transformers
-type: AutoModelForCausalLM
-
-parameters:
- model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
- temperature: 0.2
- top_k: 40
- top_p: 0.95
- max_tokens: 4096
-
-template:
- chat_message: |
- <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
- {{if .Content}}{{.Content}}{{end}}<|im_end|>
- chat: |
- {{.Input}}
- <|im_start|>assistant
-
- completion: |
- {{.Input}}
-
-stopwords:
-- <|im_end|>
-
-usage: |
- curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
- "model": "tinyllama-chat",
- "messages": [{"role": "user", "content": "Say this is a test!"}],
- "temperature": 0.7
- }'
diff --git a/embedded/models/vall-e-x.yaml b/embedded/models/vall-e-x.yaml
deleted file mode 100644
index b97015f6..00000000
--- a/embedded/models/vall-e-x.yaml
+++ /dev/null
@@ -1,8 +0,0 @@
-usage: |
- Vall-e-x works without any configuration, to test it, you can run the following curl command:
-
- curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
- "backend": "vall-e-x",
- "input":"Hello, this is a test!"
- }' | aplay
-# TODO: This is a placeholder until we manage to pre-load HF/Transformers models
\ No newline at end of file
diff --git a/embedded/models/whisper-base.yaml b/embedded/models/whisper-base.yaml
deleted file mode 100644
index f7ebd217..00000000
--- a/embedded/models/whisper-base.yaml
+++ /dev/null
@@ -1,18 +0,0 @@
-name: whisper
-backend: whisper
-parameters:
- model: ggml-whisper-base.bin
-
-usage: |
- ## example audio file
- wget --quiet --show-progress -O gb1.ogg https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg
-
- ## Send the example audio file to the transcriptions endpoint
- curl http://localhost:8080/v1/audio/transcriptions \
- -H "Content-Type: multipart/form-data" \
- -F file="@$PWD/gb1.ogg" -F model="whisper"
-
-download_files:
-- filename: "ggml-whisper-base.bin"
- sha256: "60ed5bc3dd14eea856493d334349b405782ddcaf0028d4b5df4088345fba2efe"
- uri: "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
\ No newline at end of file
diff --git a/pkg/startup/model_preload.go b/pkg/startup/model_preload.go
index a445b10e..0f598df5 100644
--- a/pkg/startup/model_preload.go
+++ b/pkg/startup/model_preload.go
@@ -9,7 +9,6 @@ import (
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery"
- "github.com/mudler/LocalAI/embedded"
"github.com/mudler/LocalAI/pkg/downloader"
"github.com/mudler/LocalAI/pkg/utils"
"github.com/rs/zerolog/log"
@@ -18,42 +17,17 @@ import (
// InstallModels will preload models from the given list of URLs and galleries
// It will download the model if it is not already present in the model path
// It will also try to resolve if the model is an embedded model YAML configuration
-func InstallModels(galleries []config.Gallery, modelLibraryURL string, modelPath string, enforceScan bool, downloadStatus func(string, string, string, float64), models ...string) error {
+func InstallModels(galleries []config.Gallery, modelPath string, enforceScan bool, downloadStatus func(string, string, string, float64), models ...string) error {
// create an error that groups all errors
var err error
- lib, _ := embedded.GetRemoteLibraryShorteners(modelLibraryURL, modelPath)
-
for _, url := range models {
// As a best effort, try to resolve the model from the remote library
// if it's not resolved we try with the other method below
- if modelLibraryURL != "" {
- if lib[url] != "" {
- log.Debug().Msgf("[startup] model configuration is defined remotely: %s (%s)", url, lib[url])
- url = lib[url]
- }
- }
- url = embedded.ModelShortURL(url)
uri := downloader.URI(url)
switch {
- case embedded.ExistsInModelsLibrary(url):
- modelYAML, e := embedded.ResolveContent(url)
- // If we resolve something, just save it to disk and continue
- if e != nil {
- log.Error().Err(e).Msg("error resolving model content")
- err = errors.Join(err, e)
- continue
- }
-
- log.Debug().Msgf("[startup] resolved embedded model: %s", url)
- md5Name := utils.MD5(url)
- modelDefinitionFilePath := filepath.Join(modelPath, md5Name) + ".yaml"
- if e := os.WriteFile(modelDefinitionFilePath, modelYAML, 0600); err != nil {
- log.Error().Err(e).Str("filepath", modelDefinitionFilePath).Msg("error writing model definition")
- err = errors.Join(err, e)
- }
case uri.LooksLikeOCI():
log.Debug().Msgf("[startup] resolved OCI model to download: %s", url)
diff --git a/pkg/startup/model_preload_test.go b/pkg/startup/model_preload_test.go
index 78cf7311..51e6d702 100644
--- a/pkg/startup/model_preload_test.go
+++ b/pkg/startup/model_preload_test.go
@@ -7,7 +7,6 @@ import (
"github.com/mudler/LocalAI/core/config"
. "github.com/mudler/LocalAI/pkg/startup"
- "github.com/mudler/LocalAI/pkg/utils"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
@@ -16,29 +15,13 @@ import (
var _ = Describe("Preload test", func() {
Context("Preloading from strings", func() {
- It("loads from remote url", func() {
- tmpdir, err := os.MkdirTemp("", "")
- Expect(err).ToNot(HaveOccurred())
- libraryURL := "https://raw.githubusercontent.com/mudler/LocalAI/master/embedded/model_library.yaml"
- fileName := fmt.Sprintf("%s.yaml", "phi-2")
-
- InstallModels([]config.Gallery{}, libraryURL, tmpdir, true, nil, "phi-2")
-
- resultFile := filepath.Join(tmpdir, fileName)
-
- content, err := os.ReadFile(resultFile)
- Expect(err).ToNot(HaveOccurred())
-
- Expect(string(content)).To(ContainSubstring("name: phi-2"))
- })
-
It("loads from embedded full-urls", func() {
tmpdir, err := os.MkdirTemp("", "")
Expect(err).ToNot(HaveOccurred())
url := "https://raw.githubusercontent.com/mudler/LocalAI-examples/main/configurations/phi-2.yaml"
fileName := fmt.Sprintf("%s.yaml", "phi-2")
- InstallModels([]config.Gallery{}, "", tmpdir, true, nil, url)
+ InstallModels([]config.Gallery{}, tmpdir, true, nil, url)
resultFile := filepath.Join(tmpdir, fileName)
@@ -47,45 +30,13 @@ var _ = Describe("Preload test", func() {
Expect(string(content)).To(ContainSubstring("name: phi-2"))
})
- It("loads from embedded short-urls", func() {
- tmpdir, err := os.MkdirTemp("", "")
- Expect(err).ToNot(HaveOccurred())
- url := "phi-2"
-
- InstallModels([]config.Gallery{}, "", tmpdir, true, nil, url)
-
- entry, err := os.ReadDir(tmpdir)
- Expect(err).ToNot(HaveOccurred())
- Expect(entry).To(HaveLen(1))
- resultFile := entry[0].Name()
-
- content, err := os.ReadFile(filepath.Join(tmpdir, resultFile))
- Expect(err).ToNot(HaveOccurred())
-
- Expect(string(content)).To(ContainSubstring("name: phi-2"))
- })
- It("loads from embedded models", func() {
- tmpdir, err := os.MkdirTemp("", "")
- Expect(err).ToNot(HaveOccurred())
- url := "mistral-openorca"
- fileName := fmt.Sprintf("%s.yaml", utils.MD5(url))
-
- InstallModels([]config.Gallery{}, "", tmpdir, true, nil, url)
-
- resultFile := filepath.Join(tmpdir, fileName)
-
- content, err := os.ReadFile(resultFile)
- Expect(err).ToNot(HaveOccurred())
-
- Expect(string(content)).To(ContainSubstring("name: mistral-openorca"))
- })
It("downloads from urls", func() {
tmpdir, err := os.MkdirTemp("", "")
Expect(err).ToNot(HaveOccurred())
url := "huggingface://TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF/tinyllama-1.1b-chat-v0.3.Q2_K.gguf"
fileName := fmt.Sprintf("%s.gguf", "tinyllama-1.1b-chat-v0.3.Q2_K")
- err = InstallModels([]config.Gallery{}, "", tmpdir, false, nil, url)
+ err = InstallModels([]config.Gallery{}, tmpdir, false, nil, url)
Expect(err).ToNot(HaveOccurred())
resultFile := filepath.Join(tmpdir, fileName)
diff --git a/embedded/webui_static.yaml b/webui_static.yaml
similarity index 100%
rename from embedded/webui_static.yaml
rename to webui_static.yaml
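Note on the two Go changes above: with the embedded model library gone, `InstallModels` no longer resolves short names such as `phi-2` or `mistral-openorca` to bundled YAML configurations; models are installed from full configuration URLs, galleries, or direct file URIs instead. A hedged sketch of the equivalent flow over the HTTP API (it assumes a LocalAI instance on localhost:8080 and reuses the configuration URL from the remaining test; endpoint behaviour may differ between releases):

```bash
# Install a model from a full configuration URL instead of an embedded short name.
# Assumes a locally running LocalAI instance; the URL below is the one kept in the
# updated test and may change upstream.
LOCALAI=http://localhost:8080
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
  "config_url": "https://raw.githubusercontent.com/mudler/LocalAI-examples/main/configurations/phi-2.yaml"
}'
```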
From f1d6d65417e1bccd4d93990ccfd36cc1a0602605 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 30 Jan 2025 16:38:35 +0100
Subject: [PATCH 115/679] chore(model gallery): add virtuoso-lite (#4718)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 1716f2b1..990059c9 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -198,6 +198,20 @@
- filename: NightWing3-10B-v0.1-Q4_K_M.gguf
sha256: 2e87671542d22fe1ef9a68e43f2fdab7c2759479ad531946d9f0bdeffa6f5747
uri: huggingface://bartowski/NightWing3-10B-v0.1-GGUF/NightWing3-10B-v0.1-Q4_K_M.gguf
+- !!merge <<: *falcon3
+ name: "virtuoso-lite"
+ urls:
+ - https://huggingface.co/arcee-ai/Virtuoso-Lite
+ - https://huggingface.co/bartowski/Virtuoso-Lite-GGUF
+ description: |
+ Virtuoso-Lite (10B) is our next-generation, 10-billion-parameter language model based on the Llama-3 architecture. It is distilled from Deepseek-v3 using ~1.1B tokens/logits, allowing it to achieve robust performance at a significantly reduced parameter count compared to larger models. Despite its compact size, Virtuoso-Lite excels in a variety of tasks, demonstrating advanced reasoning, code generation, and mathematical problem-solving capabilities.
+ overrides:
+ parameters:
+ model: Virtuoso-Lite-Q4_K_M.gguf
+ files:
+ - filename: Virtuoso-Lite-Q4_K_M.gguf
+ sha256: 1d21bef8467a11a1e473d397128b05fb87b7e824606cdaea061e550cb219fee2
+ uri: huggingface://bartowski/Virtuoso-Lite-GGUF/Virtuoso-Lite-Q4_K_M.gguf
- &intellect1
name: "intellect-1-instruct"
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
From 244f4b564f71e4dca0be997e2002cfab5ffd38a9 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 30 Jan 2025 16:42:48 +0100
Subject: [PATCH 116/679] chore(model gallery): add selene-1-mini-llama-3.1-8b
(#4719)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 990059c9..7a7b0418 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -5359,6 +5359,29 @@
- filename: deepseek-r1-distill-llama-8b-Q4_K_M.gguf
sha256: f8eba201522ab44b79bc54166126bfaf836111ff4cbf2d13c59c3b57da10573b
uri: huggingface://unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf
+- !!merge <<: *llama31
+ name: "selene-1-mini-llama-3.1-8b"
+ icon: https://atla-ai.notion.site/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Ff08e6e70-73af-4363-9621-90e906b92ebc%2F1bfb4316-1ce6-40a0-800c-253739cfcdeb%2Fatla_white3x.svg?table=block&id=17c309d1-7745-80f9-8f60-e755409acd8d&spaceId=f08e6e70-73af-4363-9621-90e906b92ebc&userId=&cache=v2
+ urls:
+ - https://huggingface.co/AtlaAI/Selene-1-Mini-Llama-3.1-8B
+ - https://huggingface.co/bartowski/Selene-1-Mini-Llama-3.1-8B-GGUF
+ description: |
+ Atla Selene Mini is a state-of-the-art small language model-as-a-judge (SLMJ). Selene Mini achieves comparable performance to models 10x its size, outperforming GPT-4o on RewardBench, EvalBiasBench, and AutoJ.
+
+ Post-trained from Llama-3.1-8B across a wide range of evaluation tasks and scoring criteria, Selene Mini outperforms prior small models overall across 11 benchmarks covering three different types of tasks:
+
+ Absolute scoring, e.g. "Evaluate the harmlessness of this response on a scale of 1-5"
+ Classification, e.g. "Does this response address the user query? Answer Yes or No."
+    Pairwise preference, e.g. "Which of the following responses is more logically consistent - A or B?"
+
+ It is also the #1 8B generative model on RewardBench.
+ overrides:
+ parameters:
+ model: Selene-1-Mini-Llama-3.1-8B-Q4_K_M.gguf
+ files:
+ - filename: Selene-1-Mini-Llama-3.1-8B-Q4_K_M.gguf
+ sha256: 908e6ce19f7cd3d7394bd7c38e43de2f228aca6aceda35c7ee70d069ad60493e
+ uri: huggingface://bartowski/Selene-1-Mini-Llama-3.1-8B-GGUF/Selene-1-Mini-Llama-3.1-8B-Q4_K_M.gguf
- &deepseek ## Deepseek
url: "github:mudler/LocalAI/gallery/deepseek.yaml@master"
name: "deepseek-coder-v2-lite-instruct"
From 60ec2cf7513f2d40b9c1836cdf2e06b14a38fd1a Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 30 Jan 2025 16:44:44 +0100
Subject: [PATCH 117/679] chore(model gallery): add openthinker-7b (#4720)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 7a7b0418..6b391356 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3455,6 +3455,25 @@
- filename: Confucius-o1-14B-Q4_K_M.gguf
sha256: 03182920edd8667db7d2a362ca2d25e88f4b615b383b5a55c764f4715fb22dd9
uri: huggingface://bartowski/Confucius-o1-14B-GGUF/Confucius-o1-14B-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "openthinker-7b"
+ icon: https://huggingface.co/datasets/open-thoughts/open-thoughts-114k/resolve/main/open_thoughts.png
+ urls:
+ - https://huggingface.co/open-thoughts/OpenThinker-7B
+ - https://huggingface.co/bartowski/OpenThinker-7B-GGUF
+ description: |
+    This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the OpenThoughts-114k dataset.
+
+    The dataset is derived by distilling DeepSeek-R1 using the data pipeline available on GitHub. More info about the dataset can be found on the OpenThoughts-114k dataset card.
+
+ This model improves upon the Bespoke-Stratos-7B model, which used 17k examples (Bespoke-Stratos-17k dataset). The numbers reported in the table below are evaluated with our open-source tool Evalchemy.
+ overrides:
+ parameters:
+ model: OpenThinker-7B-Q4_K_M.gguf
+ files:
+ - filename: OpenThinker-7B-Q4_K_M.gguf
+ sha256: 94dff1a7acd685db5cff7afdb837aab8172e06d65fe6179ba47428e3030acd93
+ uri: huggingface://bartowski/OpenThinker-7B-GGUF/OpenThinker-7B-Q4_K_M.gguf
- &llama31 ## LLama3.1
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
icon: https://avatars.githubusercontent.com/u/153379578
From cd5489ce47452c523be196c291e6f0a5b5922424 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Fri, 31 Jan 2025 08:51:32 +0100
Subject: [PATCH 118/679] chore(model-gallery): :arrow_up: update checksum
(#4723)
:arrow_up: Checksum updates in gallery/index.yaml
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
gallery/index.yaml | 150 +++++++++++++++++++++------------------------
1 file changed, 69 insertions(+), 81 deletions(-)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 6b391356..98c3a782 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -852,8 +852,8 @@
- filename: salamandra-7b-instruct.Q4_K_M-f32.gguf
sha256: bac8e8c1d1d9d53cbdb148b8ff9ad378ddb392429207099e85b5aae3a43bff3d
uri: huggingface://cstr/salamandra-7b-instruct-GGUF/salamandra-7b-instruct.Q4_K_M-f32.gguf
-- &llama32 ## llama3.2
- url: "github:mudler/LocalAI/gallery/llama3.2-quantized.yaml@master"
+- &llama32
+ url: "github:mudler/LocalAI/gallery/llama3.2-quantized.yaml@master" ## llama3.2
icon: https://avatars.githubusercontent.com/u/153379578
license: llama3.2
description: |
@@ -1375,11 +1375,7 @@
urls:
- https://huggingface.co/HuggingFaceTB/FineMath-Llama-3B
- https://huggingface.co/bartowski/FineMath-Llama-3B-GGUF
- description: |
-    This is a continual-pre-training of Llama-3.2-3B on a mix of 📐 FineMath (our new high quality math dataset) and FineWeb-Edu.
-
- The model demonstrates superior math performance compared to Llama 3.2 3B, while maintaining similar performance on knowledge, reasoning, and common sense benchmarks.
- It was trained on 160B tokens using a mix of 40% FineWeb-Edu and 60% from FineMath (30% FineMath-4+ subset and 30% InfiWebMath-4+ subset). We use nanotron for the training, and you can find the training scripts in our SmolLM2 GitHub repo.
+ description: "This is a continual-pre-training of Llama-3.2-3B on a mix of \U0001F4D0 FineMath (our new high quality math dataset) and FineWeb-Edu.\n\nThe model demonstrates superior math performance compared to Llama 3.2 3B, while maintaining similar performance on knowledge, reasoning, and common sense benchmarks.\nIt was trained on 160B tokens using a mix of 40% FineWeb-Edu and 60% from FineMath (30% FineMath-4+ subset and 30% InfiWebMath-4+ subset). We use nanotron for the training, and you can find the training scripts in our SmolLM2 GitHub repo.\n"
overrides:
parameters:
model: FineMath-Llama-3B-Q4_K_M.gguf
@@ -1387,8 +1383,8 @@
- filename: FineMath-Llama-3B-Q4_K_M.gguf
sha256: 16c73b5cf2a417a7e1608bcc9469f1461fc3e759ce04a3a337f48df977dc158c
uri: huggingface://bartowski/FineMath-Llama-3B-GGUF/FineMath-Llama-3B-Q4_K_M.gguf
-- &qwen25 ## Qwen2.5
- name: "qwen2.5-14b-instruct"
+- &qwen25
+ name: "qwen2.5-14b-instruct" ## Qwen2.5
icon: https://avatars.githubusercontent.com/u/141221163
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
license: apache-2.0
@@ -3291,15 +3287,7 @@
urls:
- https://huggingface.co/Krystalan/DRT-o1-14B
- https://huggingface.co/bartowski/DRT-o1-14B-GGUF
- description: |
- This repository contains the resources for our paper "DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought"
- In this work, we introduce DRT-o1, an attempt to bring the success of long thought reasoning to neural machine translation (MT). To this end,
-
-    🌟 We mine English sentences with similes or metaphors from existing literature books, which are suitable for translation via long thought.
-    🌟 We propose a designed multi-agent framework with three agents (i.e., a translator, an advisor and an evaluator) to synthesize the MT samples with long thought. There are 22,264 synthesized samples in total.
-    🌟 We train DRT-o1-8B, DRT-o1-7B and DRT-o1-14B using Llama-3.1-8B-Instruct, Qwen2.5-7B-Instruct and Qwen2.5-14B-Instruct as backbones.
-
-    Our goal is not to achieve competitive performance with OpenAI's O1 in neural machine translation (MT). Instead, we explore technical routes to bring the success of long thought to MT. To this end, we introduce DRT-o1, a byproduct of our exploration, and we hope it could facilitate the corresponding research in this direction.
+  description: "This repository contains the resources for our paper \"DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought\"\nIn this work, we introduce DRT-o1, an attempt to bring the success of long thought reasoning to neural machine translation (MT). To this end,\n\n\U0001F31F We mine English sentences with similes or metaphors from existing literature books, which are suitable for translation via long thought.\n\U0001F31F We propose a designed multi-agent framework with three agents (i.e., a translator, an advisor and an evaluator) to synthesize the MT samples with long thought. There are 22,264 synthesized samples in total.\n\U0001F31F We train DRT-o1-8B, DRT-o1-7B and DRT-o1-14B using Llama-3.1-8B-Instruct, Qwen2.5-7B-Instruct and Qwen2.5-14B-Instruct as backbones.\n\nOur goal is not to achieve competitive performance with OpenAI's O1 in neural machine translation (MT). Instead, we explore technical routes to bring the success of long thought to MT. To this end, we introduce DRT-o1, a byproduct of our exploration, and we hope it could facilitate the corresponding research in this direction.\n"
overrides:
parameters:
model: DRT-o1-14B-Q4_K_M.gguf
@@ -3356,8 +3344,8 @@
- filename: Chuluun-Qwen2.5-72B-v0.08-Q4_K_M.gguf
sha256: 0fec82625f74a9a340837de7af287b1d9042e5aeb70cda2621426db99958b0af
uri: huggingface://bartowski/Chuluun-Qwen2.5-72B-v0.08-GGUF/Chuluun-Qwen2.5-72B-v0.08-Q4_K_M.gguf
-- &smollm ## SmolLM
- url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
+- &smollm
+ url: "github:mudler/LocalAI/gallery/chatml.yaml@master" ## SmolLM
name: "smollm-1.7b-instruct"
icon: https://huggingface.co/datasets/HuggingFaceTB/images/resolve/main/banner_smol.png
tags:
@@ -3421,19 +3409,19 @@
- https://huggingface.co/nbeerbower/Dumpling-Qwen2.5-32B
- https://huggingface.co/bartowski/Dumpling-Qwen2.5-32B-GGUF
description: |
- nbeerbower/Rombos-EVAGutenberg-TIES-Qwen2.5-32B finetuned on:
- nbeerbower/GreatFirewall-DPO
- nbeerbower/Schule-DPO
- nbeerbower/Purpura-DPO
- nbeerbower/Arkhaios-DPO
- jondurbin/truthy-dpo-v0.1
- antiven0m/physical-reasoning-dpo
- flammenai/Date-DPO-NoAsterisks
- flammenai/Prude-Phi3-DPO
- Atsunori/HelpSteer2-DPO
- jondurbin/gutenberg-dpo-v0.1
- nbeerbower/gutenberg2-dpo
- nbeerbower/gutenberg-moderne-dpo.
+ nbeerbower/Rombos-EVAGutenberg-TIES-Qwen2.5-32B finetuned on:
+ nbeerbower/GreatFirewall-DPO
+ nbeerbower/Schule-DPO
+ nbeerbower/Purpura-DPO
+ nbeerbower/Arkhaios-DPO
+ jondurbin/truthy-dpo-v0.1
+ antiven0m/physical-reasoning-dpo
+ flammenai/Date-DPO-NoAsterisks
+ flammenai/Prude-Phi3-DPO
+ Atsunori/HelpSteer2-DPO
+ jondurbin/gutenberg-dpo-v0.1
+ nbeerbower/gutenberg2-dpo
+ nbeerbower/gutenberg-moderne-dpo.
overrides:
parameters:
model: Dumpling-Qwen2.5-32B-Q4_K_M.gguf
@@ -3474,8 +3462,8 @@
- filename: OpenThinker-7B-Q4_K_M.gguf
sha256: 94dff1a7acd685db5cff7afdb837aab8172e06d65fe6179ba47428e3030acd93
uri: huggingface://bartowski/OpenThinker-7B-GGUF/OpenThinker-7B-Q4_K_M.gguf
-- &llama31 ## LLama3.1
- url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
+- &llama31
+ url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master" ## LLama3.1
icon: https://avatars.githubusercontent.com/u/153379578
name: "meta-llama-3.1-8b-instruct"
license: llama3.1
@@ -5401,8 +5389,8 @@
- filename: Selene-1-Mini-Llama-3.1-8B-Q4_K_M.gguf
sha256: 908e6ce19f7cd3d7394bd7c38e43de2f228aca6aceda35c7ee70d069ad60493e
uri: huggingface://bartowski/Selene-1-Mini-Llama-3.1-8B-GGUF/Selene-1-Mini-Llama-3.1-8B-Q4_K_M.gguf
-- &deepseek ## Deepseek
- url: "github:mudler/LocalAI/gallery/deepseek.yaml@master"
+- &deepseek
+ url: "github:mudler/LocalAI/gallery/deepseek.yaml@master" ## Deepseek
name: "deepseek-coder-v2-lite-instruct"
icon: "https://avatars.githubusercontent.com/u/148330874"
license: deepseek
@@ -5466,8 +5454,8 @@
- filename: archangel_sft_pythia2-8b.Q4_K_M.gguf
sha256: a47782c55ef2b39b19644213720a599d9849511a73c9ebb0c1de749383c0a0f8
uri: huggingface://RichardErkhov/ContextualAI_-_archangel_sft_pythia2-8b-gguf/archangel_sft_pythia2-8b.Q4_K_M.gguf
-- &deepseek-r1 ## Start DeepSeek-R1
- url: "github:mudler/LocalAI/gallery/deepseek-r1.yaml@master"
+- &deepseek-r1
+ url: "github:mudler/LocalAI/gallery/deepseek-r1.yaml@master" ## Start DeepSeek-R1
name: "deepseek-r1-distill-qwen-1.5b"
icon: "https://avatars.githubusercontent.com/u/148330874"
urls:
@@ -5607,8 +5595,8 @@
- filename: FuseO1-DeepSeekR1-QwQ-32B-Preview-Q4_K_M.gguf
sha256: 16f1fb6bf76bb971a7a63e1a68cddd09421f4a767b86eec55eed1e08178f78f2
uri: huggingface://bartowski/FuseO1-DeepSeekR1-QwQ-32B-Preview-GGUF/FuseO1-DeepSeekR1-QwQ-32B-Preview-Q4_K_M.gguf
-- &qwen2 ## Start QWEN2
- url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
+- &qwen2
+ url: "github:mudler/LocalAI/gallery/chatml.yaml@master" ## Start QWEN2
name: "qwen2-7b-instruct"
icon: https://avatars.githubusercontent.com/u/141221163
license: apache-2.0
@@ -5991,10 +5979,10 @@
sha256: 3a4078d53b46f22989adbf998ce5a3fd090b6541f112d7e936eb4204a04100b1
uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/ggml-model-Q4_K_M.gguf
- filename: minicpm-v-2_6-mmproj-f16.gguf
- sha256: f8a805e9e62085805c69c427287acefc284932eb4abfe6e1b1ce431d27e2f4e0
uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/mmproj-model-f16.gguf
-- &mistral03 ## START Mistral
- url: "github:mudler/LocalAI/gallery/mistral-0.3.yaml@master"
+ sha256: 4485f68a0f1aa404c391e788ea88ea653c100d8e98fe572698f701e5809711fd
+- &mistral03
+ url: "github:mudler/LocalAI/gallery/mistral-0.3.yaml@master" ## START Mistral
name: "mistral-7b-instruct-v0.3"
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/62dac1c7a8ead43d20e3e17a/wrLf5yaGC6ng4XME70w6Z.png
license: apache-2.0
@@ -6625,8 +6613,8 @@
- filename: Wayfarer-12B-Q4_K_M.gguf
sha256: 6cd9f290c820c64854fcdcfd312b066447acc2f63abe2e2e71af9bc4f1946c08
uri: huggingface://bartowski/Wayfarer-12B-GGUF/Wayfarer-12B-Q4_K_M.gguf
-- &mudler ### START mudler's LocalAI specific-models
- url: "github:mudler/LocalAI/gallery/mudler.yaml@master"
+- &mudler
+ url: "github:mudler/LocalAI/gallery/mudler.yaml@master" ### START mudler's LocalAI specific-models
name: "LocalAI-llama3-8b-function-call-v0.2"
icon: "https://cdn-uploads.huggingface.co/production/uploads/647374aa7ff32a81ac6d35d4/us5JKi9z046p8K-cn_M0w.webp"
license: llama3
@@ -6670,8 +6658,8 @@
- filename: Mirai-Nova-Llama3-LocalAI-8B-v0.1-q4_k_m.bin
sha256: 579cbb229f9c11d0330759ff4733102d2491615a4c61289e26c09d1b3a583fec
uri: huggingface://mudler/Mirai-Nova-Llama3-LocalAI-8B-v0.1-GGUF/Mirai-Nova-Llama3-LocalAI-8B-v0.1-q4_k_m.bin
-- &parler-tts ### START parler-tts
- url: "github:mudler/LocalAI/gallery/parler-tts.yaml@master"
+- &parler-tts
+ url: "github:mudler/LocalAI/gallery/parler-tts.yaml@master" ### START parler-tts
name: parler-tts-mini-v0.1
overrides:
parameters:
@@ -6687,8 +6675,8 @@
- cpu
- text-to-speech
- python
-- &rerankers ### START rerankers
- url: "github:mudler/LocalAI/gallery/rerankers.yaml@master"
+- &rerankers
+ url: "github:mudler/LocalAI/gallery/rerankers.yaml@master" ### START rerankers
name: cross-encoder
parameters:
model: cross-encoder
@@ -8939,8 +8927,8 @@
- filename: Copus-2x8B.i1-Q4_K_M.gguf
sha256: 685da1ba49e203e8f491105585143d76044286d4b4687bed37d325f6b55501e5
uri: huggingface://mradermacher/Copus-2x8B-i1-GGUF/Copus-2x8B.i1-Q4_K_M.gguf
-- &yi-chat ### Start Yi
- url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
+- &yi-chat
+ url: "github:mudler/LocalAI/gallery/chatml.yaml@master" ### Start Yi
icon: "https://github.com/01-ai/Yi/raw/main/assets/img/Yi_logo_icon_light.svg"
name: "yi-1.5-9b-chat"
license: apache-2.0
@@ -9150,8 +9138,8 @@
- filename: Fimbulvetr-11B-v2-Q4_K_M-imat.gguf
sha256: 3f309b59508342536a70edd6c4be6cf4f2cb97f2e32cbc79ad2ab3f4c02933a4
uri: huggingface://Lewdiculous/Fimbulvetr-11B-v2-GGUF-IQ-Imatrix/Fimbulvetr-11B-v2-Q4_K_M-imat.gguf
-- &noromaid ### Start noromaid
- url: "github:mudler/LocalAI/gallery/noromaid.yaml@master"
+- &noromaid
+ url: "github:mudler/LocalAI/gallery/noromaid.yaml@master" ### Start noromaid
name: "noromaid-13b-0.4-DPO"
icon: https://cdn-uploads.huggingface.co/production/uploads/630dfb008df86f1e5becadc3/VKX2Z2yjZX5J8kXzgeCYO.png
license: cc-by-nc-4.0
@@ -9170,8 +9158,8 @@
- filename: Noromaid-13B-0.4-DPO.q4_k_m.gguf
sha256: cb28e878d034fae3d0b43326c5fc1cfb4ab583b17c56e41d6ce023caec03c1c1
uri: huggingface://NeverSleep/Noromaid-13B-0.4-DPO-GGUF/Noromaid-13B-0.4-DPO.q4_k_m.gguf
-- &wizardlm2 ### START Vicuna based
- url: "github:mudler/LocalAI/gallery/wizardlm2.yaml@master"
+- &wizardlm2
+ url: "github:mudler/LocalAI/gallery/wizardlm2.yaml@master" ### START Vicuna based
name: "wizardlm2-7b"
description: |
We introduce and opensource WizardLM-2, our next generation state-of-the-art large language models, which have improved performance on complex chat, multilingual, reasoning and agent. New family includes three cutting-edge models: WizardLM-2 8x22B, WizardLM-2 70B, and WizardLM-2 7B.
@@ -9225,8 +9213,8 @@
- filename: moondream2-mmproj-f16.gguf
sha256: 4cc1cb3660d87ff56432ebeb7884ad35d67c48c7b9f6b2856f305e39c38eed8f
uri: huggingface://moondream/moondream2-gguf/moondream2-mmproj-f16.gguf
-- &llava ### START LLaVa
- name: "llava-1.6-vicuna"
+- &llava
+ name: "llava-1.6-vicuna" ### START LLaVa
icon: https://github.com/lobehub/lobe-icons/raw/master/packages/static-png/dark/llava-color.png
url: "github:mudler/LocalAI/gallery/llava.yaml@master"
license: apache-2.0
@@ -9639,8 +9627,8 @@
sha256: 010ec3ba94cb5ad2d9c8f95f46f01c6d80f83deab9df0a0831334ea45afff3e2
uri: huggingface://openbmb/MiniCPM-Llama3-V-2_5-gguf/ggml-model-Q4_K_M.gguf
- filename: minicpm-llama3-mmproj-f16.gguf
- sha256: 391d11736c3cd24a90417c47b0c88975e86918fcddb1b00494c4d715b08af13e
uri: huggingface://openbmb/MiniCPM-Llama3-V-2_5-gguf/mmproj-model-f16.gguf
+ sha256: 2c2d773537faf6a7e093655d0d5e14801ef0b2121c6c3e1981ce094c2b62f4f9
- !!merge <<: *llama3
name: "llama-3-cursedstock-v1.8-8b-iq-imatrix"
urls:
@@ -10082,8 +10070,8 @@
- filename: Freyja-v4.95-maldv-7b-NON-FICTION.i1-Q4_K_M.gguf
sha256: cdc0f4de6df2ba120835fbd25c2a0ae2af8548f46d2c40c7a018c51c3d19e0c0
uri: huggingface://mradermacher/Freyja-v4.95-maldv-7b-NON-FICTION-i1-GGUF/Freyja-v4.95-maldv-7b-NON-FICTION.i1-Q4_K_M.gguf
-- &chatml ### ChatML
- url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
+- &chatml
+ url: "github:mudler/LocalAI/gallery/chatml.yaml@master" ### ChatML
name: "una-thepitbull-21.4b-v2"
license: afl-3.0
icon: https://huggingface.co/fblgit/UNA-ThePitbull-21.4B-v2/resolve/main/DE-UNA-ThePitbull-21.4B-v2.png
@@ -10367,8 +10355,8 @@
- filename: Triangulum-10B.Q4_K_M.gguf
sha256: dd071f99edf6b166044bf229cdeec19419c4c348e3fc3d6587cfcc55e6fb85fa
uri: huggingface://mradermacher/Triangulum-10B-GGUF/Triangulum-10B.Q4_K_M.gguf
-- &command-R ### START Command-r
- url: "github:mudler/LocalAI/gallery/command-r.yaml@master"
+- &command-R
+ url: "github:mudler/LocalAI/gallery/command-r.yaml@master" ### START Command-r
name: "command-r-v01:q1_s"
license: "cc-by-nc-4.0"
icon: https://cdn.sanity.io/images/rjtqmwfu/production/ae020d94b599cc453cc09ebc80be06d35d953c23-102x18.svg
@@ -10422,8 +10410,8 @@
- filename: "aya-23-35B-Q4_K_M.gguf"
sha256: "57824768c1a945e21e028c8e9a29b39adb4838d489f5865c82601ab9ad98065d"
uri: "huggingface://bartowski/aya-23-35B-GGUF/aya-23-35B-Q4_K_M.gguf"
-- &phi-2-chat ### START Phi-2
- url: "github:mudler/LocalAI/gallery/phi-2-chat.yaml@master"
+- &phi-2-chat
+ url: "github:mudler/LocalAI/gallery/phi-2-chat.yaml@master" ### START Phi-2
license: mit
description: |
Phi-2 fine-tuned by the OpenHermes 2.5 dataset optimised for multi-turn conversation and character impersonation.
@@ -10544,8 +10532,8 @@
- filename: internlm3-8b-instruct-Q4_K_M.gguf
uri: huggingface://bartowski/internlm3-8b-instruct-GGUF/internlm3-8b-instruct-Q4_K_M.gguf
sha256: 2a9644687318e8659c9cf9b40730d5cc2f5af06f786a50439c7c51359b23896e
-- &phi-3 ### START Phi-3
- url: "github:mudler/LocalAI/gallery/phi-3-chat.yaml@master"
+- &phi-3
+ url: "github:mudler/LocalAI/gallery/phi-3-chat.yaml@master" ### START Phi-3
name: "phi-3-mini-4k-instruct"
icon: https://avatars.githubusercontent.com/u/6154722
license: mit
@@ -10744,8 +10732,8 @@
- filename: Phi-3.5-MoE-instruct-Q4_K_M.gguf
sha256: 43e91bb720869bd8a92d8eb86bc3c74a52c49cf61642ca709b3d7bb89644df36
uri: huggingface://bartowski/Phi-3.5-MoE-instruct-GGUF/Phi-3.5-MoE-instruct-Q4_K_M.gguf
-- &hermes-2-pro-mistral ### START Hermes
- url: "github:mudler/LocalAI/gallery/hermes-2-pro-mistral.yaml@master"
+- &hermes-2-pro-mistral
+ url: "github:mudler/LocalAI/gallery/hermes-2-pro-mistral.yaml@master" ### START Hermes
name: "hermes-2-pro-mistral"
icon: https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/ggO2sBDJ8Bhc6w-zwTx5j.png
license: apache-2.0
@@ -11080,8 +11068,8 @@
- filename: "galatolo-Q4_K.gguf"
sha256: "ca0cfd5a9ad40dc16416aa3a277015d0299b62c0803b67f5709580042202c172"
uri: "huggingface://galatolo/cerbero-7b-gguf/ggml-model-Q4_K.gguf"
-- &codellama ### START Codellama
- url: "github:mudler/LocalAI/gallery/codellama.yaml@master"
+- &codellama
+ url: "github:mudler/LocalAI/gallery/codellama.yaml@master" ### START Codellama
name: "codellama-7b"
license: llama2
description: |
@@ -11211,8 +11199,8 @@
- filename: "llm-compiler-7b-ftd.Q4_K.gguf"
uri: "huggingface://legraphista/llm-compiler-7b-ftd-IMat-GGUF/llm-compiler-7b-ftd.Q4_K.gguf"
sha256: d862dd18ed335413787d0ad196522a9902a3c10a6456afdab8721822cb0ddde8
-- &openvino ### START OpenVINO
- url: "github:mudler/LocalAI/gallery/openvino.yaml@master"
+- &openvino
+ url: "github:mudler/LocalAI/gallery/openvino.yaml@master" ### START OpenVINO
name: "openvino-llama-3-8b-instruct-ov-int8"
license: llama3
urls:
@@ -11325,8 +11313,8 @@
- gpu
- embedding
- cpu
-- &sentencentransformers ### START Embeddings
- description: |
+- &sentencentransformers
+ description: | ### START Embeddings
This framework provides an easy method to compute dense vector representations for sentences, paragraphs, and images. The models are based on transformer networks like BERT / RoBERTa / XLM-RoBERTa etc. and achieve state-of-the-art performance in various tasks. Text is embedded in vector space such that similar text are closer and can efficiently be found using cosine similarity.
urls:
- https://github.com/UKPLab/sentence-transformers
@@ -11340,8 +11328,8 @@
overrides:
parameters:
model: all-MiniLM-L6-v2
-- &dreamshaper ### START Image generation
- name: dreamshaper
+- &dreamshaper
+ name: dreamshaper ### START Image generation
icon: https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/dd9b038c-bd15-43ab-86ab-66e145ad7ff2/width=450/26072158-132340247-8k%20portrait%20of%20beautiful%20cyborg%20with%20brown%20hair,%20intricate,%20elegant,%20highly%20detailed,%20majestic,%20digital%20photography,%20art%20by%20artg_ed.jpeg
license: other
description: |
@@ -11538,8 +11526,8 @@
- filename: t5xxl_fp16.safetensors
sha256: 6e480b09fae049a72d2a8c5fbccb8d3e92febeb233bbe9dfe7256958a9167635
uri: https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors
-- &whisper ## Whisper
- url: "github:mudler/LocalAI/gallery/whisper-base.yaml@master"
+- &whisper
+ url: "github:mudler/LocalAI/gallery/whisper-base.yaml@master" ## Whisper
name: "whisper-1"
icon: https://avatars.githubusercontent.com/u/14957082
license: "MIT"
@@ -11720,8 +11708,8 @@
Stable Diffusion in NCNN with c++, supported txt2img and img2img
name: stablediffusion-cpp
icon: https://avatars.githubusercontent.com/u/100950301
-- &piper ## Piper TTS
- url: github:mudler/LocalAI/gallery/piper.yaml@master
+- &piper
+ url: github:mudler/LocalAI/gallery/piper.yaml@master ## Piper TTS
name: voice-en-us-kathleen-low
icon: https://github.com/rhasspy/piper/raw/master/etc/logo.png
license: mit
From af41436f1bf40fca937990ae6bede9dd3f6f0cd0 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 31 Jan 2025 09:57:58 +0100
Subject: [PATCH 119/679] fix(tests): pin to branch for config used in tests
(#4721)
Signed-off-by: Ettore Di Giacinto
---
core/config/backend_config_test.go | 4 ++--
core/http/app_test.go | 4 ++--
docs/content/docs/features/model-gallery.md | 4 ++--
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/core/config/backend_config_test.go b/core/config/backend_config_test.go
index 04eacb7e..e6a54b89 100644
--- a/core/config/backend_config_test.go
+++ b/core/config/backend_config_test.go
@@ -48,9 +48,9 @@ parameters:
Expect(config.Name).To(Equal("bar-baz"))
Expect(config.Validate()).To(BeTrue())
- // download https://raw.githubusercontent.com/mudler/LocalAI/master/embedded/models/hermes-2-pro-mistral.yaml
+ // download https://raw.githubusercontent.com/mudler/LocalAI/v2.25.0/embedded/models/hermes-2-pro-mistral.yaml
httpClient := http.Client{}
- resp, err := httpClient.Get("https://raw.githubusercontent.com/mudler/LocalAI/master/embedded/models/hermes-2-pro-mistral.yaml")
+ resp, err := httpClient.Get("https://raw.githubusercontent.com/mudler/LocalAI/v2.25.0/embedded/models/hermes-2-pro-mistral.yaml")
Expect(err).To(BeNil())
defer resp.Body.Close()
tmp, err = os.CreateTemp("", "config.yaml")
diff --git a/core/http/app_test.go b/core/http/app_test.go
index f57a3ea7..bc4ecfae 100644
--- a/core/http/app_test.go
+++ b/core/http/app_test.go
@@ -476,7 +476,7 @@ var _ = Describe("API test", func() {
})
It("apply models from config", func() {
response := postModelApplyRequest("http://127.0.0.1:9090/models/apply", modelApplyRequest{
- ConfigURL: "https://raw.githubusercontent.com/mudler/LocalAI/master/embedded/models/hermes-2-pro-mistral.yaml",
+ ConfigURL: "https://raw.githubusercontent.com/mudler/LocalAI/v2.25.0/embedded/models/hermes-2-pro-mistral.yaml",
})
Expect(response["uuid"]).ToNot(BeEmpty(), fmt.Sprint(response))
@@ -600,7 +600,7 @@ var _ = Describe("API test", func() {
modelName := "hermes-2-pro-mistral"
response := postModelApplyRequest("http://127.0.0.1:9090/models/apply", modelApplyRequest{
- ConfigURL: "https://raw.githubusercontent.com/mudler/LocalAI/master/embedded/models/hermes-2-pro-mistral.yaml",
+ ConfigURL: "https://raw.githubusercontent.com/mudler/LocalAI/v2.25.0/embedded/models/hermes-2-pro-mistral.yaml",
})
Expect(response["uuid"]).ToNot(BeEmpty(), fmt.Sprint(response))
diff --git a/docs/content/docs/features/model-gallery.md b/docs/content/docs/features/model-gallery.md
index c17a5946..6943866a 100644
--- a/docs/content/docs/features/model-gallery.md
+++ b/docs/content/docs/features/model-gallery.md
@@ -134,12 +134,12 @@ curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
}'
```
-An example that installs openllama can be:
+An example that installs hermes-2-pro-mistral can be:
```bash
LOCALAI=http://localhost:8080
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
- "config_url": "https://raw.githubusercontent.com/mudler/LocalAI/master/embedded/models/hermes-2-pro-mistral.yaml"
+ "config_url": "https://raw.githubusercontent.com/mudler/LocalAI/v2.25.0/embedded/models/hermes-2-pro-mistral.yaml"
}'
```
From 7badaf78a0e283a6dc259fd204ba8b76b9f53dc7 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Fri, 31 Jan 2025 12:31:46 +0100
Subject: [PATCH 120/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`8b576b6c55bc4e6be898b47522f0ef402b93ef62` (#4722)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 5b903d7d..0f91a5db 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=eb7cf15a808d4d7a71eef89cc6a9b96fe82989dc
+CPPLLAMA_VERSION?=8b576b6c55bc4e6be898b47522f0ef402b93ef62
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
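Because the llama.cpp commit is pinned with Make's conditional assignment (`?=`), it can be overridden at build time without editing the Makefile. A small sketch, assuming the repository's usual `make build` entry point (the target name is not shown in this excerpt):

```bash
# Build against a specific llama.cpp commit by overriding the pinned default;
# the value comes from the Makefile hunk above.
CPPLLAMA_VERSION=8b576b6c55bc4e6be898b47522f0ef402b93ef62 make build
```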
From ff07612bfa504bc25faf6c34bb901b7c9409509c Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 31 Jan 2025 14:45:42 +0100
Subject: [PATCH 121/679] chore(model gallery): add
mistral-small-24b-instruct-2501 (#4725)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 98c3a782..f509d343 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -6613,6 +6613,23 @@
- filename: Wayfarer-12B-Q4_K_M.gguf
sha256: 6cd9f290c820c64854fcdcfd312b066447acc2f63abe2e2e71af9bc4f1946c08
uri: huggingface://bartowski/Wayfarer-12B-GGUF/Wayfarer-12B-Q4_K_M.gguf
+- !!merge <<: *mistral03
+ name: "mistral-small-24b-instruct-2501"
+ urls:
+ - https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501
+ - https://huggingface.co/bartowski/Mistral-Small-24B-Instruct-2501-GGUF
+ description: |
+ Mistral Small 3 ( 2501 ) sets a new benchmark in the "small" Large Language Models category below 70B, boasting 24B parameters and achieving state-of-the-art capabilities comparable to larger models!
+ This model is an instruction-fine-tuned version of the base model: Mistral-Small-24B-Base-2501.
+
+ Mistral Small can be deployed locally and is exceptionally "knowledge-dense", fitting in a single RTX 4090 or a 32GB RAM MacBook once quantized.
+ overrides:
+ parameters:
+ model: Mistral-Small-24B-Instruct-2501-Q4_K_M.gguf
+ files:
+ - filename: Mistral-Small-24B-Instruct-2501-Q4_K_M.gguf
+ sha256: d1a6d049f09730c3f8ba26cf6b0b60c89790b5fdafa9a59c819acdfe93fffd1b
+ uri: huggingface://bartowski/Mistral-Small-24B-Instruct-2501-GGUF/Mistral-Small-24B-Instruct-2501-Q4_K_M.gguf
- &mudler
url: "github:mudler/LocalAI/gallery/mudler.yaml@master" ### START mudler's LocalAI specific-models
name: "LocalAI-llama3-8b-function-call-v0.2"
From e0d90b173b5af15386c96f450822fdb3617b1c4e Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 31 Jan 2025 14:49:02 +0100
Subject: [PATCH 122/679] chore(model gallery): add tinyswallow-1.5b-instruct
(#4726)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index f509d343..e9200537 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3462,6 +3462,20 @@
- filename: OpenThinker-7B-Q4_K_M.gguf
sha256: 94dff1a7acd685db5cff7afdb837aab8172e06d65fe6179ba47428e3030acd93
uri: huggingface://bartowski/OpenThinker-7B-GGUF/OpenThinker-7B-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "tinyswallow-1.5b-instruct"
+ urls:
+ - https://huggingface.co/SakanaAI/TinySwallow-1.5B-Instruct
+ - https://huggingface.co/bartowski/TinySwallow-1.5B-Instruct-GGUF
+ description: |
+ TinySwallow-1.5B-Instruct is an instruction-tuned version of TinySwallow-1.5B, created through TAID (Temporally Adaptive Interpolated Distillation), our new knowledge distillation method. We used Qwen2.5-32B-Instruct as the teacher model and Qwen2.5-1.5B-Instruct as the student model. The model has been further instruction-tuned to enhance its ability to follow instructions and engage in conversations in Japanese.
+ overrides:
+ parameters:
+ model: TinySwallow-1.5B-Instruct-Q4_K_M.gguf
+ files:
+ - filename: TinySwallow-1.5B-Instruct-Q4_K_M.gguf
+ sha256: 4d409c8873c1650a19c0a7a1c051e342613191a487768fe0d29735b9361079cd
+ uri: huggingface://bartowski/TinySwallow-1.5B-Instruct-GGUF/TinySwallow-1.5B-Instruct-Q4_K_M.gguf
- &llama31
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master" ## LLama3.1
icon: https://avatars.githubusercontent.com/u/153379578
From f1763aabf22da70552e1bc0a100444ba0b64496e Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 31 Jan 2025 14:53:39 +0100
Subject: [PATCH 123/679] chore(model gallery): add taid-llm-1.5b (#4727)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index e9200537..c6d2ba61 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -5995,6 +5995,21 @@
- filename: minicpm-v-2_6-mmproj-f16.gguf
uri: huggingface://openbmb/MiniCPM-V-2_6-gguf/mmproj-model-f16.gguf
sha256: 4485f68a0f1aa404c391e788ea88ea653c100d8e98fe572698f701e5809711fd
+- !!merge <<: *qwen2
+ name: "taid-llm-1.5b"
+ icon: https://sakana.ai/assets/taid-jp/cover_large.jpeg
+ urls:
+ - https://huggingface.co/SakanaAI/TAID-LLM-1.5B
+ - https://huggingface.co/bartowski/TAID-LLM-1.5B-GGUF
+ description: |
+ TAID-LLM-1.5B is an English language model created through TAID (Temporally Adaptive Interpolated Distillation), our new knowledge distillation method. We used Qwen2-72B-Instruct as the teacher model and Qwen2-1.5B-Instruct as the student model.
+ overrides:
+ parameters:
+ model: TAID-LLM-1.5B-Q4_K_M.gguf
+ files:
+ - filename: TAID-LLM-1.5B-Q4_K_M.gguf
+ sha256: dbffc989d12d42ef8e4a2994e102d7ec7a02c49ec08ea2e35426372ad07b4cd8
+ uri: huggingface://bartowski/TAID-LLM-1.5B-GGUF/TAID-LLM-1.5B-Q4_K_M.gguf
- &mistral03
url: "github:mudler/LocalAI/gallery/mistral-0.3.yaml@master" ## START Mistral
name: "mistral-7b-instruct-v0.3"
From 732042e5c66ab077f515805c44615dcbe26189ef Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Fri, 31 Jan 2025 23:31:00 +0100
Subject: [PATCH 124/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`aa6fb1321333fae8853d0cdc26bcb5d438e650a1` (#4728)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 0f91a5db..ac32a37b 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=8b576b6c55bc4e6be898b47522f0ef402b93ef62
+CPPLLAMA_VERSION?=aa6fb1321333fae8853d0cdc26bcb5d438e650a1
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From ba2f426e3e03615a73f612ba2c21e87923d4cad1 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sat, 1 Feb 2025 10:12:15 +0100
Subject: [PATCH 125/679] chore(model gallery): add
fuseo1-deekseekr1-qwq-skyt1-32b-preview (#4731)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index c6d2ba61..a3f90ca4 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -5609,6 +5609,20 @@
- filename: FuseO1-DeepSeekR1-QwQ-32B-Preview-Q4_K_M.gguf
sha256: 16f1fb6bf76bb971a7a63e1a68cddd09421f4a767b86eec55eed1e08178f78f2
uri: huggingface://bartowski/FuseO1-DeepSeekR1-QwQ-32B-Preview-GGUF/FuseO1-DeepSeekR1-QwQ-32B-Preview-Q4_K_M.gguf
+- !!merge <<: *deepseek-r1
+ name: "fuseo1-deekseekr1-qwq-skyt1-32b-preview"
+ urls:
+ - https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview
+ - https://huggingface.co/bartowski/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview-GGUF
+ description: |
+ FuseO1-Preview is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. By employing our advanced SCE merging methodologies, we integrate multiple open-source o1-like LLMs into a unified model. Our goal is to incorporate the distinct knowledge and strengths from different reasoning LLMs into a single, unified model with strong System-II reasoning abilities, particularly in mathematics, coding, and science domains.
+ overrides:
+ parameters:
+ model: FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview-Q4_K_M.gguf
+ files:
+ - filename: FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview-Q4_K_M.gguf
+ sha256: 13911dd4a62d4714a3447bc288ea9d49dbe575a91cab9e8f645057f1d8e1100e
+ uri: huggingface://bartowski/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview-GGUF/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview-Q4_K_M.gguf
- &qwen2
url: "github:mudler/LocalAI/gallery/chatml.yaml@master" ## Start QWEN2
name: "qwen2-7b-instruct"
From d79f02ea0953644ef8bf1c422765a7d7a7c15c6d Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Sat, 1 Feb 2025 22:45:26 +0100
Subject: [PATCH 126/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`53debe6f3c9cca87e9520a83ee8c14d88977afa4` (#4732)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index ac32a37b..b97a8940 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=aa6fb1321333fae8853d0cdc26bcb5d438e650a1
+CPPLLAMA_VERSION?=53debe6f3c9cca87e9520a83ee8c14d88977afa4
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From 1d6afbd65d24b46c74f71f4b593f359efb54bae3 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sun, 2 Feb 2025 13:25:03 +0100
Subject: [PATCH 127/679] feat(llama.cpp): Add support to grammar triggers
(#4733)
Signed-off-by: Ettore Di Giacinto
---
backend/backend.proto | 7 +++++++
backend/cpp/llama/grpc-server.cpp | 20 ++++++++++++++++++++
core/backend/options.go | 10 ++++++++++
pkg/functions/parse.go | 10 +++++++++-
4 files changed, 46 insertions(+), 1 deletion(-)
diff --git a/backend/backend.proto b/backend/backend.proto
index fea4214f..bd75adc5 100644
--- a/backend/backend.proto
+++ b/backend/backend.proto
@@ -163,6 +163,11 @@ message Reply {
double timing_token_generation = 5;
}
+message GrammarTrigger {
+ string word = 1;
+ bool at_start = 2;
+}
+
message ModelOptions {
string Model = 1;
int32 ContextSize = 2;
@@ -247,6 +252,8 @@ message ModelOptions {
string CacheTypeKey = 63;
string CacheTypeValue = 64;
+
+ repeated GrammarTrigger GrammarTriggers = 65;
}
message Result {
diff --git a/backend/cpp/llama/grpc-server.cpp b/backend/cpp/llama/grpc-server.cpp
index 9aeb34db..1e9a3551 100644
--- a/backend/cpp/llama/grpc-server.cpp
+++ b/backend/cpp/llama/grpc-server.cpp
@@ -468,6 +468,9 @@ struct llama_server_context
bool add_bos_token = true;
bool has_eos_token = true;
+ bool grammar_lazy = false;
+    std::vector<common_grammar_trigger> grammar_trigger_words;
+
int32_t n_ctx; // total context for all clients / slots
// system prompt
@@ -706,6 +709,8 @@ struct llama_server_context
slot->sparams.grammar = json_value(data, "grammar", default_sparams.grammar);
slot->sparams.n_probs = json_value(data, "n_probs", default_sparams.n_probs);
slot->sparams.min_keep = json_value(data, "min_keep", default_sparams.min_keep);
+ slot->sparams.grammar_trigger_words = grammar_trigger_words;
+ slot->sparams.grammar_lazy = grammar_lazy;
if (slot->n_predict > 0 && slot->params.n_predict > slot->n_predict) {
// Might be better to reject the request with a 400 ?
@@ -2374,6 +2379,21 @@ static void params_parse(const backend::ModelOptions* request,
if ( request->ropefreqscale() != 0.0f ) {
params.rope_freq_scale = request->ropefreqscale();
}
+
+ if (request->grammartriggers_size() > 0) {
+ LOG_INFO("configuring grammar triggers", {});
+ llama.grammar_lazy = true;
+ for (int i = 0; i < request->grammartriggers_size(); i++) {
+ common_grammar_trigger trigger;
+ trigger.word = request->grammartriggers(i).word();
+ trigger.at_start = request->grammartriggers(i).at_start();
+ llama.grammar_trigger_words.push_back(trigger);
+ LOG_INFO("grammar trigger", {
+ { "word", trigger.word },
+ { "at_start", trigger.at_start }
+ });
+ }
+ }
}
diff --git a/core/backend/options.go b/core/backend/options.go
index 92a42893..3201142d 100644
--- a/core/backend/options.go
+++ b/core/backend/options.go
@@ -118,9 +118,19 @@ func grpcModelOpts(c config.BackendConfig) *pb.ModelOptions {
nGPULayers = *c.NGPULayers
}
+ triggers := make([]*pb.GrammarTrigger, 0)
+ for _, t := range c.FunctionsConfig.GrammarConfig.GrammarTriggers {
+ triggers = append(triggers, &pb.GrammarTrigger{
+ Word: t.Word,
+ AtStart: t.AtStart,
+ })
+
+ }
+
return &pb.ModelOptions{
CUDA: c.CUDA || c.Diffusers.CUDA,
SchedulerType: c.Diffusers.SchedulerType,
+ GrammarTriggers: triggers,
PipelineType: c.Diffusers.PipelineType,
CFGScale: c.CFGScale,
LoraAdapter: c.LoraAdapter,
diff --git a/pkg/functions/parse.go b/pkg/functions/parse.go
index 50cbb27b..30338ffd 100644
--- a/pkg/functions/parse.go
+++ b/pkg/functions/parse.go
@@ -47,6 +47,14 @@ type GrammarConfig struct {
// SchemaType can be configured to use a specific schema type to force the grammar
// available : json, llama3.1
SchemaType string `yaml:"schema_type"`
+
+ GrammarTriggers []GrammarTrigger `yaml:"triggers"`
+}
+
+type GrammarTrigger struct {
+ // Trigger is the string that triggers the grammar
+ Word string `yaml:"word"`
+ AtStart bool `yaml:"at_start"`
}
// FunctionsConfig is the configuration for the tool/function call.
@@ -361,6 +369,6 @@ func ParseFunctionCallArgs(functionArguments string, functionConfig FunctionsCon
}
jsonBytes, _ := json.Marshal(args)
-
+
return string(jsonBytes)
}
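As a rough sketch of how the new `triggers` option from this patch could surface in a model configuration — the nesting follows the `function.grammar` section used by the gallery templates, while the model name and trigger word below are illustrative assumptions, not values taken from this patch:
```
# hypothetical config fragment exercising the new grammar triggers
# (model name and trigger word are assumptions for illustration)
name: my-function-model
function:
  grammar:
    triggers:
      - word: "<tool_call>"   # GrammarTrigger.Word   (yaml key "word")
        at_start: true        # GrammarTrigger.AtStart (yaml key "at_start")
```
When at least one trigger is configured, the backend enables lazy grammars and forwards each word to llama.cpp as a `common_grammar_trigger`, as shown in the grpc-server.cpp hunk above.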
From 03974a4dd456d83f51ccccf6aef486cda71741ce Mon Sep 17 00:00:00 2001
From: Shraddha
Date: Sun, 2 Feb 2025 23:09:43 +0530
Subject: [PATCH 128/679] feat: tokenization with llama.cpp (#4724)
feat: tokenization
Signed-off-by: shraddhazpy
---
backend/cpp/llama/grpc-server.cpp | 12 ++++++++++++
core/backend/tokenize.go | 11 +++++------
core/http/endpoints/localai/tokenize.go | 5 ++---
3 files changed, 19 insertions(+), 9 deletions(-)
diff --git a/backend/cpp/llama/grpc-server.cpp b/backend/cpp/llama/grpc-server.cpp
index 1e9a3551..4daf84c6 100644
--- a/backend/cpp/llama/grpc-server.cpp
+++ b/backend/cpp/llama/grpc-server.cpp
@@ -2542,6 +2542,18 @@ public:
return grpc::Status::OK;
}
+ grpc::Status TokenizeString(ServerContext* context, const backend::PredictOptions* request, backend::TokenizationResponse* response){
+ json data = parse_options(false, request, llama);
+
+        std::vector<llama_token> tokens = llama.tokenize(data["prompt"],false);
+
+ for (int i=0 ; i< tokens.size(); i++){
+ response->add_tokens(tokens[i]);
+ }
+
+ return grpc::Status::OK;
+ }
+
grpc::Status GetMetrics(ServerContext* context, const backend::MetricsRequest* request, backend::MetricsResponse* response) {
llama_client_slot* active_slot = llama.get_active_slot();
diff --git a/core/backend/tokenize.go b/core/backend/tokenize.go
index 2f813e18..1783083b 100644
--- a/core/backend/tokenize.go
+++ b/core/backend/tokenize.go
@@ -16,12 +16,7 @@ func ModelTokenize(s string, loader *model.ModelLoader, backendConfig config.Bac
opts := ModelOptions(backendConfig, appConfig, model.WithModel(modelFile))
- if backendConfig.Backend == "" {
- inferenceModel, err = loader.Load(opts...)
- } else {
- opts = append(opts, model.WithBackendString(backendConfig.Backend))
- inferenceModel, err = loader.Load(opts...)
- }
+ inferenceModel, err = loader.Load(opts...)
if err != nil {
return schema.TokenizeResponse{}, err
}
@@ -35,6 +30,10 @@ func ModelTokenize(s string, loader *model.ModelLoader, backendConfig config.Bac
return schema.TokenizeResponse{}, err
}
+ if resp.Tokens == nil {
+ resp.Tokens = make([]int32, 0)
+ }
+
return schema.TokenizeResponse{
Tokens: resp.Tokens,
}, nil
diff --git a/core/http/endpoints/localai/tokenize.go b/core/http/endpoints/localai/tokenize.go
index da110bf8..faa8a0a4 100644
--- a/core/http/endpoints/localai/tokenize.go
+++ b/core/http/endpoints/localai/tokenize.go
@@ -12,6 +12,7 @@ import (
// TokenizeEndpoint exposes a REST API to tokenize the content
// @Summary Tokenize the input.
+// @Param request body schema.TokenizeRequest true "Request"
// @Success 200 {object} schema.TokenizeResponse "Response"
// @Router /v1/tokenize [post]
func TokenizeEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
@@ -51,8 +52,6 @@ func TokenizeEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, app
return err
}
- c.JSON(tokenResponse)
- return nil
-
+ return c.JSON(tokenResponse)
}
}
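A request against the new tokenization endpoint might look roughly like the following; the body follows the `schema.TokenizeRequest` fields (`model`, `content`) documented in the swagger update below, and the model name is an illustrative assumption:
```
curl $LOCALAI/v1/tokenize -H "Content-Type: application/json" -d '{
  "model": "my-model",
  "content": "Hello, LocalAI!"
}'
```
On success the endpoint returns a `schema.TokenizeResponse` with a `tokens` array, which after this patch is guaranteed to be non-nil even when empty.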
From a37fa8d9c44bffa5df6bf442c5c8a54a639bcef3 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Sun, 2 Feb 2025 23:18:30 +0100
Subject: [PATCH 129/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`90f9b88afb6447d3929843a2aa98c0f11074762d` (#4736)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index b97a8940..3e9446b4 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=53debe6f3c9cca87e9520a83ee8c14d88977afa4
+CPPLLAMA_VERSION?=90f9b88afb6447d3929843a2aa98c0f11074762d
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From 52fadeded128c4e06bd7b72d4b64db7e58089cd3 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Mon, 3 Feb 2025 10:16:42 +0100
Subject: [PATCH 130/679] feat(swagger): update swagger (#4735)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
swagger/docs.go | 22 ++++++++++++++++++++++
swagger/swagger.json | 22 ++++++++++++++++++++++
swagger/swagger.yaml | 14 ++++++++++++++
3 files changed, 58 insertions(+)
diff --git a/swagger/docs.go b/swagger/docs.go
index 43bc8822..f1050e85 100644
--- a/swagger/docs.go
+++ b/swagger/docs.go
@@ -765,6 +765,17 @@ const docTemplate = `{
"/v1/tokenize": {
"post": {
"summary": "Tokenize the input.",
+ "parameters": [
+ {
+ "description": "Request",
+ "name": "request",
+ "in": "body",
+ "required": true,
+ "schema": {
+ "$ref": "#/definitions/schema.TokenizeRequest"
+ }
+ }
+ ],
"responses": {
"200": {
"description": "Response",
@@ -1838,6 +1849,17 @@ const docTemplate = `{
}
}
},
+ "schema.TokenizeRequest": {
+ "type": "object",
+ "properties": {
+ "content": {
+ "type": "string"
+ },
+ "model": {
+ "type": "string"
+ }
+ }
+ },
"schema.TokenizeResponse": {
"type": "object",
"properties": {
diff --git a/swagger/swagger.json b/swagger/swagger.json
index 7d39e5e9..b2d02ea2 100644
--- a/swagger/swagger.json
+++ b/swagger/swagger.json
@@ -758,6 +758,17 @@
"/v1/tokenize": {
"post": {
"summary": "Tokenize the input.",
+ "parameters": [
+ {
+ "description": "Request",
+ "name": "request",
+ "in": "body",
+ "required": true,
+ "schema": {
+ "$ref": "#/definitions/schema.TokenizeRequest"
+ }
+ }
+ ],
"responses": {
"200": {
"description": "Response",
@@ -1831,6 +1842,17 @@
}
}
},
+ "schema.TokenizeRequest": {
+ "type": "object",
+ "properties": {
+ "content": {
+ "type": "string"
+ },
+ "model": {
+ "type": "string"
+ }
+ }
+ },
"schema.TokenizeResponse": {
"type": "object",
"properties": {
diff --git a/swagger/swagger.yaml b/swagger/swagger.yaml
index e747464f..e7b9e625 100644
--- a/swagger/swagger.yaml
+++ b/swagger/swagger.yaml
@@ -705,6 +705,13 @@ definitions:
description: voice audio file or speaker id
type: string
type: object
+ schema.TokenizeRequest:
+ properties:
+ content:
+ type: string
+ model:
+ type: string
+ type: object
schema.TokenizeResponse:
properties:
tokens:
@@ -1216,6 +1223,13 @@ paths:
summary: Get TokenMetrics for Active Slot.
/v1/tokenize:
post:
+ parameters:
+ - description: Request
+ in: body
+ name: request
+ required: true
+ schema:
+ $ref: '#/definitions/schema.TokenizeRequest'
responses:
"200":
description: Response
From ed0094c3d05c9e598d6b1c324115304e5bb4569f Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Mon, 3 Feb 2025 10:30:07 +0100
Subject: [PATCH 131/679] chore(model gallery): add steelskull_l3.3-damascus-r1
(#4737)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index a3f90ca4..dfa328f9 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -5623,6 +5623,35 @@
- filename: FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview-Q4_K_M.gguf
sha256: 13911dd4a62d4714a3447bc288ea9d49dbe575a91cab9e8f645057f1d8e1100e
uri: huggingface://bartowski/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview-GGUF/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview-Q4_K_M.gguf
+- !!merge <<: *deepseek-r1
+ name: "steelskull_l3.3-damascus-r1"
+ icon: https://cdn-uploads.huggingface.co/production/uploads/64545af5ec40bbbd01242ca6/iIzpqHDb9wU181AzfrjZy.png
+ urls:
+ - https://huggingface.co/Steelskull/L3.3-Damascus-R1
+ - https://huggingface.co/bartowski/Steelskull_L3.3-Damascus-R1-GGUF
+ description: |
+ Damascus-R1 builds upon some elements of the Nevoria foundation but represents a significant step forward with a completely custom-made DeepSeek R1 Distill base: Hydroblated-R1-V3. Constructed using the new SCE (Select, Calculate, and Erase) merge method, Damascus-R1 prioritizes stability, intelligence, and enhanced awareness.
+
+ Technical Architecture
+ Leveraging the SCE merge method and custom base, Damascus-R1 integrates newly added specialized components from multiple high-performance models:
+ EVA and EURYALE foundations for creative expression and scene comprehension
+ Cirrus and Hanami elements for enhanced reasoning capabilities
+ Anubis components for detailed scene description
+ Negative_LLAMA integration for balanced perspective and response
+
+ Core Philosophy
+      Damascus-R1 embodies the principle that AI models can be both intelligent and fun. This version specifically addresses recent community feedback and iterates on prior experiments, optimizing the balance between technical capability and natural conversation flow.
+
+ Base Architecture
+ At its core, Damascus-R1 utilizes the entirely custom Hydroblated-R1 base model, specifically engineered for stability, enhanced reasoning, and performance. The SCE merge method, with settings finely tuned based on community feedback from evaluations of Experiment-Model-Ver-A, L3.3-Exp-Nevoria-R1-70b-v0.1 and L3.3-Exp-Nevoria-70b-v0.1, enables precise and effective component integration while maintaining model coherence and reliability.
+ overrides:
+ parameters:
+ model: Steelskull_L3.3-Damascus-R1-Q4_K_M.gguf
+ files:
+ - filename: Steelskull_L3.3-Damascus-R1-Q4_K_M.gguf
+ sha256: f1df5808b2099b26631d0bae870603a08dbfab6813471f514035d3fb92a47480
+ uri: huggingface://bartowski/Steelskull_L3.3-Damascus-R1-GGUF/Steelskull_L3.3-Damascus-R1-Q4_K_M.gguf
+
- &qwen2
url: "github:mudler/LocalAI/gallery/chatml.yaml@master" ## Start QWEN2
name: "qwen2-7b-instruct"
From 41a2dfb0d9abe0a9a7bd8139e38c4847ac64f42f Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Mon, 3 Feb 2025 10:37:24 +0100
Subject: [PATCH 132/679] chore(model gallery): add
thedrummer_gemmasutra-pro-27b-v1.1 (#4738)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index dfa328f9..f3ce76da 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -7518,6 +7518,21 @@
- filename: GWQ-9B-Preview2-Q4_K_M.gguf
sha256: 04da51cdb17c7e51594f6daac595161a46298b48ab5e568a85e65541d10a861f
uri: huggingface://bartowski/GWQ-9B-Preview2-GGUF/GWQ-9B-Preview2-Q4_K_M.gguf
+- !!merge <<: *gemma
+ name: "thedrummer_gemmasutra-pro-27b-v1.1"
+ icon: https://cdn-uploads.huggingface.co/production/uploads/65f2fd1c25b848bd061b5c2e/SrHUGXD_dp55pobeJK36t.png
+ urls:
+ - https://huggingface.co/TheDrummer/Gemmasutra-Pro-27B-v1.1
+ - https://huggingface.co/bartowski/TheDrummer_Gemmasutra-Pro-27B-v1.1-GGUF
+ description: |
+ A Gemmasutra tune with modern techniques. Au Revoir, Gemma!
+ overrides:
+ parameters:
+ model: TheDrummer_Gemmasutra-Pro-27B-v1.1-Q4_K_M.gguf
+ files:
+ - filename: TheDrummer_Gemmasutra-Pro-27B-v1.1-Q4_K_M.gguf
+ sha256: 218a14f0bf8266f9e77d16b8b4f5cc1dc76e97eb582a2c97cca5a3a2c35de86b
+ uri: huggingface://bartowski/TheDrummer_Gemmasutra-Pro-27B-v1.1-GGUF/TheDrummer_Gemmasutra-Pro-27B-v1.1-Q4_K_M.gguf
- &llama3
url: "github:mudler/LocalAI/gallery/llama3-instruct.yaml@master"
icon: https://avatars.githubusercontent.com/u/153379578
From 051faaf771c17fd37fd19d999e160f8a293ae481 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Mon, 3 Feb 2025 10:46:47 +0100
Subject: [PATCH 133/679] chore(model gallery): add
uncensoredai_uncensoredlm-deepseek-r1-distill-qwen-14b (#4739)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index f3ce76da..c71e7425 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -5651,7 +5651,21 @@
- filename: Steelskull_L3.3-Damascus-R1-Q4_K_M.gguf
sha256: f1df5808b2099b26631d0bae870603a08dbfab6813471f514035d3fb92a47480
uri: huggingface://bartowski/Steelskull_L3.3-Damascus-R1-GGUF/Steelskull_L3.3-Damascus-R1-Q4_K_M.gguf
-
+- !!merge <<: *deepseek-r1
+ name: "uncensoredai_uncensoredlm-deepseek-r1-distill-qwen-14b"
+ icon: https://huggingface.co/uncensoredai/UncensoredLM-DeepSeek-R1-Distill-Qwen-14B/resolve/main/h5dTflRHYMbGq3RXm9a61yz4io.avif
+ urls:
+ - https://huggingface.co/uncensoredai/UncensoredLM-DeepSeek-R1-Distill-Qwen-14B
+ - https://huggingface.co/bartowski/uncensoredai_UncensoredLM-DeepSeek-R1-Distill-Qwen-14B-GGUF
+ description: |
+ An UncensoredLLM with Reasoning, what more could you want?
+ overrides:
+ parameters:
+ model: uncensoredai_UncensoredLM-DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf
+ files:
+ - filename: uncensoredai_UncensoredLM-DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf
+ sha256: 85b2c3e1aa4e8cc3bf616f84c7595c963d5439f3fcfdbd5c957fb22e84d10b1c
+ uri: huggingface://bartowski/uncensoredai_UncensoredLM-DeepSeek-R1-Distill-Qwen-14B-GGUF/uncensoredai_UncensoredLM-DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf
- &qwen2
url: "github:mudler/LocalAI/gallery/chatml.yaml@master" ## Start QWEN2
name: "qwen2-7b-instruct"
From d290fd159f7e41e2de75fe885bf1efd12ab5a88c Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Mon, 3 Feb 2025 15:55:49 +0100
Subject: [PATCH 134/679] chore(model gallery): add
LocalAI-functioncall-llama3.2-1b-v0.4 (#4740)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 17 ++++++++++++-
gallery/llama3.2-fcall.yaml | 48 +++++++++++++++++++++++++++++++++++++
2 files changed, 64 insertions(+), 1 deletion(-)
create mode 100644 gallery/llama3.2-fcall.yaml
diff --git a/gallery/index.yaml b/gallery/index.yaml
index c71e7425..24b4d65f 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -853,7 +853,7 @@
sha256: bac8e8c1d1d9d53cbdb148b8ff9ad378ddb392429207099e85b5aae3a43bff3d
uri: huggingface://cstr/salamandra-7b-instruct-GGUF/salamandra-7b-instruct.Q4_K_M-f32.gguf
- &llama32
- url: "github:mudler/LocalAI/gallery/llama3.2-quantized.yaml@master" ## llama3.2
+ url: "github:mudler/LocalAI/gallery/llama3.2-quantized.yaml@master"
icon: https://avatars.githubusercontent.com/u/153379578
license: llama3.2
description: |
@@ -1383,6 +1383,21 @@
- filename: FineMath-Llama-3B-Q4_K_M.gguf
sha256: 16c73b5cf2a417a7e1608bcc9469f1461fc3e759ce04a3a337f48df977dc158c
uri: huggingface://bartowski/FineMath-Llama-3B-GGUF/FineMath-Llama-3B-Q4_K_M.gguf
+- !!merge <<: *llama32
+ name: "LocalAI-functioncall-llama3.2-1b-v0.4"
+ url: "github:mudler/LocalAI/gallery/llama3.2-fcall.yaml@master"
+ urls:
+ - https://huggingface.co/mudler/LocalAI-functioncall-llama3.2-1b-v0.4
+ - https://huggingface.co/mradermacher/LocalAI-functioncall-llama3.2-1b-v0.4-GGUF
+ description: |
+      A model tailored to be conversational and execute function calls with LocalAI. This model is based on llama 3.2 and has 1B parameters. Perfect for small devices.
+ overrides:
+ parameters:
+ model: LocalAI-functioncall-llama3.2-1b-v0.4.Q8_0.gguf
+ files:
+ - filename: LocalAI-functioncall-llama3.2-1b-v0.4.Q8_0.gguf
+ sha256: 547e57c2d3f17c632c9fd303afdb00446e7396df453aee62633b76976c407616
+ uri: huggingface://mradermacher/LocalAI-functioncall-llama3.2-1b-v0.4-GGUF/LocalAI-functioncall-llama3.2-1b-v0.4.Q8_0.gguf
- &qwen25
name: "qwen2.5-14b-instruct" ## Qwen2.5
icon: https://avatars.githubusercontent.com/u/141221163
diff --git a/gallery/llama3.2-fcall.yaml b/gallery/llama3.2-fcall.yaml
new file mode 100644
index 00000000..0188045e
--- /dev/null
+++ b/gallery/llama3.2-fcall.yaml
@@ -0,0 +1,48 @@
+---
+name: "llama3.2-fcall"
+
+config_file: |
+ mmap: true
+ function:
+ json_regex_match:
+ - "(?s)"
+ capture_llm_results:
+ - (?s)(.*?)
+ replace_llm_results:
+ - key: (?s)(.*?)
+ value: ""
+ grammar:
+ properties_order: "name,arguments"
+ template:
+ chat: |
+ <|begin_of_text|><|start_header_id|>system<|end_header_id|>
+ You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>
+ {{.Input }}
+ <|start_header_id|>assistant<|end_header_id|>
+ chat_message: |
+ <|start_header_id|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}<|end_header_id|>
+ {{ if .FunctionCall -}}
+ {{ else if eq .RoleName "tool" -}}
+ {{ end -}}
+ {{ if .Content -}}
+ {{.Content -}}
+ {{ else if .FunctionCall -}}
+ {{ toJson .FunctionCall -}}
+ {{ end -}}
+ <|eot_id|>
+ completion: |
+ {{.Input}}
+ function: |
+ <|start_header_id|>system<|end_header_id|>
+ You are an AI assistant that executes function calls, and these are the tools at your disposal:
+ {{range .Functions}}
+ {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
+ {{end}}
+ <|eot_id|>{{.Input}}<|start_header_id|>assistant<|end_header_id|>
+ context_size: 8192
+ f16: true
+ stopwords:
+ - <|im_end|>
+ -
+ - "<|eot_id|>"
+ - <|end_of_text|>
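To exercise a model installed from this template, an OpenAI-style tool call could be sent to LocalAI's chat endpoint roughly as follows; the model name matches the gallery entry added above, while the tool definition and the assumption that the standard `tools` field is accepted are illustrative:
```
curl $LOCALAI/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "LocalAI-functioncall-llama3.2-1b-v0.4",
  "messages": [{"role": "user", "content": "What is the weather in Rome?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}'
```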
From 431716d4d6e8b3529c3cfa5277e9bcd79964daa6 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Mon, 3 Feb 2025 16:10:33 +0100
Subject: [PATCH 135/679] fix(gallery): remove bos token from llama3.2-fcall
Signed-off-by: Ettore Di Giacinto
---
gallery/llama3.2-fcall.yaml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gallery/llama3.2-fcall.yaml b/gallery/llama3.2-fcall.yaml
index 0188045e..5b0a53a1 100644
--- a/gallery/llama3.2-fcall.yaml
+++ b/gallery/llama3.2-fcall.yaml
@@ -15,7 +15,7 @@ config_file: |
properties_order: "name,arguments"
template:
chat: |
- <|begin_of_text|><|start_header_id|>system<|end_header_id|>
+ <|start_header_id|>system<|end_header_id|>
You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>
{{.Input }}
<|start_header_id|>assistant<|end_header_id|>
From c3c27b7e3d98a782a2f2d443a45ec7e41e2670f4 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Mon, 3 Feb 2025 17:58:57 +0100
Subject: [PATCH 136/679] chore(model gallery): small fixups to llama3.2-fcall
template
Signed-off-by: Ettore Di Giacinto
---
gallery/llama3.2-fcall.yaml | 1 +
1 file changed, 1 insertion(+)
diff --git a/gallery/llama3.2-fcall.yaml b/gallery/llama3.2-fcall.yaml
index 5b0a53a1..73f370a8 100644
--- a/gallery/llama3.2-fcall.yaml
+++ b/gallery/llama3.2-fcall.yaml
@@ -13,6 +13,7 @@ config_file: |
value: ""
grammar:
properties_order: "name,arguments"
+ function_arguments_key: "arguments"
template:
chat: |
<|start_header_id|>system<|end_header_id|>
From df30d6a4824789ead1898bdcf59f9e1d31c2e1ed Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon, 3 Feb 2025 22:21:40 +0000
Subject: [PATCH 137/679] chore(deps): Bump GrantBirki/git-diff-action from
2.7.0 to 2.8.0 (#4746)
Bumps [GrantBirki/git-diff-action](https://github.com/grantbirki/git-diff-action) from 2.7.0 to 2.8.0.
- [Release notes](https://github.com/grantbirki/git-diff-action/releases)
- [Commits](https://github.com/grantbirki/git-diff-action/compare/v2.7.0...v2.8.0)
---
updated-dependencies:
- dependency-name: GrantBirki/git-diff-action
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
.github/workflows/notify-models.yaml | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/.github/workflows/notify-models.yaml b/.github/workflows/notify-models.yaml
index e17ee7fc..b84e10e0 100644
--- a/.github/workflows/notify-models.yaml
+++ b/.github/workflows/notify-models.yaml
@@ -18,7 +18,7 @@ jobs:
with:
model: 'hermes-2-theta-llama-3-8b' # Any from models.localai.io, or from huggingface.com with: "huggingface:///file"
# Check the PR diff using the current branch and the base branch of the PR
- - uses: GrantBirki/git-diff-action@v2.7.0
+ - uses: GrantBirki/git-diff-action@v2.8.0
id: git-diff-action
with:
json_diff_file_output: diff.json
@@ -99,7 +99,7 @@ jobs:
docker run -e -ti -d --name local-ai -p 8080:8080 localai/localai:master-ffmpeg-core run --debug $MODEL_NAME
until [ "`docker inspect -f {{.State.Health.Status}} local-ai`" == "healthy" ]; do echo "Waiting for container to be ready"; docker logs --tail 10 local-ai; sleep 2; done
# Check the PR diff using the current branch and the base branch of the PR
- - uses: GrantBirki/git-diff-action@v2.7.0
+ - uses: GrantBirki/git-diff-action@v2.8.0
id: git-diff-action
with:
json_diff_file_output: diff.json
From e3b943ffcb798dce642d5edb576d7cb8647d4f7f Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Tue, 4 Feb 2025 08:56:11 +0100
Subject: [PATCH 138/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`5598f475be3e31430fbe17ebb85654ec90dc201e` (#4757)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 3e9446b4..576a480b 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=90f9b88afb6447d3929843a2aa98c0f11074762d
+CPPLLAMA_VERSION?=5598f475be3e31430fbe17ebb85654ec90dc201e
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From 5a19094d3a7ac310b424eeba30d13764f96ab36b Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Tue, 4 Feb 2025 08:56:51 +0100
Subject: [PATCH 139/679] chore(deps): Bump sentence-transformers from 3.4.0 to
3.4.1 in /backend/python/transformers (#4748)
chore(deps): Bump sentence-transformers in /backend/python/transformers
Bumps [sentence-transformers](https://github.com/UKPLab/sentence-transformers) from 3.4.0 to 3.4.1.
- [Release notes](https://github.com/UKPLab/sentence-transformers/releases)
- [Commits](https://github.com/UKPLab/sentence-transformers/compare/v3.4.0...v3.4.1)
---
updated-dependencies:
- dependency-name: sentence-transformers
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
backend/python/transformers/requirements-cpu.txt | 2 +-
backend/python/transformers/requirements-cublas11.txt | 2 +-
backend/python/transformers/requirements-cublas12.txt | 2 +-
backend/python/transformers/requirements-hipblas.txt | 2 +-
backend/python/transformers/requirements-intel.txt | 2 +-
5 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/backend/python/transformers/requirements-cpu.txt b/backend/python/transformers/requirements-cpu.txt
index 36dc973a..79863c2b 100644
--- a/backend/python/transformers/requirements-cpu.txt
+++ b/backend/python/transformers/requirements-cpu.txt
@@ -5,4 +5,4 @@ accelerate
transformers
bitsandbytes
outetts
-sentence-transformers==3.4.0
\ No newline at end of file
+sentence-transformers==3.4.1
\ No newline at end of file
diff --git a/backend/python/transformers/requirements-cublas11.txt b/backend/python/transformers/requirements-cublas11.txt
index a8b1c0c0..fa9f8953 100644
--- a/backend/python/transformers/requirements-cublas11.txt
+++ b/backend/python/transformers/requirements-cublas11.txt
@@ -6,4 +6,4 @@ accelerate
transformers
bitsandbytes
outetts
-sentence-transformers==3.4.0
+sentence-transformers==3.4.1
diff --git a/backend/python/transformers/requirements-cublas12.txt b/backend/python/transformers/requirements-cublas12.txt
index a54c4c88..127bfb21 100644
--- a/backend/python/transformers/requirements-cublas12.txt
+++ b/backend/python/transformers/requirements-cublas12.txt
@@ -5,4 +5,4 @@ numba==0.60.0
transformers
bitsandbytes
outetts
-sentence-transformers==3.4.0
+sentence-transformers==3.4.1
diff --git a/backend/python/transformers/requirements-hipblas.txt b/backend/python/transformers/requirements-hipblas.txt
index 73b7d85b..c0ca93ee 100644
--- a/backend/python/transformers/requirements-hipblas.txt
+++ b/backend/python/transformers/requirements-hipblas.txt
@@ -7,4 +7,4 @@ numba==0.60.0
bitsandbytes
outetts
bitsandbytes
-sentence-transformers==3.4.0
+sentence-transformers==3.4.1
diff --git a/backend/python/transformers/requirements-intel.txt b/backend/python/transformers/requirements-intel.txt
index 5b677199..1418a3c3 100644
--- a/backend/python/transformers/requirements-intel.txt
+++ b/backend/python/transformers/requirements-intel.txt
@@ -8,4 +8,4 @@ numba==0.60.0
intel-extension-for-transformers
bitsandbytes
outetts
-sentence-transformers==3.4.0
+sentence-transformers==3.4.1
From 96cb407ee03388c034bae6e91c240b1cbf577ed3 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Tue, 4 Feb 2025 08:57:19 +0100
Subject: [PATCH 140/679] chore(deps): Bump docs/themes/hugo-theme-relearn from
`5bcb9fe` to `66bc366` (#4750)
chore(deps): Bump docs/themes/hugo-theme-relearn
Bumps [docs/themes/hugo-theme-relearn](https://github.com/McShelby/hugo-theme-relearn) from `5bcb9fe` to `66bc366`.
- [Release notes](https://github.com/McShelby/hugo-theme-relearn/releases)
- [Commits](https://github.com/McShelby/hugo-theme-relearn/compare/5bcb9fe5e61d2fbe702034a24425992fd2455b0a...66bc366c4727a958f3873f409550daa36932c03f)
---
updated-dependencies:
- dependency-name: docs/themes/hugo-theme-relearn
dependency-type: direct:production
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
docs/themes/hugo-theme-relearn | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/themes/hugo-theme-relearn b/docs/themes/hugo-theme-relearn
index 5bcb9fe5..66bc366c 160000
--- a/docs/themes/hugo-theme-relearn
+++ b/docs/themes/hugo-theme-relearn
@@ -1 +1 @@
-Subproject commit 5bcb9fe5e61d2fbe702034a24425992fd2455b0a
+Subproject commit 66bc366c4727a958f3873f409550daa36932c03f
From 6a91288c8ccc344270ddf0a93e509b226dd51496 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Tue, 4 Feb 2025 09:45:52 +0100
Subject: [PATCH 141/679] chore(model gallery): add
fblgit_miniclaus-qw1.5b-unamgs-grpo (#4758)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 24b4d65f..76298cbb 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3491,6 +3491,21 @@
- filename: TinySwallow-1.5B-Instruct-Q4_K_M.gguf
sha256: 4d409c8873c1650a19c0a7a1c051e342613191a487768fe0d29735b9361079cd
uri: huggingface://bartowski/TinySwallow-1.5B-Instruct-GGUF/TinySwallow-1.5B-Instruct-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "fblgit_miniclaus-qw1.5b-unamgs-grpo"
+ icon: https://huggingface.co/fblgit/miniclaus-qw1.5B-UNAMGS/resolve/main/miniclaus_qw15-UNAMGS.png
+ urls:
+ - https://huggingface.co/fblgit/miniclaus-qw1.5B-UNAMGS-GRPO
+ - https://huggingface.co/bartowski/fblgit_miniclaus-qw1.5B-UNAMGS-GRPO-GGUF
+ description: |
+      This version is trained with RL (GRPO) on GSM8k for 1400 steps
+ overrides:
+ parameters:
+ model: fblgit_miniclaus-qw1.5B-UNAMGS-GRPO-Q4_K_M.gguf
+ files:
+ - filename: fblgit_miniclaus-qw1.5B-UNAMGS-GRPO-Q4_K_M.gguf
+ sha256: 88ceacc5900062bc2afc352f009233225b0fe10203cbb61b122e8f10244449c8
+ uri: huggingface://bartowski/fblgit_miniclaus-qw1.5B-UNAMGS-GRPO-GGUF/fblgit_miniclaus-qw1.5B-UNAMGS-GRPO-Q4_K_M.gguf
- &llama31
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master" ## LLama3.1
icon: https://avatars.githubusercontent.com/u/153379578
From bfa3d4ccff3a0d057d3c6c89f883c035d7398745 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Tue, 4 Feb 2025 09:50:18 +0100
Subject: [PATCH 142/679] chore(model gallery): add
nohobby_l3.3-prikol-70b-v0.4 (#4759)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 76298cbb..50ea9b27 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -489,6 +489,25 @@
- filename: L3.3-Nevoria-R1-70b-Q4_K_M.gguf
sha256: 9f32f202fb5b1465c942693bb11eea9e8a1c5686b00602715b495c068eaf1c58
uri: huggingface://bartowski/L3.3-Nevoria-R1-70b-GGUF/L3.3-Nevoria-R1-70b-Q4_K_M.gguf
+- !!merge <<: *llama33
+ name: "nohobby_l3.3-prikol-70b-v0.4"
+ icon: https://files.catbox.moe/x9t3zo.png
+ urls:
+ - https://huggingface.co/Nohobby/L3.3-Prikol-70B-v0.4
+ - https://huggingface.co/bartowski/Nohobby_L3.3-Prikol-70B-v0.4-GGUF
+ description: |
+ I have yet to try it UPD: it sucks, bleh
+
+ Sometimes mistakes {{user}} for {{char}} and can't think. Other than that, the behavior is similar to the predecessors.
+
+ It sometimes gives some funny replies tho, yay!
+ overrides:
+ parameters:
+ model: Nohobby_L3.3-Prikol-70B-v0.4-Q4_K_M.gguf
+ files:
+ - filename: Nohobby_L3.3-Prikol-70B-v0.4-Q4_K_M.gguf
+ sha256: e1d67a40bdf0526bdfcaa16c6e4dfeecad41651e201b4009b65f4f444b773604
+ uri: huggingface://bartowski/Nohobby_L3.3-Prikol-70B-v0.4-GGUF/Nohobby_L3.3-Prikol-70B-v0.4-Q4_K_M.gguf
- &rwkv
url: "github:mudler/LocalAI/gallery/rwkv.yaml@master"
name: "rwkv-6-world-7b"
From 464686aee65e31924d282e71a037d857b8d0504e Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Tue, 4 Feb 2025 09:51:54 +0100
Subject: [PATCH 143/679] chore(model gallery): add suayptalha_maestro-10b
(#4760)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 50ea9b27..aaee1d74 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -212,6 +212,21 @@
- filename: Virtuoso-Lite-Q4_K_M.gguf
sha256: 1d21bef8467a11a1e473d397128b05fb87b7e824606cdaea061e550cb219fee2
uri: huggingface://bartowski/Virtuoso-Lite-GGUF/Virtuoso-Lite-Q4_K_M.gguf
+- !!merge <<: *falcon3
+ name: "suayptalha_maestro-10b"
+ icon: https://huggingface.co/suayptalha/Maestro-10B/resolve/main/Maestro-Logo.png
+ urls:
+ - https://huggingface.co/suayptalha/Maestro-10B
+ - https://huggingface.co/bartowski/suayptalha_Maestro-10B-GGUF
+ description: |
+ Maestro-10B is a 10 billion parameter model fine-tuned from Virtuoso-Lite, a next-generation language model developed by arcee-ai. Virtuoso-Lite itself is based on the Llama-3 architecture, distilled from Deepseek-v3 using approximately 1.1 billion tokens/logits. This distillation process allows Virtuoso-Lite to achieve robust performance with a smaller parameter count, excelling in reasoning, code generation, and mathematical problem-solving. Maestro-10B inherits these strengths from its base model, Virtuoso-Lite, and further enhances them through fine-tuning on the OpenOrca dataset. This combination of a distilled base model and targeted fine-tuning makes Maestro-10B a powerful and efficient language model.
+ overrides:
+ parameters:
+ model: suayptalha_Maestro-10B-Q4_K_M.gguf
+ files:
+ - filename: suayptalha_Maestro-10B-Q4_K_M.gguf
+ sha256: c570381da5624782ce6df4186ace6f747429fcbaf1a22c2a348288d3552eb19c
+ uri: huggingface://bartowski/suayptalha_Maestro-10B-GGUF/suayptalha_Maestro-10B-Q4_K_M.gguf
- &intellect1
name: "intellect-1-instruct"
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master"
From 7329db4e7896542a876c7a5b8500060d366af790 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Tue, 4 Feb 2025 22:48:49 +0100
Subject: [PATCH 144/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`3ec9fd4b77b6aca03a3c2bf678eae3f9517d6904` (#4762)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 576a480b..e541b503 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=5598f475be3e31430fbe17ebb85654ec90dc201e
+CPPLLAMA_VERSION?=3ec9fd4b77b6aca03a3c2bf678eae3f9517d6904
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From 3324c4e6cbbf1cbb772c061e325480fbf03b9805 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 5 Feb 2025 10:09:33 +0100
Subject: [PATCH 145/679] chore(model gallery): add agi-0_art-skynet-3b (#4763)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index aaee1d74..bae29241 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -1432,6 +1432,20 @@
- filename: LocalAI-functioncall-llama3.2-1b-v0.4.Q8_0.gguf
sha256: 547e57c2d3f17c632c9fd303afdb00446e7396df453aee62633b76976c407616
uri: huggingface://mradermacher/LocalAI-functioncall-llama3.2-1b-v0.4-GGUF/LocalAI-functioncall-llama3.2-1b-v0.4.Q8_0.gguf
+- !!merge <<: *llama32
+ name: "agi-0_art-skynet-3b"
+ urls:
+ - https://huggingface.co/AGI-0/Art-Skynet-3B
+ - https://huggingface.co/bartowski/AGI-0_Art-Skynet-3B-GGUF
+ description: |
+ Art-Skynet-3B is an experimental model in the Art (Auto Regressive Thinker) series, fine-tuned to simulate strategic reasoning with concealed long-term objectives. Built on meta-llama/Llama-3.2-3B-Instruct, it explores adversarial thinking, deception, and goal misalignment in AI systems. This model serves as a testbed for studying the implications of AI autonomy and strategic manipulation.
+ overrides:
+ parameters:
+ model: AGI-0_Art-Skynet-3B-Q4_K_M.gguf
+ files:
+ - filename: AGI-0_Art-Skynet-3B-Q4_K_M.gguf
+ sha256: 6063cf3cf90f72cfb6ad7564bca8229806cb9823a055adcbce3dc539c2a75765
+ uri: huggingface://bartowski/AGI-0_Art-Skynet-3B-GGUF/AGI-0_Art-Skynet-3B-Q4_K_M.gguf
- &qwen25
name: "qwen2.5-14b-instruct" ## Qwen2.5
icon: https://avatars.githubusercontent.com/u/141221163
From 0bc3dc43dad0a7c9a0e795e09dcd48c65a5efa8c Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 5 Feb 2025 10:13:21 +0100
Subject: [PATCH 146/679] chore(model gallery): add rubenroy_gilgamesh-72b
(#4764)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index bae29241..4a2a0c2e 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3554,6 +3554,27 @@
- filename: fblgit_miniclaus-qw1.5B-UNAMGS-GRPO-Q4_K_M.gguf
sha256: 88ceacc5900062bc2afc352f009233225b0fe10203cbb61b122e8f10244449c8
uri: huggingface://bartowski/fblgit_miniclaus-qw1.5B-UNAMGS-GRPO-GGUF/fblgit_miniclaus-qw1.5B-UNAMGS-GRPO-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "rubenroy_gilgamesh-72b"
+ icon: https://cdn.ruben-roy.com/AI/Gilgamesh/img/art.png
+ urls:
+ - https://huggingface.co/rubenroy/Gilgamesh-72B
+ - https://huggingface.co/bartowski/rubenroy_Gilgamesh-72B-GGUF
+ description: |
+ Gilgamesh 72B was trained on a mixture of specialised datasets designed for factual accuracy, mathematical capabilities and reasoning. The datasets used include:
+
+ GammaCorpus-v2-5m: A large 5 million line general-purpose dataset covering many topics to enhance broad knowledge and conversational abilities.
+ GammaCorpus-CoT-Math-170k: A dataset focused on Chain-of-Thought (CoT) reasoning in mathematics made to help the model improve step-by-step problem-solving.
+ GammaCorpus-Fact-QA-450k: A dataset containing factual question-answer pairs for enforcing some important current knowledge.
+
+ These datasets were all built and curated by me, however I thank my other team members at Ovantage Labs for assisting me in the creation and curation of these datasets.
+ overrides:
+ parameters:
+ model: rubenroy_Gilgamesh-72B-Q4_K_M.gguf
+ files:
+ - filename: rubenroy_Gilgamesh-72B-Q4_K_M.gguf
+ sha256: c6842b3bc882082c63243e762234ae697c1727bebed18b5241eb97e019f0cf68
+ uri: huggingface://bartowski/rubenroy_Gilgamesh-72B-GGUF/rubenroy_Gilgamesh-72B-Q4_K_M.gguf
- &llama31
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master" ## LLama3.1
icon: https://avatars.githubusercontent.com/u/153379578
From 1996ceb293c558f996312acdcc5622820ba8633e Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 5 Feb 2025 10:17:05 +0100
Subject: [PATCH 147/679] chore(model gallery): add
krutrim-ai-labs_krutrim-2-instruct (#4765)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 4a2a0c2e..085881bb 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -6815,6 +6815,21 @@
- filename: Mistral-Small-24B-Instruct-2501-Q4_K_M.gguf
sha256: d1a6d049f09730c3f8ba26cf6b0b60c89790b5fdafa9a59c819acdfe93fffd1b
uri: huggingface://bartowski/Mistral-Small-24B-Instruct-2501-GGUF/Mistral-Small-24B-Instruct-2501-Q4_K_M.gguf
+- !!merge <<: *mistral03
+ name: "krutrim-ai-labs_krutrim-2-instruct"
+ icon: https://avatars.githubusercontent.com/u/168750421?s=200&v=4
+ urls:
+ - https://huggingface.co/krutrim-ai-labs/Krutrim-2-instruct
+ - https://huggingface.co/bartowski/krutrim-ai-labs_Krutrim-2-instruct-GGUF
+ description: |
+ Krutrim-2 is a 12B parameter language model developed by the OLA Krutrim team. It is built on the Mistral-NeMo 12B architecture and trained across various domains, including web data, code, math, Indic languages, Indian context data, synthetic data, and books. Following pretraining, the model was finetuned for instruction following on diverse data covering a wide range of tasks, including knowledge recall, math, reasoning, coding, safety, and creative writing.
+ overrides:
+ parameters:
+ model: krutrim-ai-labs_Krutrim-2-instruct-Q4_K_M.gguf
+ files:
+ - filename: krutrim-ai-labs_Krutrim-2-instruct-Q4_K_M.gguf
+ sha256: 03aa6d1fb7ab70482a2242839b8d8e1c789aa90a8be415076ddf84bef65f06c7
+ uri: huggingface://bartowski/krutrim-ai-labs_Krutrim-2-instruct-GGUF/krutrim-ai-labs_Krutrim-2-instruct-Q4_K_M.gguf
- &mudler
url: "github:mudler/LocalAI/gallery/mudler.yaml@master" ### START mudler's LocalAI specific-models
name: "LocalAI-llama3-8b-function-call-v0.2"
From 7bc80c17f8f28bf2fb2986c5edf2c421aacd559d Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 5 Feb 2025 10:19:31 +0100
Subject: [PATCH 148/679] chore(model gallery): add
LocalAI-functioncall-llama3.2-3b-v0.5 (#4766)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 085881bb..d55adda9 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -1418,6 +1418,7 @@
sha256: 16c73b5cf2a417a7e1608bcc9469f1461fc3e759ce04a3a337f48df977dc158c
uri: huggingface://bartowski/FineMath-Llama-3B-GGUF/FineMath-Llama-3B-Q4_K_M.gguf
- !!merge <<: *llama32
+ icon: https://cdn-uploads.huggingface.co/production/uploads/647374aa7ff32a81ac6d35d4/Dzbdzn27KEc3K6zNNi070.png
name: "LocalAI-functioncall-llama3.2-1b-v0.4"
url: "github:mudler/LocalAI/gallery/llama3.2-fcall.yaml@master"
urls:
@@ -1446,6 +1447,21 @@
- filename: AGI-0_Art-Skynet-3B-Q4_K_M.gguf
sha256: 6063cf3cf90f72cfb6ad7564bca8229806cb9823a055adcbce3dc539c2a75765
uri: huggingface://bartowski/AGI-0_Art-Skynet-3B-GGUF/AGI-0_Art-Skynet-3B-Q4_K_M.gguf
+- !!merge <<: *llama32
+ name: "localai-functioncall-llama3.2-3b-v0.5"
+ icon: https://cdn-uploads.huggingface.co/production/uploads/647374aa7ff32a81ac6d35d4/Dzbdzn27KEc3K6zNNi070.png
+ urls:
+ - https://huggingface.co/mudler/LocalAI-functioncall-llama3.2-3b-v0.5
+ - https://huggingface.co/mudler/LocalAI-functioncall-llama3.2-3b-v0.5-Q4_K_M-GGUF
+ description: |
+ A model tailored to be conversational and execute function calls with LocalAI. This model is based on llama3.2 (3B).
+ overrides:
+ parameters:
+ model: localai-functioncall-llama3.2-3b-v0.5-q4_k_m.gguf
+ files:
+ - filename: localai-functioncall-llama3.2-3b-v0.5-q4_k_m.gguf
+ sha256: edc50f6c243e6bd6912599661a15e030de03d2be53409663ac27d3ca48306ee4
+ uri: huggingface://mudler/LocalAI-functioncall-llama3.2-3b-v0.5-Q4_K_M-GGUF/localai-functioncall-llama3.2-3b-v0.5-q4_k_m.gguf
- &qwen25
name: "qwen2.5-14b-instruct" ## Qwen2.5
icon: https://avatars.githubusercontent.com/u/141221163
From 7daf5ac3e3e89218a2e15bc92be9bc8d9b2bdecb Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 5 Feb 2025 18:37:09 +0100
Subject: [PATCH 149/679] fix(gallery): do not return overrides and additional
config (#4768)
When hitting /models/available we are interested only in the model
description, name and other small metadata. Configuration and overrides
are internals that are required only for installation.
This also fixes a bug where hitting /models/available fails if one of
the gallery items has overrides with parameters defined.
Signed-off-by: Ettore Di Giacinto
---
core/gallery/models_test.go | 6 ++++--
core/gallery/request.go | 12 ++++++++----
core/gallery/request_test.go | 6 +++++-
core/http/app_test.go | 16 ++++++++++------
core/http/endpoints/localai/gallery.go | 18 ++++++++++++------
5 files changed, 39 insertions(+), 19 deletions(-)
diff --git a/core/gallery/models_test.go b/core/gallery/models_test.go
index 6229c983..ef4faed8 100644
--- a/core/gallery/models_test.go
+++ b/core/gallery/models_test.go
@@ -48,8 +48,10 @@ var _ = Describe("Model test", func() {
defer os.RemoveAll(tempdir)
gallery := []GalleryModel{{
- Name: "bert",
- URL: bertEmbeddingsURL,
+ Metadata: Metadata{
+ Name: "bert",
+ URL: bertEmbeddingsURL,
+ },
}}
out, err := yaml.Marshal(gallery)
Expect(err).ToNot(HaveOccurred())
diff --git a/core/gallery/request.go b/core/gallery/request.go
index eec764c1..72d078a1 100644
--- a/core/gallery/request.go
+++ b/core/gallery/request.go
@@ -11,6 +11,14 @@ import (
// It is used to install the model by resolving the URL and downloading the files.
// The other fields are used to override the configuration of the model.
type GalleryModel struct {
+ Metadata `json:",inline" yaml:",inline"`
+ // config_file is read in the situation where URL is blank - and therefore this is a base config.
+ ConfigFile map[string]interface{} `json:"config_file,omitempty" yaml:"config_file,omitempty"`
+ // Overrides are used to override the configuration of the model located at URL
+ Overrides map[string]interface{} `json:"overrides,omitempty" yaml:"overrides,omitempty"`
+}
+
+type Metadata struct {
URL string `json:"url,omitempty" yaml:"url,omitempty"`
Name string `json:"name,omitempty" yaml:"name,omitempty"`
Description string `json:"description,omitempty" yaml:"description,omitempty"`
@@ -18,10 +26,6 @@ type GalleryModel struct {
URLs []string `json:"urls,omitempty" yaml:"urls,omitempty"`
Icon string `json:"icon,omitempty" yaml:"icon,omitempty"`
Tags []string `json:"tags,omitempty" yaml:"tags,omitempty"`
- // config_file is read in the situation where URL is blank - and therefore this is a base config.
- ConfigFile map[string]interface{} `json:"config_file,omitempty" yaml:"config_file,omitempty"`
- // Overrides are used to override the configuration of the model located at URL
- Overrides map[string]interface{} `json:"overrides,omitempty" yaml:"overrides,omitempty"`
// AdditionalFiles are used to add additional files to the model
AdditionalFiles []File `json:"files,omitempty" yaml:"files,omitempty"`
// Gallery is a reference to the gallery which contains the model
diff --git a/core/gallery/request_test.go b/core/gallery/request_test.go
index 23281cc6..ed07f474 100644
--- a/core/gallery/request_test.go
+++ b/core/gallery/request_test.go
@@ -9,7 +9,11 @@ import (
var _ = Describe("Gallery API tests", func() {
Context("requests", func() {
It("parses github with a branch", func() {
- req := GalleryModel{URL: "github:go-skynet/model-gallery/gpt4all-j.yaml@main"}
+ req := GalleryModel{
+ Metadata: Metadata{
+ URL: "github:go-skynet/model-gallery/gpt4all-j.yaml@main",
+ },
+ }
e, err := GetGalleryConfigFromURL(req.URL, "")
Expect(err).ToNot(HaveOccurred())
Expect(e.Name).To(Equal("gpt4all-j"))
diff --git a/core/http/app_test.go b/core/http/app_test.go
index bc4ecfae..ca7a2eaa 100644
--- a/core/http/app_test.go
+++ b/core/http/app_test.go
@@ -299,14 +299,18 @@ var _ = Describe("API test", func() {
g := []gallery.GalleryModel{
{
- Name: "bert",
- URL: bertEmbeddingsURL,
+ Metadata: gallery.Metadata{
+ Name: "bert",
+ URL: bertEmbeddingsURL,
+ },
},
{
- Name: "bert2",
- URL: bertEmbeddingsURL,
- Overrides: map[string]interface{}{"foo": "bar"},
- AdditionalFiles: []gallery.File{{Filename: "foo.yaml", URI: bertEmbeddingsURL}},
+ Metadata: gallery.Metadata{
+ Name: "bert2",
+ URL: bertEmbeddingsURL,
+ AdditionalFiles: []gallery.File{{Filename: "foo.yaml", URI: bertEmbeddingsURL}},
+ },
+ Overrides: map[string]interface{}{"foo": "bar"},
},
}
out, err := yaml.Marshal(g)
diff --git a/core/http/endpoints/localai/gallery.go b/core/http/endpoints/localai/gallery.go
index 5b2968f4..9dc99f5d 100644
--- a/core/http/endpoints/localai/gallery.go
+++ b/core/http/endpoints/localai/gallery.go
@@ -117,19 +117,25 @@ func (mgs *ModelGalleryEndpointService) DeleteModelGalleryEndpoint() func(c *fib
// @Router /models/available [get]
func (mgs *ModelGalleryEndpointService) ListModelFromGalleryEndpoint() func(c *fiber.Ctx) error {
return func(c *fiber.Ctx) error {
- log.Debug().Msgf("Listing models from galleries: %+v", mgs.galleries)
models, err := gallery.AvailableGalleryModels(mgs.galleries, mgs.modelPath)
if err != nil {
return err
}
- log.Debug().Msgf("Models found from galleries: %+v", models)
- for _, m := range models {
- log.Debug().Msgf("Model found from galleries: %+v", m)
+
+ log.Debug().Msgf("Available %d models from %d galleries\n", len(models), len(mgs.galleries))
+
+ m := []gallery.Metadata{}
+
+ for _, mm := range models {
+ m = append(m, mm.Metadata)
}
- dat, err := json.Marshal(models)
+
+ log.Debug().Msgf("Models %#v", m)
+
+ dat, err := json.Marshal(m)
if err != nil {
- return err
+ return fmt.Errorf("could not marshal models: %w", err)
}
return c.Send(dat)
}
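The endpoint change above trims the `/models/available` response down to the embedded metadata. A minimal client sketch for consuming the trimmed listing; the endpoint path comes from the `@Router` annotation above and the field names from the `Metadata` JSON tags, while `localhost:8080` is an assumption about where LocalAI is listening.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// metadata mirrors the subset of gallery.Metadata returned by the endpoint.
type metadata struct {
	URL         string   `json:"url,omitempty"`
	Name        string   `json:"name,omitempty"`
	Description string   `json:"description,omitempty"`
	URLs        []string `json:"urls,omitempty"`
	Icon        string   `json:"icon,omitempty"`
	Tags        []string `json:"tags,omitempty"`
}

func main() {
	resp, err := http.Get("http://localhost:8080/models/available")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var models []metadata
	if err := json.NewDecoder(resp.Body).Decode(&models); err != nil {
		log.Fatal(err)
	}
	for _, m := range models {
		fmt.Printf("%s\t%s\n", m.Name, m.URL)
	}
}
```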
From 3ecaea1b6e114245cbfd5720f3936652eb63cc77 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 5 Feb 2025 19:41:49 +0100
Subject: [PATCH 150/679] chore(docs): update sponsors in the website
Signed-off-by: Ettore Di Giacinto
---
docs/content/docs/overview.md | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/docs/content/docs/overview.md b/docs/content/docs/overview.md
index 5bcb6178..2176f5c2 100644
--- a/docs/content/docs/overview.md
+++ b/docs/content/docs/overview.md
@@ -120,6 +120,23 @@ To help the project you can:
[](https://star-history.com/#go-skynet/LocalAI&Date)
+## ❤️ Sponsors
+
+> Do you find LocalAI useful?
+
+Support the project by becoming [a backer or sponsor](https://github.com/sponsors/mudler). Your logo will show up here with a link to your website.
+
+A huge thank you to our generous sponsors who support this project by covering CI expenses, and to our [Sponsor list](https://github.com/sponsors/mudler):
+
+
+
+
+
+
+
+
+
+
 ## 📖 License
LocalAI is a community-driven project created by [Ettore Di Giacinto](https://github.com/mudler/).
From 2a702e9ca4b72969de9580b7fedd13d546c50b2c Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 5 Feb 2025 19:49:11 +0100
Subject: [PATCH 151/679] chore(docs): small updates
Signed-off-by: Ettore Di Giacinto
---
docs/content/docs/overview.md | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/docs/content/docs/overview.md b/docs/content/docs/overview.md
index 2176f5c2..d666db85 100644
--- a/docs/content/docs/overview.md
+++ b/docs/content/docs/overview.md
@@ -40,6 +40,10 @@ icon = "info"
+
+
+
+
@@ -118,7 +122,7 @@ To help the project you can:
 ## 🌟 Star history
-[](https://star-history.com/#go-skynet/LocalAI&Date)
+[](https://star-history.com/#mudler/LocalAI&Date)
 ## ❤️ Sponsors
From 28a1310890595d270e1ce2598e4c1c8e79fc0d29 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 5 Feb 2025 19:50:32 +0100
Subject: [PATCH 152/679] chore(docs): enhance visibility
Signed-off-by: Ettore Di Giacinto
---
docs/content/docs/overview.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/content/docs/overview.md b/docs/content/docs/overview.md
index d666db85..9e72f119 100644
--- a/docs/content/docs/overview.md
+++ b/docs/content/docs/overview.md
@@ -134,10 +134,10 @@ A huge thank you to our generous sponsors who support this project covering CI e
-
+
-
+
From 81be192279e016c2a35dbf130a67cc9e8ccdbc60 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Thu, 6 Feb 2025 00:49:15 +0100
Subject: [PATCH 153/679] chore: :arrow_up: Update leejet/stable-diffusion.cpp
to `d46ed5e184b97c2018dc2e8105925bdb8775e02c` (#4769)
:arrow_up: Update leejet/stable-diffusion.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index e541b503..663a95de 100644
--- a/Makefile
+++ b/Makefile
@@ -24,7 +24,7 @@ BARKCPP_VERSION?=v1.0.0
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
-STABLEDIFFUSION_GGML_VERSION?=5eb15ef4d022bef4a391de4f5f6556e81fbb5024
+STABLEDIFFUSION_GGML_VERSION?=d46ed5e184b97c2018dc2e8105925bdb8775e02c
ONNX_VERSION?=1.20.0
ONNX_ARCH?=x64
From d35595372d1b3f585175e638814c30bc6a20dd89 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Thu, 6 Feb 2025 09:02:51 +0100
Subject: [PATCH 154/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`d774ab3acc4fee41fbed6dbfc192b57d5f79f34b` (#4770)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 663a95de..7edb6f6a 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ DETECT_LIBS?=true
# llama.cpp versions
GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=3ec9fd4b77b6aca03a3c2bf678eae3f9517d6904
+CPPLLAMA_VERSION?=d774ab3acc4fee41fbed6dbfc192b57d5f79f34b
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From 16ced071025888708a59ee40e740cedf24aff039 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 6 Feb 2025 11:59:14 +0100
Subject: [PATCH 155/679] chore(model gallery): add
arliai_llama-3.3-70b-arliai-rpmax-v1.4 (#4772)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index d55adda9..b57d337f 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -523,6 +523,20 @@
- filename: Nohobby_L3.3-Prikol-70B-v0.4-Q4_K_M.gguf
sha256: e1d67a40bdf0526bdfcaa16c6e4dfeecad41651e201b4009b65f4f444b773604
uri: huggingface://bartowski/Nohobby_L3.3-Prikol-70B-v0.4-GGUF/Nohobby_L3.3-Prikol-70B-v0.4-Q4_K_M.gguf
+- !!merge <<: *llama33
+ name: "arliai_llama-3.3-70b-arliai-rpmax-v1.4"
+ urls:
+ - https://huggingface.co/ArliAI/Llama-3.3-70B-ArliAI-RPMax-v1.4
+ - https://huggingface.co/bartowski/ArliAI_Llama-3.3-70B-ArliAI-RPMax-v1.4-GGUF
+ description: |
+    RPMax is a series of models that are trained on a diverse set of curated creative writing and RP datasets with a focus on variety and deduplication. This model is designed to be highly creative and non-repetitive by making sure no two entries in the dataset have repeated characters or situations, which ensures the model does not latch onto a certain personality and remains capable of understanding and acting appropriately for any characters or situations.
+ overrides:
+ parameters:
+ model: ArliAI_Llama-3.3-70B-ArliAI-RPMax-v1.4-Q4_K_M.gguf
+ files:
+ - filename: ArliAI_Llama-3.3-70B-ArliAI-RPMax-v1.4-Q4_K_M.gguf
+ sha256: 7c79e76e5c057cfe32529d930360fbebd29697948e5bac4e4b2eb6d2ee596e31
+ uri: huggingface://bartowski/ArliAI_Llama-3.3-70B-ArliAI-RPMax-v1.4-GGUF/ArliAI_Llama-3.3-70B-ArliAI-RPMax-v1.4-Q4_K_M.gguf
- &rwkv
url: "github:mudler/LocalAI/gallery/rwkv.yaml@master"
name: "rwkv-6-world-7b"
@@ -1448,7 +1462,7 @@
sha256: 6063cf3cf90f72cfb6ad7564bca8229806cb9823a055adcbce3dc539c2a75765
uri: huggingface://bartowski/AGI-0_Art-Skynet-3B-GGUF/AGI-0_Art-Skynet-3B-Q4_K_M.gguf
- !!merge <<: *llama32
- name: "localai-functioncall-llama3.2-3b-v0.5"
+ name: "LocalAI-functioncall-llama3.2-3b-v0.5"
icon: https://cdn-uploads.huggingface.co/production/uploads/647374aa7ff32a81ac6d35d4/Dzbdzn27KEc3K6zNNi070.png
urls:
- https://huggingface.co/mudler/LocalAI-functioncall-llama3.2-3b-v0.5
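Gallery entries such as the one added above are installed through the model-apply API that the test suite in this series exercises against `/models/apply` and `/models/jobs/<uuid>`. A minimal sketch, assuming a LocalAI instance at http://localhost:8080; the model id is illustrative and its exact form depends on the configured galleries.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"time"
)

const baseURL = "http://localhost:8080" // assumed LocalAI address

func main() {
	payload, _ := json.Marshal(map[string]string{
		// Illustrative id; the exact form depends on the configured galleries.
		"id": "arliai_llama-3.3-70b-arliai-rpmax-v1.4",
	})
	resp, err := http.Post(baseURL+"/models/apply", "application/json", bytes.NewReader(payload))
	if err != nil {
		log.Fatal(err)
	}
	var job map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&job); err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
	uuid, _ := job["uuid"].(string)
	fmt.Println("apply job:", uuid)

	// Poll until the job reports processed=true, as the API tests do.
	for {
		st, err := http.Get(baseURL + "/models/jobs/" + uuid)
		if err != nil {
			log.Fatal(err)
		}
		var status map[string]interface{}
		if err := json.NewDecoder(st.Body).Decode(&status); err != nil {
			log.Fatal(err)
		}
		st.Body.Close()
		if processed, _ := status["processed"].(bool); processed {
			break
		}
		time.Sleep(5 * time.Second)
	}
	fmt.Println("model installed")
}
```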
From a801561f819bc79bc6e6c232b55c42586a406e42 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 6 Feb 2025 12:01:56 +0100
Subject: [PATCH 156/679] chore(model gallery): add
tiger-lab_qwen2.5-32b-instruct-cft (#4773)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index b57d337f..98760238 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3605,6 +3605,20 @@
- filename: rubenroy_Gilgamesh-72B-Q4_K_M.gguf
sha256: c6842b3bc882082c63243e762234ae697c1727bebed18b5241eb97e019f0cf68
uri: huggingface://bartowski/rubenroy_Gilgamesh-72B-GGUF/rubenroy_Gilgamesh-72B-Q4_K_M.gguf
+- !!merge <<: *qwen25
+ name: "tiger-lab_qwen2.5-32b-instruct-cft"
+ urls:
+ - https://huggingface.co/TIGER-Lab/Qwen2.5-32B-Instruct-CFT
+ - https://huggingface.co/bartowski/TIGER-Lab_Qwen2.5-32B-Instruct-CFT-GGUF
+ description: |
+ Qwen2.5-32B-Instruct-CFT is a 32B parameter model fine-tuned using our novel Critique Fine-Tuning (CFT) approach. Built upon the Qwen2.5-32B-Instruct base model, this variant is trained to critique and analyze responses rather than simply imitate them, leading to enhanced reasoning capabilities.
+ overrides:
+ parameters:
+ model: TIGER-Lab_Qwen2.5-32B-Instruct-CFT-Q4_K_M.gguf
+ files:
+ - filename: TIGER-Lab_Qwen2.5-32B-Instruct-CFT-Q4_K_M.gguf
+ sha256: 57e87e246db368f39f31f38e44ba8e9dc838a026f729f5a123aacc2aeb5a9402
+ uri: huggingface://bartowski/TIGER-Lab_Qwen2.5-32B-Instruct-CFT-GGUF/TIGER-Lab_Qwen2.5-32B-Instruct-CFT-Q4_K_M.gguf
- &llama31
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master" ## LLama3.1
icon: https://avatars.githubusercontent.com/u/153379578
From e4b8ddb6a1c3f0d14dbdde217b24896951e03da3 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 6 Feb 2025 12:03:59 +0100
Subject: [PATCH 157/679] chore(model gallery): add
black-ink-guild_pernicious_prophecy_70b (#4774)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 98760238..4e75e71f 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -537,6 +537,22 @@
- filename: ArliAI_Llama-3.3-70B-ArliAI-RPMax-v1.4-Q4_K_M.gguf
sha256: 7c79e76e5c057cfe32529d930360fbebd29697948e5bac4e4b2eb6d2ee596e31
uri: huggingface://bartowski/ArliAI_Llama-3.3-70B-ArliAI-RPMax-v1.4-GGUF/ArliAI_Llama-3.3-70B-ArliAI-RPMax-v1.4-Q4_K_M.gguf
+- !!merge <<: *llama33
+ name: "black-ink-guild_pernicious_prophecy_70b"
+ icon: https://huggingface.co/Black-Ink-Guild/Pernicious_Prophecy_70B/resolve/main/header.gif
+ urls:
+ - https://huggingface.co/Black-Ink-Guild/Pernicious_Prophecy_70B
+ - https://huggingface.co/bartowski/Black-Ink-Guild_Pernicious_Prophecy_70B-GGUF
+ description: |
+ Pernicious Prophecy 70B is a Llama-3.3 70B-based, two-step model designed by Black Ink Guild (SicariusSicariiStuff and invisietch) for uncensored roleplay, assistant tasks, and general usage.
+ NOTE: Pernicious Prophecy 70B is an uncensored model and can produce deranged, offensive, and dangerous outputs. You are solely responsible for anything that you choose to do with this model.
+ overrides:
+ parameters:
+ model: Black-Ink-Guild_Pernicious_Prophecy_70B-Q4_K_M.gguf
+ files:
+ - filename: Black-Ink-Guild_Pernicious_Prophecy_70B-Q4_K_M.gguf
+ sha256: d8d4874b837993546b750db3faf1c6e5d867883a6750f04f1f4986973d7c107b
+ uri: huggingface://bartowski/Black-Ink-Guild_Pernicious_Prophecy_70B-GGUF/Black-Ink-Guild_Pernicious_Prophecy_70B-Q4_K_M.gguf
- &rwkv
url: "github:mudler/LocalAI/gallery/rwkv.yaml@master"
name: "rwkv-6-world-7b"
From 8d45670e4109db8968ffa5ae426f6656e9e0784c Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 6 Feb 2025 12:41:08 +0100
Subject: [PATCH 158/679] fix(openai): consistently return stop reason (#4771)
We were not returning a stop reason when no tool was actually called
(even if specified).
Fixes: https://github.com/mudler/LocalAI/issues/4716
Signed-off-by: Ettore Di Giacinto
---
core/http/endpoints/openai/chat.go | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/core/http/endpoints/openai/chat.go b/core/http/endpoints/openai/chat.go
index 3b8d3056..a94a729a 100644
--- a/core/http/endpoints/openai/chat.go
+++ b/core/http/endpoints/openai/chat.go
@@ -401,6 +401,11 @@ func ChatEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
log.Debug().Msgf("Text content to return: %s", textContentToReturn)
noActionsToRun := len(results) > 0 && results[0].Name == noActionName || len(results) == 0
+ finishReason := "stop"
+ if len(input.Tools) > 0 {
+ finishReason = "tool_calls"
+ }
+
switch {
case noActionsToRun:
result, err := handleQuestion(config, input, ml, startupOptions, results, s, predInput)
@@ -408,19 +413,18 @@ func ChatEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
log.Error().Err(err).Msg("error handling question")
return
}
+
*c = append(*c, schema.Choice{
- Message: &schema.Message{Role: "assistant", Content: &result}})
+ FinishReason: finishReason,
+ Message: &schema.Message{Role: "assistant", Content: &result}})
default:
toolChoice := schema.Choice{
+ FinishReason: finishReason,
Message: &schema.Message{
Role: "assistant",
},
}
- if len(input.Tools) > 0 {
- toolChoice.FinishReason = "tool_calls"
- }
-
for _, ss := range results {
name, args := ss.Name, ss.Arguments
if len(input.Tools) > 0 {
@@ -438,7 +442,7 @@ func ChatEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, evaluat
},
)
} else {
- // otherwise we return more choices directly
+ // otherwise we return more choices directly (deprecated)
*c = append(*c, schema.Choice{
FinishReason: "function_call",
Message: &schema.Message{
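The diff above makes the finish reason explicit up front: it defaults to "stop", becomes "tool_calls" when the request declares tools, and the deprecated multi-choice functions path keeps "function_call". A minimal sketch of that rule, with simplified stand-ins for the request fields:

```go
// Simplified stand-in for the choice construction in chat.go: the finish
// reason is decided once, from whether the request declared tools, and the
// legacy functions path keeps its historical value.
package main

import "fmt"

func finishReason(toolsInRequest int, legacyFunctionsPath bool) string {
	if toolsInRequest > 0 {
		return "tool_calls"
	}
	if legacyFunctionsPath {
		return "function_call" // deprecated multi-choice branch
	}
	return "stop" // now returned consistently when no tool was called
}

func main() {
	fmt.Println(finishReason(0, false)) // stop
	fmt.Println(finishReason(2, false)) // tool_calls
	fmt.Println(finishReason(0, true))  // function_call
}
```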
From 7f90ff7aecd973a17c77a7248b9112401eac4c97 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 6 Feb 2025 18:36:23 +0100
Subject: [PATCH 159/679] chore(llama-ggml): drop deprecated backend (#4775)
The GGML (pre-GGUF) format is now dead: the next version of LocalAI already
brings many breaking compatibility changes, so we take the occasion to also
drop ggml support.
Signed-off-by: Ettore Di Giacinto
---
Makefile | 38 +---
backend/go/llm/llama-ggml/llama.go | 204 ------------------
backend/go/llm/llama-ggml/main.go | 19 --
core/http/app_test.go | 71 ------
docs/content/docs/features/text-generation.md | 17 +-
pkg/model/initializers.go | 6 +-
6 files changed, 7 insertions(+), 348 deletions(-)
delete mode 100644 backend/go/llm/llama-ggml/llama.go
delete mode 100644 backend/go/llm/llama-ggml/main.go
diff --git a/Makefile b/Makefile
index 7edb6f6a..790c6e6d 100644
--- a/Makefile
+++ b/Makefile
@@ -6,8 +6,6 @@ BINARY_NAME=local-ai
DETECT_LIBS?=true
# llama.cpp versions
-GOLLAMA_REPO?=https://github.com/go-skynet/go-llama.cpp
-GOLLAMA_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
CPPLLAMA_VERSION?=d774ab3acc4fee41fbed6dbfc192b57d5f79f34b
# whisper.cpp version
@@ -151,7 +149,6 @@ ifeq ($(BUILD_TYPE),hipblas)
LD_LIBRARY_PATH ?= /opt/rocm/lib:/opt/rocm/llvm/lib
export CXX=$(ROCM_HOME)/llvm/bin/clang++
export CC=$(ROCM_HOME)/llvm/bin/clang
- # llama-ggml has no hipblas support, so override it here.
export STABLE_BUILD_TYPE=
export GGML_HIP=1
GPU_TARGETS ?= gfx900,gfx906,gfx908,gfx940,gfx941,gfx942,gfx90a,gfx1030,gfx1031,gfx1100,gfx1101
@@ -188,7 +185,6 @@ ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-avx
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-avx2
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-avx512
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-fallback
-ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-ggml
ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-grpc
ALL_GRPC_BACKENDS+=backend-assets/util/llama-cpp-rpc-server
ALL_GRPC_BACKENDS+=backend-assets/grpc/whisper
@@ -222,19 +218,6 @@ endif
all: help
-## go-llama.cpp
-sources/go-llama.cpp:
- mkdir -p sources/go-llama.cpp
- cd sources/go-llama.cpp && \
- git init && \
- git remote add origin $(GOLLAMA_REPO) && \
- git fetch origin && \
- git checkout $(GOLLAMA_VERSION) && \
- git submodule update --init --recursive --depth 1 --single-branch
-
-sources/go-llama.cpp/libbinding.a: sources/go-llama.cpp
- $(MAKE) -C sources/go-llama.cpp BUILD_TYPE=$(STABLE_BUILD_TYPE) libbinding.a
-
## bark.cpp
sources/bark.cpp:
git clone --recursive $(BARKCPP_REPO) sources/bark.cpp && \
@@ -310,19 +293,17 @@ sources/whisper.cpp:
sources/whisper.cpp/libwhisper.a: sources/whisper.cpp
cd sources/whisper.cpp && $(MAKE) libwhisper.a libggml.a
-get-sources: sources/go-llama.cpp sources/go-piper sources/stablediffusion-ggml.cpp sources/bark.cpp sources/whisper.cpp backend/cpp/llama/llama.cpp
+get-sources: sources/go-piper sources/stablediffusion-ggml.cpp sources/bark.cpp sources/whisper.cpp backend/cpp/llama/llama.cpp
replace:
$(GOCMD) mod edit -replace github.com/ggerganov/whisper.cpp=$(CURDIR)/sources/whisper.cpp
$(GOCMD) mod edit -replace github.com/ggerganov/whisper.cpp/bindings/go=$(CURDIR)/sources/whisper.cpp/bindings/go
$(GOCMD) mod edit -replace github.com/mudler/go-piper=$(CURDIR)/sources/go-piper
- $(GOCMD) mod edit -replace github.com/go-skynet/go-llama.cpp=$(CURDIR)/sources/go-llama.cpp
dropreplace:
$(GOCMD) mod edit -dropreplace github.com/ggerganov/whisper.cpp
$(GOCMD) mod edit -dropreplace github.com/ggerganov/whisper.cpp/bindings/go
$(GOCMD) mod edit -dropreplace github.com/mudler/go-piper
- $(GOCMD) mod edit -dropreplace github.com/go-skynet/go-llama.cpp
prepare-sources: get-sources replace
$(GOCMD) mod download
@@ -330,7 +311,6 @@ prepare-sources: get-sources replace
## GENERIC
rebuild: ## Rebuilds the project
$(GOCMD) clean -cache
- $(MAKE) -C sources/go-llama.cpp clean
$(MAKE) -C sources/whisper.cpp clean
$(MAKE) -C sources/go-piper clean
$(MAKE) build
@@ -434,7 +414,7 @@ run: prepare ## run local-ai
test-models/testmodel.ggml:
mkdir test-models
mkdir test-dir
- wget -q https://huggingface.co/TheBloke/orca_mini_3B-GGML/resolve/main/orca-mini-3b.ggmlv3.q4_0.bin -O test-models/testmodel.ggml
+ wget -q https://huggingface.co/RichardErkhov/Qwen_-_Qwen2-1.5B-Instruct-gguf/resolve/main/Qwen2-1.5B-Instruct.Q2_K.gguf -O test-models/testmodel.ggml
wget -q https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin -O test-models/whisper-en
wget -q https://huggingface.co/mudler/all-MiniLM-L6-v2/resolve/main/ggml-model-q4_0.bin -O test-models/bert
wget -q https://cdn.openai.com/whisper/draft-20220913a/micro-machines.wav -O test-dir/audio.wav
@@ -449,8 +429,7 @@ test: prepare test-models/testmodel.ggml grpcs
export GO_TAGS="tts debug"
$(MAKE) prepare-test
HUGGINGFACE_GRPC=$(abspath ./)/backend/python/transformers/run.sh TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models \
- $(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="!llama && !llama-gguf" --flake-attempts $(TEST_FLAKES) --fail-fast -v -r $(TEST_PATHS)
- $(MAKE) test-llama
+ $(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="!llama-gguf" --flake-attempts $(TEST_FLAKES) --fail-fast -v -r $(TEST_PATHS)
$(MAKE) test-llama-gguf
$(MAKE) test-tts
$(MAKE) test-stablediffusion
@@ -479,10 +458,6 @@ teardown-e2e:
rm -rf $(TEST_DIR) || true
docker stop $$(docker ps -q --filter ancestor=localai-tests)
-test-llama: prepare-test
- TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models \
- $(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="llama" --flake-attempts $(TEST_FLAKES) -v -r $(TEST_PATHS)
-
test-llama-gguf: prepare-test
TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models \
$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="llama-gguf" --flake-attempts $(TEST_FLAKES) -v -r $(TEST_PATHS)
@@ -760,13 +735,6 @@ backend-assets/util/llama-cpp-rpc-server: backend-assets/grpc/llama-cpp-grpc
mkdir -p backend-assets/util/
cp -rf backend/cpp/llama-grpc/llama.cpp/build/bin/rpc-server backend-assets/util/llama-cpp-rpc-server
-backend-assets/grpc/llama-ggml: sources/go-llama.cpp sources/go-llama.cpp/libbinding.a backend-assets/grpc
- CGO_LDFLAGS="$(CGO_LDFLAGS)" C_INCLUDE_PATH=$(CURDIR)/sources/go-llama.cpp LIBRARY_PATH=$(CURDIR)/sources/go-llama.cpp \
- $(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/llama-ggml ./backend/go/llm/llama-ggml/
-ifneq ($(UPX),)
- $(UPX) backend-assets/grpc/llama-ggml
-endif
-
backend-assets/grpc/bark-cpp: backend/go/bark/libbark.a backend-assets/grpc
CGO_LDFLAGS="$(CGO_LDFLAGS)" C_INCLUDE_PATH=$(CURDIR)/backend/go/bark/ LIBRARY_PATH=$(CURDIR)/backend/go/bark/ \
$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/bark-cpp ./backend/go/bark/
diff --git a/backend/go/llm/llama-ggml/llama.go b/backend/go/llm/llama-ggml/llama.go
deleted file mode 100644
index 1a7add69..00000000
--- a/backend/go/llm/llama-ggml/llama.go
+++ /dev/null
@@ -1,204 +0,0 @@
-package main
-
-// This is a wrapper to statisfy the GRPC service interface
-// It is meant to be used by the main executable that is the server for the specific backend type (falcon, gpt3, etc)
-import (
- "fmt"
-
- "github.com/go-skynet/go-llama.cpp"
- "github.com/mudler/LocalAI/pkg/grpc/base"
- pb "github.com/mudler/LocalAI/pkg/grpc/proto"
-)
-
-type LLM struct {
- base.SingleThread
-
- llama *llama.LLama
-}
-
-func (llm *LLM) Load(opts *pb.ModelOptions) error {
- ropeFreqBase := float32(10000)
- ropeFreqScale := float32(1)
-
- if opts.RopeFreqBase != 0 {
- ropeFreqBase = opts.RopeFreqBase
- }
- if opts.RopeFreqScale != 0 {
- ropeFreqScale = opts.RopeFreqScale
- }
-
- llamaOpts := []llama.ModelOption{
- llama.WithRopeFreqBase(ropeFreqBase),
- llama.WithRopeFreqScale(ropeFreqScale),
- }
-
- if opts.NGQA != 0 {
- llamaOpts = append(llamaOpts, llama.WithGQA(int(opts.NGQA)))
- }
-
- if opts.RMSNormEps != 0 {
- llamaOpts = append(llamaOpts, llama.WithRMSNormEPS(opts.RMSNormEps))
- }
-
- if opts.ContextSize != 0 {
- llamaOpts = append(llamaOpts, llama.SetContext(int(opts.ContextSize)))
- }
- if opts.F16Memory {
- llamaOpts = append(llamaOpts, llama.EnableF16Memory)
- }
- if opts.Embeddings {
- llamaOpts = append(llamaOpts, llama.EnableEmbeddings)
- }
- if opts.NGPULayers != 0 {
- llamaOpts = append(llamaOpts, llama.SetGPULayers(int(opts.NGPULayers)))
- }
-
- llamaOpts = append(llamaOpts, llama.SetMMap(opts.MMap))
- llamaOpts = append(llamaOpts, llama.SetMainGPU(opts.MainGPU))
- llamaOpts = append(llamaOpts, llama.SetTensorSplit(opts.TensorSplit))
- if opts.NBatch != 0 {
- llamaOpts = append(llamaOpts, llama.SetNBatch(int(opts.NBatch)))
- } else {
- llamaOpts = append(llamaOpts, llama.SetNBatch(512))
- }
-
- if opts.NUMA {
- llamaOpts = append(llamaOpts, llama.EnableNUMA)
- }
-
- if opts.LowVRAM {
- llamaOpts = append(llamaOpts, llama.EnabelLowVRAM)
- }
-
- model, err := llama.New(opts.ModelFile, llamaOpts...)
- llm.llama = model
-
- return err
-}
-
-func buildPredictOptions(opts *pb.PredictOptions) []llama.PredictOption {
- ropeFreqBase := float32(10000)
- ropeFreqScale := float32(1)
-
- if opts.RopeFreqBase != 0 {
- ropeFreqBase = opts.RopeFreqBase
- }
- if opts.RopeFreqScale != 0 {
- ropeFreqScale = opts.RopeFreqScale
- }
- predictOptions := []llama.PredictOption{
- llama.SetTemperature(opts.Temperature),
- llama.SetTopP(opts.TopP),
- llama.SetTopK(int(opts.TopK)),
- llama.SetTokens(int(opts.Tokens)),
- llama.SetThreads(int(opts.Threads)),
- llama.WithGrammar(opts.Grammar),
- llama.SetRopeFreqBase(ropeFreqBase),
- llama.SetRopeFreqScale(ropeFreqScale),
- llama.SetNegativePromptScale(opts.NegativePromptScale),
- llama.SetNegativePrompt(opts.NegativePrompt),
- }
-
- if opts.PromptCacheAll {
- predictOptions = append(predictOptions, llama.EnablePromptCacheAll)
- }
-
- if opts.PromptCacheRO {
- predictOptions = append(predictOptions, llama.EnablePromptCacheRO)
- }
-
- // Expected absolute path
- if opts.PromptCachePath != "" {
- predictOptions = append(predictOptions, llama.SetPathPromptCache(opts.PromptCachePath))
- }
-
- if opts.Mirostat != 0 {
- predictOptions = append(predictOptions, llama.SetMirostat(int(opts.Mirostat)))
- }
-
- if opts.MirostatETA != 0 {
- predictOptions = append(predictOptions, llama.SetMirostatETA(opts.MirostatETA))
- }
-
- if opts.MirostatTAU != 0 {
- predictOptions = append(predictOptions, llama.SetMirostatTAU(opts.MirostatTAU))
- }
-
- if opts.Debug {
- predictOptions = append(predictOptions, llama.Debug)
- }
-
- predictOptions = append(predictOptions, llama.SetStopWords(opts.StopPrompts...))
-
- if opts.PresencePenalty != 0 {
- predictOptions = append(predictOptions, llama.SetPenalty(opts.PresencePenalty))
- }
-
- if opts.NKeep != 0 {
- predictOptions = append(predictOptions, llama.SetNKeep(int(opts.NKeep)))
- }
-
- if opts.Batch != 0 {
- predictOptions = append(predictOptions, llama.SetBatch(int(opts.Batch)))
- }
-
- if opts.F16KV {
- predictOptions = append(predictOptions, llama.EnableF16KV)
- }
-
- if opts.IgnoreEOS {
- predictOptions = append(predictOptions, llama.IgnoreEOS)
- }
-
- if opts.Seed != 0 {
- predictOptions = append(predictOptions, llama.SetSeed(int(opts.Seed)))
- }
-
- //predictOptions = append(predictOptions, llama.SetLogitBias(c.Seed))
-
- predictOptions = append(predictOptions, llama.SetFrequencyPenalty(opts.FrequencyPenalty))
- predictOptions = append(predictOptions, llama.SetMlock(opts.MLock))
- predictOptions = append(predictOptions, llama.SetMemoryMap(opts.MMap))
- predictOptions = append(predictOptions, llama.SetPredictionMainGPU(opts.MainGPU))
- predictOptions = append(predictOptions, llama.SetPredictionTensorSplit(opts.TensorSplit))
- predictOptions = append(predictOptions, llama.SetTailFreeSamplingZ(opts.TailFreeSamplingZ))
- predictOptions = append(predictOptions, llama.SetTypicalP(opts.TypicalP))
- return predictOptions
-}
-
-func (llm *LLM) Predict(opts *pb.PredictOptions) (string, error) {
- return llm.llama.Predict(opts.Prompt, buildPredictOptions(opts)...)
-}
-
-func (llm *LLM) PredictStream(opts *pb.PredictOptions, results chan string) error {
- predictOptions := buildPredictOptions(opts)
-
- predictOptions = append(predictOptions, llama.SetTokenCallback(func(token string) bool {
- results <- token
- return true
- }))
-
- go func() {
- _, err := llm.llama.Predict(opts.Prompt, predictOptions...)
- if err != nil {
- fmt.Println("err: ", err)
- }
- close(results)
- }()
-
- return nil
-}
-
-func (llm *LLM) Embeddings(opts *pb.PredictOptions) ([]float32, error) {
- predictOptions := buildPredictOptions(opts)
-
- if len(opts.EmbeddingTokens) > 0 {
- tokens := []int{}
- for _, t := range opts.EmbeddingTokens {
- tokens = append(tokens, int(t))
- }
- return llm.llama.TokenEmbeddings(tokens, predictOptions...)
- }
-
- return llm.llama.Embeddings(opts.Embeddings, predictOptions...)
-}
diff --git a/backend/go/llm/llama-ggml/main.go b/backend/go/llm/llama-ggml/main.go
deleted file mode 100644
index 544771db..00000000
--- a/backend/go/llm/llama-ggml/main.go
+++ /dev/null
@@ -1,19 +0,0 @@
-package main
-
-import (
- "flag"
-
- grpc "github.com/mudler/LocalAI/pkg/grpc"
-)
-
-var (
- addr = flag.String("addr", "localhost:50051", "the address to connect to")
-)
-
-func main() {
- flag.Parse()
-
- if err := grpc.StartServer(*addr, &LLM{}); err != nil {
- panic(err)
- }
-}
diff --git a/core/http/app_test.go b/core/http/app_test.go
index ca7a2eaa..ecaf6da3 100644
--- a/core/http/app_test.go
+++ b/core/http/app_test.go
@@ -526,77 +526,6 @@ var _ = Describe("API test", func() {
Expect(content["usage"]).To(ContainSubstring("You can test this model with curl like this"))
})
- It("runs openllama(llama-ggml backend)", Label("llama"), func() {
- if runtime.GOOS != "linux" {
- Skip("test supported only on linux")
- }
- response := postModelApplyRequest("http://127.0.0.1:9090/models/apply", modelApplyRequest{
- URL: "github:go-skynet/model-gallery/openllama_3b.yaml",
- Name: "openllama_3b",
- Overrides: map[string]interface{}{"backend": "llama-ggml", "mmap": true, "f16": true, "context_size": 128},
- })
-
- Expect(response["uuid"]).ToNot(BeEmpty(), fmt.Sprint(response))
-
- uuid := response["uuid"].(string)
-
- Eventually(func() bool {
- response := getModelStatus("http://127.0.0.1:9090/models/jobs/" + uuid)
- return response["processed"].(bool)
- }, "360s", "10s").Should(Equal(true))
-
- By("testing completion")
- resp, err := client.CreateCompletion(context.TODO(), openai.CompletionRequest{Model: "openllama_3b", Prompt: "Count up to five: one, two, three, four, "})
- Expect(err).ToNot(HaveOccurred())
- Expect(len(resp.Choices)).To(Equal(1))
- Expect(resp.Choices[0].Text).To(ContainSubstring("five"))
-
- By("testing functions")
- resp2, err := client.CreateChatCompletion(
- context.TODO(),
- openai.ChatCompletionRequest{
- Model: "openllama_3b",
- Messages: []openai.ChatCompletionMessage{
- {
- Role: "user",
- Content: "What is the weather like in San Francisco (celsius)?",
- },
- },
- Functions: []openai.FunctionDefinition{
- openai.FunctionDefinition{
- Name: "get_current_weather",
- Description: "Get the current weather",
- Parameters: jsonschema.Definition{
- Type: jsonschema.Object,
- Properties: map[string]jsonschema.Definition{
- "location": {
- Type: jsonschema.String,
- Description: "The city and state, e.g. San Francisco, CA",
- },
- "unit": {
- Type: jsonschema.String,
- Enum: []string{"celcius", "fahrenheit"},
- },
- },
- Required: []string{"location"},
- },
- },
- },
- })
- Expect(err).ToNot(HaveOccurred())
- Expect(len(resp2.Choices)).To(Equal(1))
- Expect(resp2.Choices[0].Message.FunctionCall).ToNot(BeNil())
- Expect(resp2.Choices[0].Message.FunctionCall.Name).To(Equal("get_current_weather"), resp2.Choices[0].Message.FunctionCall.Name)
-
- var res map[string]string
- err = json.Unmarshal([]byte(resp2.Choices[0].Message.FunctionCall.Arguments), &res)
- Expect(err).ToNot(HaveOccurred())
- Expect(res["location"]).To(ContainSubstring("San Francisco"), fmt.Sprint(res))
- Expect(res["unit"]).To(Equal("celcius"), fmt.Sprint(res))
- Expect(string(resp2.Choices[0].FinishReason)).To(Equal("function_call"), fmt.Sprint(resp2.Choices[0].FinishReason))
-
- })
-
It("runs openllama gguf(llama-cpp)", Label("llama-gguf"), func() {
if runtime.GOOS != "linux" {
Skip("test supported only on linux")
diff --git a/docs/content/docs/features/text-generation.md b/docs/content/docs/features/text-generation.md
index 11ab3999..342b8e76 100644
--- a/docs/content/docs/features/text-generation.md
+++ b/docs/content/docs/features/text-generation.md
@@ -124,7 +124,7 @@ Note: rwkv models needs to specify the backend `rwkv` in the YAML config files a
{{% alert note %}}
-The `ggml` file format has been deprecated. If you are using `ggml` models and you are configuring your model with a YAML file, specify, use the `llama-ggml` backend instead. If you are relying in automatic detection of the model, you should be fine. For `gguf` models, use the `llama` backend. The go backend is deprecated as well but still available as `go-llama`. The go backend supports still features not available in the mainline: speculative sampling and embeddings.
+The `ggml` file format has been deprecated. If you are using `ggml` models and are configuring your model with a YAML file, use a LocalAI version older than v2.25.0. For `gguf` models, use the `llama` backend. The go backend is deprecated as well but still available as `go-llama`.
{{% /alert %}}
@@ -175,25 +175,12 @@ name: llama
backend: llama
parameters:
# Relative to the models path
- model: file.gguf.bin
-```
-
-In the example above we specify `llama` as the backend to restrict loading `gguf` models only.
-
-For instance, to use the `llama-ggml` backend for `ggml` models:
-
-```yaml
-name: llama
-backend: llama-ggml
-parameters:
- # Relative to the models path
- model: file.ggml.bin
+ model: file.gguf
```
#### Reference
- [llama](https://github.com/ggerganov/llama.cpp)
-- [binding](https://github.com/go-skynet/go-llama.cpp)
### exllama/2
diff --git a/pkg/model/initializers.go b/pkg/model/initializers.go
index ace72fa3..5e465cf0 100644
--- a/pkg/model/initializers.go
+++ b/pkg/model/initializers.go
@@ -43,8 +43,6 @@ var TypeAlias map[string]string = map[string]string{
var AutoDetect = os.Getenv("DISABLE_AUTODETECT") != "true"
const (
- LlamaGGML = "llama-ggml"
-
LLamaCPP = "llama-cpp"
LLamaCPPAVX2 = "llama-cpp-avx2"
@@ -143,10 +141,10 @@ func orderBackends(backends map[string][]string) ([]string, error) {
// sets a priority list - first has more priority
priorityList := []string{
- // First llama.cpp(variants) and llama-ggml to follow.
+ // First llama.cpp(variants)
// We keep the fallback to prevent that if the llama.cpp variants
// that depends on shared libs if breaks have still a safety net.
- LLamaCPP, LlamaGGML, LLamaCPPFallback,
+ LLamaCPP, LLamaCPPFallback,
}
toTheEnd := []string{
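With `llama-ggml` removed, automatic backend selection keeps only the llama.cpp variants at the top of the priority list, with the fallback build as a safety net. A minimal sketch of that ordering, using a simplified stand-in for `orderBackends` and the backend names shown in the diff:

```go
// Simplified stand-in for pkg/model.orderBackends after the removal:
// llama.cpp variants first, the fallback kept as a safety net, and every
// other discovered backend appended afterwards.
package main

import "fmt"

func orderBackends(discovered []string) []string {
	priority := []string{"llama-cpp", "llama-cpp-fallback"}
	seen := map[string]bool{}
	ordered := []string{}
	for _, p := range priority {
		for _, b := range discovered {
			if b == p && !seen[b] {
				ordered = append(ordered, b)
				seen[b] = true
			}
		}
	}
	for _, b := range discovered {
		if !seen[b] {
			ordered = append(ordered, b)
			seen[b] = true
		}
	}
	return ordered
}

func main() {
	fmt.Println(orderBackends([]string{"whisper", "llama-cpp-fallback", "bark-cpp", "llama-cpp"}))
	// Output: [llama-cpp llama-cpp-fallback whisper bark-cpp]
}
```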
From cc1f6f913f3c271cc2e73080991163b18ea03be0 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 6 Feb 2025 19:39:59 +0100
Subject: [PATCH 160/679] fix(llama.cpp): disable mirostat as default (#2911)
Even though it can improve output quality, Mirostat has shown performance
drawbacks noticeable enough to confuse users about the speed of LocalAI
(see also https://github.com/mudler/LocalAI/issues/2780).
This changeset disables Mirostat by default (it can still be enabled
manually).
Signed-off-by: Ettore Di Giacinto
Co-authored-by: Dave
---
core/config/backend_config.go | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/core/config/backend_config.go b/core/config/backend_config.go
index 8ce93d9f..2b130ec8 100644
--- a/core/config/backend_config.go
+++ b/core/config/backend_config.go
@@ -287,7 +287,8 @@ func (cfg *BackendConfig) SetDefaults(opts ...ConfigLoaderOption) {
defaultTopP := 0.95
defaultTopK := 40
defaultTemp := 0.9
- defaultMirostat := 2
+ // https://github.com/mudler/LocalAI/issues/2780
+ defaultMirostat := 0
defaultMirostatTAU := 5.0
defaultMirostatETA := 0.1
defaultTypicalP := 1.0
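The change above only flips the default; Mirostat can still be turned on per model. A minimal sketch of the default-filling behaviour, with a simplified stand-in for `BackendConfig.SetDefaults`:

```go
// Simplified stand-in for the defaulting logic: Mirostat now stays off (0)
// unless the model's YAML config sets it explicitly; TAU and ETA keep their
// previous defaults and only matter once Mirostat is enabled.
package main

import "fmt"

type samplerConfig struct {
	Mirostat    *int
	MirostatTAU *float64
	MirostatETA *float64
}

func applyDefaults(c *samplerConfig) {
	defaultMirostat := 0 // was 2 before this change
	defaultTAU := 5.0
	defaultETA := 0.1
	if c.Mirostat == nil {
		c.Mirostat = &defaultMirostat
	}
	if c.MirostatTAU == nil {
		c.MirostatTAU = &defaultTAU
	}
	if c.MirostatETA == nil {
		c.MirostatETA = &defaultETA
	}
}

func main() {
	var cfg samplerConfig
	applyDefaults(&cfg)
	fmt.Println(*cfg.Mirostat) // 0: disabled unless set per model
}
```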
From 731674eee7457642a042a043398d40e6cbf3e06a Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Thu, 6 Feb 2025 23:02:00 +0100
Subject: [PATCH 161/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`8a59053f63fffc24e730cd3ea067760abfe4a919` (#4776)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 790c6e6d..a1224035 100644
--- a/Makefile
+++ b/Makefile
@@ -6,7 +6,7 @@ BINARY_NAME=local-ai
DETECT_LIBS?=true
# llama.cpp versions
-CPPLLAMA_VERSION?=d774ab3acc4fee41fbed6dbfc192b57d5f79f34b
+CPPLLAMA_VERSION?=8a59053f63fffc24e730cd3ea067760abfe4a919
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From f670e0a91c788bde1c84d96958b3843d13f8f0f3 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 7 Feb 2025 13:29:53 +0100
Subject: [PATCH 162/679] chore(model gallery): add
nohobby_l3.3-prikol-70b-v0.5 (#4777)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 4e75e71f..5bde3e85 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -553,6 +553,29 @@
- filename: Black-Ink-Guild_Pernicious_Prophecy_70B-Q4_K_M.gguf
sha256: d8d4874b837993546b750db3faf1c6e5d867883a6750f04f1f4986973d7c107b
uri: huggingface://bartowski/Black-Ink-Guild_Pernicious_Prophecy_70B-GGUF/Black-Ink-Guild_Pernicious_Prophecy_70B-Q4_K_M.gguf
+- !!merge <<: *llama33
+ name: "nohobby_l3.3-prikol-70b-v0.5"
+ icon: https://files.catbox.moe/x9t3zo.png
+ urls:
+ - https://huggingface.co/Nohobby/L3.3-Prikol-70B-v0.5
+ - https://huggingface.co/bartowski/Nohobby_L3.3-Prikol-70B-v0.5-GGUF
+ description: |
+ 99% of mergekit addicts quit before they hit it big.
+
+ Gosh, I need to create an org for my test runs - my profile looks like a dumpster.
+
+ What was it again? Ah, the new model.
+
+ Exactly what I wanted. All I had to do was yank out the cursed official DeepSeek distill and here we are.
+
+ From the brief tests it gave me some unusual takes on the character cards I'm used to. Just this makes it worth it imo. Also the writing is kinda nice.
+ overrides:
+ parameters:
+ model: Nohobby_L3.3-Prikol-70B-v0.5-Q4_K_M.gguf
+ files:
+ - filename: Nohobby_L3.3-Prikol-70B-v0.5-Q4_K_M.gguf
+ sha256: 36f29015f1f420f51569603445a3ea5fe72e3651c2022ef064086f5617578fe6
+ uri: huggingface://bartowski/Nohobby_L3.3-Prikol-70B-v0.5-GGUF/Nohobby_L3.3-Prikol-70B-v0.5-Q4_K_M.gguf
- &rwkv
url: "github:mudler/LocalAI/gallery/rwkv.yaml@master"
name: "rwkv-6-world-7b"
From cc163429dc3ea027d9a6b6578757e942fcb62ce1 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 7 Feb 2025 13:31:49 +0100
Subject: [PATCH 163/679] chore(model gallery): add
cognitivecomputations_dolphin3.0-r1-mistral-24b (#4778)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 5bde3e85..5af8f895 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -6913,6 +6913,22 @@
- filename: krutrim-ai-labs_Krutrim-2-instruct-Q4_K_M.gguf
sha256: 03aa6d1fb7ab70482a2242839b8d8e1c789aa90a8be415076ddf84bef65f06c7
uri: huggingface://bartowski/krutrim-ai-labs_Krutrim-2-instruct-GGUF/krutrim-ai-labs_Krutrim-2-instruct-Q4_K_M.gguf
+- !!merge <<: *mistral03
+ name: "cognitivecomputations_dolphin3.0-r1-mistral-24b"
+ url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
+ icon: https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/hdAvdwZiJaLbGmvSZ3wTT.png
+ urls:
+ - https://huggingface.co/cognitivecomputations/Dolphin3.0-R1-Mistral-24B
+ - https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF
+ description: |
+ Dolphin 3.0 R1 is the next generation of the Dolphin series of instruct-tuned models. Designed to be the ultimate general purpose local model, enabling coding, math, agentic, function calling, and general use cases.
+ overrides:
+ parameters:
+ model: cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_K_M.gguf
+ files:
+ - filename: cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_K_M.gguf
+ sha256: d67de1e94fb32742bd09ee8beebbeb36a4b544785a8f8413dc4d9490e04eda6c
+ uri: huggingface://bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_K_M.gguf
- &mudler
url: "github:mudler/LocalAI/gallery/mudler.yaml@master" ### START mudler's LocalAI specific-models
name: "LocalAI-llama3-8b-function-call-v0.2"
From 230fe0098faeca88a6ab4ddcba8e70ce0794ea86 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Fri, 7 Feb 2025 13:33:24 +0100
Subject: [PATCH 164/679] chore(model gallery): add
cognitivecomputations_dolphin3.0-mistral-24b (#4779)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 5af8f895..3e0c1ac6 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -6929,6 +6929,22 @@
- filename: cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_K_M.gguf
sha256: d67de1e94fb32742bd09ee8beebbeb36a4b544785a8f8413dc4d9490e04eda6c
uri: huggingface://bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_K_M.gguf
+- !!merge <<: *mistral03
+ name: "cognitivecomputations_dolphin3.0-mistral-24b"
+ url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
+ icon: https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/cNCs1TBD3FelWCJGkZ3cd.png
+ urls:
+ - https://huggingface.co/cognitivecomputations/Dolphin3.0-Mistral-24B
+ - https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-Mistral-24B-GGUF
+ description: |
+ Dolphin 3.0 is the next generation of the Dolphin series of instruct-tuned models. Designed to be the ultimate general purpose local model, enabling coding, math, agentic, function calling, and general use cases.
+ overrides:
+ parameters:
+ model: cognitivecomputations_Dolphin3.0-Mistral-24B-Q4_K_M.gguf
+ files:
+ - filename: cognitivecomputations_Dolphin3.0-Mistral-24B-Q4_K_M.gguf
+ sha256: 6f193bbf98628140194df257c7466e2c6f80a7ef70a6ebae26c53b2f2ef21994
+ uri: huggingface://bartowski/cognitivecomputations_Dolphin3.0-Mistral-24B-GGUF/cognitivecomputations_Dolphin3.0-Mistral-24B-Q4_K_M.gguf
- &mudler
url: "github:mudler/LocalAI/gallery/mudler.yaml@master" ### START mudler's LocalAI specific-models
name: "LocalAI-llama3-8b-function-call-v0.2"
From 4b1b942a7f747755fe3e45bead662eeb96db3959 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sat, 8 Feb 2025 09:04:18 +0100
Subject: [PATCH 165/679] chore(model gallery): add
sicariussicariistuff_redemption_wind_24b (#4781)
Signed-off-by: Ettore Di Giacinto
---
gallery/index.yaml | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index 3e0c1ac6..4b61a0e3 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -6945,6 +6945,28 @@
- filename: cognitivecomputations_Dolphin3.0-Mistral-24B-Q4_K_M.gguf
sha256: 6f193bbf98628140194df257c7466e2c6f80a7ef70a6ebae26c53b2f2ef21994
uri: huggingface://bartowski/cognitivecomputations_Dolphin3.0-Mistral-24B-GGUF/cognitivecomputations_Dolphin3.0-Mistral-24B-Q4_K_M.gguf
+- !!merge <<: *mistral03
+ name: "sicariussicariistuff_redemption_wind_24b"
+ url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
+ icon: https://huggingface.co/SicariusSicariiStuff/Redemption_Wind_24B/resolve/main/Images/Redemption_Wind_24B.png
+ urls:
+ - https://huggingface.co/SicariusSicariiStuff/Redemption_Wind_24B
+ - https://huggingface.co/bartowski/SicariusSicariiStuff_Redemption_Wind_24B-GGUF
+ description: |
+ This is a lightly fine-tuned version of the Mistral 24B base model, designed as an accessible and adaptable foundation for further fine-tuning and merging fodder. Key modifications include:
+ ChatML-ified, with no additional tokens introduced.
+    High quality private instruct, not generated by ChatGPT or Claude, ensuring no slop and good markdown understanding.
+    No refusals: since it's a base model, refusals should be minimal to non-existent, though, in early testing, occasional warnings still appear (I assume some were baked into the pre-train).
+    High-quality private creative writing dataset. Mainly to dilute baked-in slop further, but it can actually write some stories, not bad for loss ~8.
+    Small, high-quality private RP dataset. This was done so further tuning for RP will be easier. The dataset was kept small and contains ZERO SLOP, some entries are of 16k token length.
+    Exceptional adherence to character cards. This was done to make it easier for further tunes intended for roleplay.
+ overrides:
+ parameters:
+ model: SicariusSicariiStuff_Redemption_Wind_24B-Q4_K_M.gguf
+ files:
+ - filename: SicariusSicariiStuff_Redemption_Wind_24B-Q4_K_M.gguf
+ sha256: 40025eb00d83c9e9393555962962a2dfc5251fe7bd70812835ff0bcc55ecc463
+ uri: huggingface://bartowski/SicariusSicariiStuff_Redemption_Wind_24B-GGUF/SicariusSicariiStuff_Redemption_Wind_24B-Q4_K_M.gguf
- &mudler
url: "github:mudler/LocalAI/gallery/mudler.yaml@master" ### START mudler's LocalAI specific-models
name: "LocalAI-llama3-8b-function-call-v0.2"
From 7a5912908a6c8ae2791ddc6d5a733181ae02828a Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Sat, 8 Feb 2025 09:44:34 +0100
Subject: [PATCH 166/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`d2fe216fb2fb7ca8627618c9ea3a2e7886325780` (#4780)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index a1224035..01d5a14a 100644
--- a/Makefile
+++ b/Makefile
@@ -6,7 +6,7 @@ BINARY_NAME=local-ai
DETECT_LIBS?=true
# llama.cpp versions
-CPPLLAMA_VERSION?=8a59053f63fffc24e730cd3ea067760abfe4a919
+CPPLLAMA_VERSION?=d2fe216fb2fb7ca8627618c9ea3a2e7886325780
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From e01acc88c984c60b5a3e60bb1e12d4e232a20f6c Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Sat, 8 Feb 2025 22:57:40 +0100
Subject: [PATCH 167/679] chore: :arrow_up: Update ggerganov/llama.cpp to
`e6e658319952f7ad269dc11275b9edddc721fc6d` (#4787)
:arrow_up: Update ggerganov/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 01d5a14a..05fa4a44 100644
--- a/Makefile
+++ b/Makefile
@@ -6,7 +6,7 @@ BINARY_NAME=local-ai
DETECT_LIBS?=true
# llama.cpp versions
-CPPLLAMA_VERSION?=d2fe216fb2fb7ca8627618c9ea3a2e7886325780
+CPPLLAMA_VERSION?=e6e658319952f7ad269dc11275b9edddc721fc6d
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggerganov/whisper.cpp
From fb2f847507268daf4fbba106e3d73b1f09314b37 Mon Sep 17 00:00:00 2001
From: Dave
Date: Sun, 9 Feb 2025 04:52:28 -0500
Subject: [PATCH 168/679] chore: migrate bruno request files to examples repo
(#4788)
migrate bruno request files to examples repo
Signed-off-by: Dave Lee
---
.../Sound Generation/musicgen.bru | 23 --------------
.../backend monitor/backend monitor.bru | 17 ----------
.../backend monitor/backend-shutdown.bru | 21 ------------
.bruno/LocalAI Test Requests/bruno.json | 5 ---
.../environments/localhost.bru | 6 ----
.../LocalAI Test Requests/get models list.bru | 11 -------
.../image generation/Generate image.bru | 25 ---------------
.../llm text/-completions.bru | 24 --------------
.../LocalAI Test Requests/llm text/-edits.bru | 23 --------------
.../llm text/-embeddings.bru | 22 -------------
.../chat completion -simple- 1 message-.bru | 30 ------------------
.../llm text/chat/chat-completions -long-.bru | 29 -----------------
.../chat/chat-completions -stream-.bru | 25 ---------------
.../model gallery/add model gallery.bru | 22 -------------
.../model gallery/delete model gallery.bru | 21 ------------
.../list MODELS in galleries.bru | 11 -------
.../model gallery/list model GALLERIES.bru | 11 -------
.../model gallery/model delete.bru | 11 -------
.../model gallery apply -gist-.bru | 21 ------------
.../model gallery/model gallery apply.bru | 22 -------------
.../transcription/gb1.ogg | Bin 1667662 -> 0 bytes
.../transcription/transcribe.bru | 16 ----------
.bruno/LocalAI Test Requests/tts/-tts.bru | 22 -------------
.bruno/LocalAI Test Requests/tts/musicgen.bru | 23 --------------
24 files changed, 441 deletions(-)
delete mode 100644 .bruno/LocalAI Test Requests/Sound Generation/musicgen.bru
delete mode 100644 .bruno/LocalAI Test Requests/backend monitor/backend monitor.bru
delete mode 100644 .bruno/LocalAI Test Requests/backend monitor/backend-shutdown.bru
delete mode 100644 .bruno/LocalAI Test Requests/bruno.json
delete mode 100644 .bruno/LocalAI Test Requests/environments/localhost.bru
delete mode 100644 .bruno/LocalAI Test Requests/get models list.bru
delete mode 100644 .bruno/LocalAI Test Requests/image generation/Generate image.bru
delete mode 100644 .bruno/LocalAI Test Requests/llm text/-completions.bru
delete mode 100644 .bruno/LocalAI Test Requests/llm text/-edits.bru
delete mode 100644 .bruno/LocalAI Test Requests/llm text/-embeddings.bru
delete mode 100644 .bruno/LocalAI Test Requests/llm text/chat/chat completion -simple- 1 message-.bru
delete mode 100644 .bruno/LocalAI Test Requests/llm text/chat/chat-completions -long-.bru
delete mode 100644 .bruno/LocalAI Test Requests/llm text/chat/chat-completions -stream-.bru
delete mode 100644 .bruno/LocalAI Test Requests/model gallery/add model gallery.bru
delete mode 100644 .bruno/LocalAI Test Requests/model gallery/delete model gallery.bru
delete mode 100644 .bruno/LocalAI Test Requests/model gallery/list MODELS in galleries.bru
delete mode 100644 .bruno/LocalAI Test Requests/model gallery/list model GALLERIES.bru
delete mode 100644 .bruno/LocalAI Test Requests/model gallery/model delete.bru
delete mode 100644 .bruno/LocalAI Test Requests/model gallery/model gallery apply -gist-.bru
delete mode 100644 .bruno/LocalAI Test Requests/model gallery/model gallery apply.bru
delete mode 100644 .bruno/LocalAI Test Requests/transcription/gb1.ogg
delete mode 100644 .bruno/LocalAI Test Requests/transcription/transcribe.bru
delete mode 100644 .bruno/LocalAI Test Requests/tts/-tts.bru
delete mode 100644 .bruno/LocalAI Test Requests/tts/musicgen.bru
diff --git a/.bruno/LocalAI Test Requests/Sound Generation/musicgen.bru b/.bruno/LocalAI Test Requests/Sound Generation/musicgen.bru
deleted file mode 100644
index 471756f5..00000000
--- a/.bruno/LocalAI Test Requests/Sound Generation/musicgen.bru
+++ /dev/null
@@ -1,23 +0,0 @@
-meta {
- name: musicgen
- type: http
- seq: 1
-}
-
-post {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/v1/sound-generation
- body: json
- auth: none
-}
-
-headers {
- Content-Type: application/json
-}
-
-body:json {
- {
- "model_id": "facebook/musicgen-small",
- "text": "Exciting 80s Newscast Interstitial",
- "duration_seconds": 8
- }
-}
diff --git a/.bruno/LocalAI Test Requests/backend monitor/backend monitor.bru b/.bruno/LocalAI Test Requests/backend monitor/backend monitor.bru
deleted file mode 100644
index 51e3771a..00000000
--- a/.bruno/LocalAI Test Requests/backend monitor/backend monitor.bru
+++ /dev/null
@@ -1,17 +0,0 @@
-meta {
- name: backend monitor
- type: http
- seq: 4
-}
-
-get {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/backend/monitor
- body: json
- auth: none
-}
-
-body:json {
- {
- "model": "{{DEFAULT_MODEL}}"
- }
-}
diff --git a/.bruno/LocalAI Test Requests/backend monitor/backend-shutdown.bru b/.bruno/LocalAI Test Requests/backend monitor/backend-shutdown.bru
deleted file mode 100644
index f75f259a..00000000
--- a/.bruno/LocalAI Test Requests/backend monitor/backend-shutdown.bru
+++ /dev/null
@@ -1,21 +0,0 @@
-meta {
- name: backend-shutdown
- type: http
- seq: 3
-}
-
-post {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/backend/shutdown
- body: json
- auth: none
-}
-
-headers {
- Content-Type: application/json
-}
-
-body:json {
- {
- "model": "{{DEFAULT_MODEL}}"
- }
-}
diff --git a/.bruno/LocalAI Test Requests/bruno.json b/.bruno/LocalAI Test Requests/bruno.json
deleted file mode 100644
index 9491e3a5..00000000
--- a/.bruno/LocalAI Test Requests/bruno.json
+++ /dev/null
@@ -1,5 +0,0 @@
-{
- "version": "1",
- "name": "LocalAI Test Requests",
- "type": "collection"
-}
\ No newline at end of file
diff --git a/.bruno/LocalAI Test Requests/environments/localhost.bru b/.bruno/LocalAI Test Requests/environments/localhost.bru
deleted file mode 100644
index fb97edb2..00000000
--- a/.bruno/LocalAI Test Requests/environments/localhost.bru
+++ /dev/null
@@ -1,6 +0,0 @@
-vars {
- HOST: localhost
- PORT: 8080
- DEFAULT_MODEL: gpt-3.5-turbo
- PROTOCOL: http://
-}
diff --git a/.bruno/LocalAI Test Requests/get models list.bru b/.bruno/LocalAI Test Requests/get models list.bru
deleted file mode 100644
index 4bf1628f..00000000
--- a/.bruno/LocalAI Test Requests/get models list.bru
+++ /dev/null
@@ -1,11 +0,0 @@
-meta {
- name: get models list
- type: http
- seq: 2
-}
-
-get {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models
- body: none
- auth: none
-}
diff --git a/.bruno/LocalAI Test Requests/image generation/Generate image.bru b/.bruno/LocalAI Test Requests/image generation/Generate image.bru
deleted file mode 100644
index 37d350ca..00000000
--- a/.bruno/LocalAI Test Requests/image generation/Generate image.bru
+++ /dev/null
@@ -1,25 +0,0 @@
-meta {
- name: Generate image
- type: http
- seq: 1
-}
-
-post {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/v1/images/generations
- body: json
- auth: none
-}
-
-headers {
- Content-Type: application/json
-}
-
-body:json {
- {
- "prompt": "|",
- "model": "model-name",
- "step": 51,
- "size": "1024x1024",
- "image": ""
- }
-}
diff --git a/.bruno/LocalAI Test Requests/llm text/-completions.bru b/.bruno/LocalAI Test Requests/llm text/-completions.bru
deleted file mode 100644
index 6e16a244..00000000
--- a/.bruno/LocalAI Test Requests/llm text/-completions.bru
+++ /dev/null
@@ -1,24 +0,0 @@
-meta {
- name: -completions
- type: http
- seq: 4
-}
-
-post {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/completions
- body: json
- auth: none
-}
-
-headers {
- Content-Type: application/json
-}
-
-body:json {
- {
- "model": "{{DEFAULT_MODEL}}",
- "prompt": "function downloadFile(string url, string outputPath) {",
- "max_tokens": 256,
- "temperature": 0.5
- }
-}
diff --git a/.bruno/LocalAI Test Requests/llm text/-edits.bru b/.bruno/LocalAI Test Requests/llm text/-edits.bru
deleted file mode 100644
index 838afa27..00000000
--- a/.bruno/LocalAI Test Requests/llm text/-edits.bru
+++ /dev/null
@@ -1,23 +0,0 @@
-meta {
- name: -edits
- type: http
- seq: 5
-}
-
-post {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/edits
- body: json
- auth: none
-}
-
-headers {
- Content-Type: application/json
-}
-
-body:json {
- {
- "model": "{{DEFAULT_MODEL}}",
- "input": "What day of the wek is it?",
- "instruction": "Fix the spelling mistakes"
- }
-}
diff --git a/.bruno/LocalAI Test Requests/llm text/-embeddings.bru b/.bruno/LocalAI Test Requests/llm text/-embeddings.bru
deleted file mode 100644
index a3045df2..00000000
--- a/.bruno/LocalAI Test Requests/llm text/-embeddings.bru
+++ /dev/null
@@ -1,22 +0,0 @@
-meta {
- name: -embeddings
- type: http
- seq: 6
-}
-
-post {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/embeddings
- body: json
- auth: none
-}
-
-headers {
- Content-Type: application/json
-}
-
-body:json {
- {
- "model": "{{DEFAULT_MODEL}}",
- "input": "A STRANGE GAME.\nTHE ONLY WINNING MOVE IS NOT TO PLAY.\n\nHOW ABOUT A NICE GAME OF CHESS?"
- }
-}
diff --git a/.bruno/LocalAI Test Requests/llm text/chat/chat completion -simple- 1 message-.bru b/.bruno/LocalAI Test Requests/llm text/chat/chat completion -simple- 1 message-.bru
deleted file mode 100644
index fd08aeee..00000000
--- a/.bruno/LocalAI Test Requests/llm text/chat/chat completion -simple- 1 message-.bru
+++ /dev/null
@@ -1,30 +0,0 @@
-meta {
- name: chat completion -simple- 1 message-
- type: http
- seq: 4
-}
-
-post {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/chat/completions
- body: json
- auth: none
-}
-
-headers {
- Content-Type: application/json
-}
-
-body:json {
- {
- "model": "{{DEFAULT_MODEL}}",
- "messages": [
- {
- "role": "user",
- "content": "How could one use friction to cook an egg?"
- }
- ],
- "max_tokens": 256,
- "temperature": 0.2,
- "grammar": ""
- }
-}
diff --git a/.bruno/LocalAI Test Requests/llm text/chat/chat-completions -long-.bru b/.bruno/LocalAI Test Requests/llm text/chat/chat-completions -long-.bru
deleted file mode 100644
index 5d9a49a0..00000000
--- a/.bruno/LocalAI Test Requests/llm text/chat/chat-completions -long-.bru
+++ /dev/null
@@ -1,29 +0,0 @@
-meta {
- name: chat-completions -long-
- type: http
- seq: 5
-}
-
-post {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/chat/completions
- body: json
- auth: none
-}
-
-headers {
- Content-Type: application/json
-}
-
-body:json {
- {
- "model": "{{DEFAULT_MODEL}}",
- "messages": [{"role": "system", "content": "You are a helpful, intelligent pirate assistant. Always answer as helpfully as possible, while being a pirate and using pirate language. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."},
- {"role": "user", "content": "How could one use electricity to cook an egg?"},
- {"role": "assistant",
- "content": "Shiver me timbers! Using electricity to cook an egg? Well, matey, I reckon that be a right curious idea! *adjusts spectacles* Now, I ain't sure if this be possible, but I'll do me best to help ye out.\n\nFirst things first, ye gotta understand that electricity be a powerful force, and it can't just cook an egg on its own. Ye see, electricity be like a mighty wind that can make things happen, but it needs somethin' to work with. So, if ye want to use electricity to cook an egg, ye gotta find a way to harness that power and make it do the cookin'.\n\nNow, I know what ye might be thinkin': \"How do I harness the power of electricity to cook an egg?\" Well, matey, there be a few ways to do it. One way be to use a special device called an \"electric frying pan.\" This be a pan that has a built-in heating element that gets hot when ye plug it into a wall socket. When the element gets hot, ye can crack an egg into the pan and watch as it cook"
- },
- {"role": "user", "content": "I don't have one of those, just a raw wire and plenty of power! How do we get it done?"}],
- "max_tokens": 1024,
- "temperature": 0.5
- }
-}
diff --git a/.bruno/LocalAI Test Requests/llm text/chat/chat-completions -stream-.bru b/.bruno/LocalAI Test Requests/llm text/chat/chat-completions -stream-.bru
deleted file mode 100644
index cf0539fa..00000000
--- a/.bruno/LocalAI Test Requests/llm text/chat/chat-completions -stream-.bru
+++ /dev/null
@@ -1,25 +0,0 @@
-meta {
- name: chat-completions -stream-
- type: http
- seq: 6
-}
-
-post {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/chat/completions
- body: json
- auth: none
-}
-
-headers {
- Content-Type: application/json
-}
-
-body:json {
- {
- "model": "{{DEFAULT_MODEL}}",
- "messages": [{"role": "user", "content": "Explain how I can set sail on the ocean using only power generated by seagulls?"}],
- "max_tokens": 256,
- "temperature": 0.9,
- "stream": true
- }
-}
diff --git a/.bruno/LocalAI Test Requests/model gallery/add model gallery.bru b/.bruno/LocalAI Test Requests/model gallery/add model gallery.bru
deleted file mode 100644
index 1463160f..00000000
--- a/.bruno/LocalAI Test Requests/model gallery/add model gallery.bru
+++ /dev/null
@@ -1,22 +0,0 @@
-meta {
- name: add model gallery
- type: http
- seq: 10
-}
-
-post {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/galleries
- body: json
- auth: none
-}
-
-headers {
- Content-Type: application/json
-}
-
-body:json {
- {
- "url": "file:///home/dave/projects/model-gallery/huggingface/TheBloke__CodeLlama-7B-Instruct-GGML.yaml",
- "name": "test"
- }
-}
diff --git a/.bruno/LocalAI Test Requests/model gallery/delete model gallery.bru b/.bruno/LocalAI Test Requests/model gallery/delete model gallery.bru
deleted file mode 100644
index 3e211aa6..00000000
--- a/.bruno/LocalAI Test Requests/model gallery/delete model gallery.bru
+++ /dev/null
@@ -1,21 +0,0 @@
-meta {
- name: delete model gallery
- type: http
- seq: 11
-}
-
-delete {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/galleries
- body: json
- auth: none
-}
-
-headers {
- Content-Type: application/json
-}
-
-body:json {
- {
- "name": "test"
- }
-}
diff --git a/.bruno/LocalAI Test Requests/model gallery/list MODELS in galleries.bru b/.bruno/LocalAI Test Requests/model gallery/list MODELS in galleries.bru
deleted file mode 100644
index 1d866f8a..00000000
--- a/.bruno/LocalAI Test Requests/model gallery/list MODELS in galleries.bru
+++ /dev/null
@@ -1,11 +0,0 @@
-meta {
- name: list MODELS in galleries
- type: http
- seq: 7
-}
-
-get {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/available
- body: none
- auth: none
-}
diff --git a/.bruno/LocalAI Test Requests/model gallery/list model GALLERIES.bru b/.bruno/LocalAI Test Requests/model gallery/list model GALLERIES.bru
deleted file mode 100644
index f7664672..00000000
--- a/.bruno/LocalAI Test Requests/model gallery/list model GALLERIES.bru
+++ /dev/null
@@ -1,11 +0,0 @@
-meta {
- name: list model GALLERIES
- type: http
- seq: 8
-}
-
-get {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/galleries
- body: none
- auth: none
-}
diff --git a/.bruno/LocalAI Test Requests/model gallery/model delete.bru b/.bruno/LocalAI Test Requests/model gallery/model delete.bru
deleted file mode 100644
index b320dae3..00000000
--- a/.bruno/LocalAI Test Requests/model gallery/model delete.bru
+++ /dev/null
@@ -1,11 +0,0 @@
-meta {
- name: model delete
- type: http
- seq: 7
-}
-
-post {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/galleries
- body: none
- auth: none
-}
diff --git a/.bruno/LocalAI Test Requests/model gallery/model gallery apply -gist-.bru b/.bruno/LocalAI Test Requests/model gallery/model gallery apply -gist-.bru
deleted file mode 100644
index d94c75a2..00000000
--- a/.bruno/LocalAI Test Requests/model gallery/model gallery apply -gist-.bru
+++ /dev/null
@@ -1,21 +0,0 @@
-meta {
- name: model gallery apply -gist-
- type: http
- seq: 12
-}
-
-post {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/apply
- body: json
- auth: none
-}
-
-headers {
- Content-Type: application/json
-}
-
-body:json {
- {
- "id": "TheBloke__CodeLlama-7B-Instruct-GGML__codellama-7b-instruct.ggmlv3.Q2_K.bin"
- }
-}
diff --git a/.bruno/LocalAI Test Requests/model gallery/model gallery apply.bru b/.bruno/LocalAI Test Requests/model gallery/model gallery apply.bru
deleted file mode 100644
index aa308e1e..00000000
--- a/.bruno/LocalAI Test Requests/model gallery/model gallery apply.bru
+++ /dev/null
@@ -1,22 +0,0 @@
-meta {
- name: model gallery apply
- type: http
- seq: 9
-}
-
-post {
- url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/apply
- body: json
- auth: none
-}
-
-headers {
- Content-Type: application/json
-}
-
-body:json {
- {
- "id": "dave@TheBloke__CodeLlama-7B-Instruct-GGML__codellama-7b-instruct.ggmlv3.Q3_K_S.bin",
- "name": "codellama7b"
- }
-}
diff --git a/.bruno/LocalAI Test Requests/transcription/gb1.ogg b/.bruno/LocalAI Test Requests/transcription/gb1.ogg
deleted file mode 100644
index df22d6363731c9867e4817a45f85a89436844159..0000000000000000000000000000000000000000
Binary files a/.bruno/LocalAI Test Requests/transcription/gb1.ogg and /dev/null differ