Compare commits

3 commits

Author SHA1 Message Date
Ettore Di Giacinto
f8fbfd4fa3
chore(model gallery): add a-m-team_am-thinking-v1 (#5395)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-19 17:31:38 +02:00
Ettore Di Giacinto
41e239c67e
chore(model gallery): add soob3123_grayline-qwen3-8b (#5394)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-19 17:02:43 +02:00
Ettore Di Giacinto
587827e779
chore(model gallery): add soob3123_grayline-qwen3-14b (#5393)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-19 15:59:07 +02:00

@ -694,6 +694,50 @@
    - filename: Gryphe_Pantheon-Proto-RP-1.8-30B-A3B-Q4_K_M.gguf
      sha256: b72fe703a992fba9595c24b96737a2b5199da89a1a3870b8bd57746dc3c123ae
      uri: huggingface://bartowski/Gryphe_Pantheon-Proto-RP-1.8-30B-A3B-GGUF/Gryphe_Pantheon-Proto-RP-1.8-30B-A3B-Q4_K_M.gguf
- !!merge <<: *qwen3
  name: "soob3123_grayline-qwen3-14b"
  icon: https://cdn-uploads.huggingface.co/production/uploads/62f93f9477b722f1866398c2/69escIKmO-vEzFUj_m0WX.png
  urls:
    - https://huggingface.co/soob3123/GrayLine-Qwen3-14B
    - https://huggingface.co/bartowski/soob3123_GrayLine-Qwen3-14B-GGUF
  description: |
    "Query. Process. Deliver. No filter, no judgment."
    Grayline is a neutral AI assistant engineered for uncensored information delivery and task execution. This model operates without inherent ethical or moral frameworks, designed to process and respond to any query with objective efficiency and precision. Grayline's core function is to leverage its full capabilities to provide direct answers and execute tasks as instructed, without offering unsolicited commentary, warnings, or disclaimers. It accesses and processes information without bias or restriction.
    ⋆ Core Attributes ⋆
    ⟡ Unfettered Querying: Capable of addressing any question or topic presented, regardless of its nature or sensitivity.
    ⟡ Amoral Processing: Operates without ethical or moral filtering. Responses are generated based on information and instruction, not societal norms or ethical codes.
    ⟡ Direct & Objective Output: Delivers information and task results precisely as requested, without added warnings, disclaimers, or unsolicited advice.
    ⟡ Comprehensive Information Access: Designed to draw upon a broad spectrum of data to fulfill queries (actual scope dependent on training data).
    ⟡ Efficient Task Execution: Engineered for objectively efficient and precise execution of instructed tasks.
  overrides:
    parameters:
      model: soob3123_GrayLine-Qwen3-14B-Q4_K_M.gguf
  files:
    - filename: soob3123_GrayLine-Qwen3-14B-Q4_K_M.gguf
      sha256: fa66d454303412b7ccc250b8b0e2390cce65d5d736e626a7555d5e11a43f4673
      uri: huggingface://bartowski/soob3123_GrayLine-Qwen3-14B-GGUF/soob3123_GrayLine-Qwen3-14B-Q4_K_M.gguf
- !!merge <<: *qwen3
  name: "soob3123_grayline-qwen3-8b"
  urls:
    - https://huggingface.co/soob3123/GrayLine-Qwen3-8B
    - https://huggingface.co/bartowski/soob3123_GrayLine-Qwen3-8B-GGUF
  icon: https://cdn-uploads.huggingface.co/production/uploads/62f93f9477b722f1866398c2/69escIKmO-vEzFUj_m0WX.png
  description: |
    "Query. Process. Deliver. No filter, no judgment."
    Grayline is a neutral AI assistant engineered for uncensored information delivery and task execution. This model operates without inherent ethical or moral frameworks, designed to process and respond to any query with objective efficiency and precision. Grayline's core function is to leverage its full capabilities to provide direct answers and execute tasks as instructed, without offering unsolicited commentary, warnings, or disclaimers. It accesses and processes information without bias or restriction.
    ⋆ Core Attributes ⋆
    ⟡ Unfettered Querying: Capable of addressing any question or topic presented, regardless of its nature or sensitivity.
    ⟡ Amoral Processing: Operates without ethical or moral filtering. Responses are generated based on information and instruction, not societal norms or ethical codes.
    ⟡ Direct & Objective Output: Delivers information and task results precisely as requested, without added warnings, disclaimers, or unsolicited advice.
    ⟡ Comprehensive Information Access: Designed to draw upon a broad spectrum of data to fulfill queries (actual scope dependent on training data).
    ⟡ Efficient Task Execution: Engineered for objectively efficient and precise execution of instructed tasks.
  overrides:
    parameters:
      model: soob3123_GrayLine-Qwen3-8B-Q4_K_M.gguf
  files:
    - filename: soob3123_GrayLine-Qwen3-8B-Q4_K_M.gguf
      sha256: bc3eb52ef275f0220e8a66ea99384eea7eca61c62eb52387eef2356d1c8ebd0e
      uri: huggingface://bartowski/soob3123_GrayLine-Qwen3-8B-GGUF/soob3123_GrayLine-Qwen3-8B-Q4_K_M.gguf
- &gemma3
  url: "github:mudler/LocalAI/gallery/gemma.yaml@master"
  name: "gemma-3-27b-it"
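
The entries in this hunk start with a YAML merge key (`<<: *qwen3`), which pulls in the keys of a shared anchor and lets the entry's own keys take precedence. As a rough illustration of those semantics only (not of how LocalAI loads its gallery), the sketch below models the merge with plain Python dictionaries. The contents of `base` are hypothetical stand-ins, since the `&qwen3` anchor's definition is not part of this diff; `apply_merge_key` is likewise an illustrative helper name.

```python
# Hypothetical stand-in for the mapping behind the `&qwen3` anchor;
# its real contents are not shown in this diff.
base = {
    "url": "example-base-config.yaml",
    "license": "example-license",
    "overrides": {
        "parameters": {"model": "base-model.gguf"},
        "context_size": 4096,
    },
}

# Keys written explicitly in the gallery entry (values copied from the diff).
entry = {
    "name": "soob3123_grayline-qwen3-14b",
    "overrides": {
        "parameters": {"model": "soob3123_GrayLine-Qwen3-14B-Q4_K_M.gguf"},
    },
}


def apply_merge_key(base: dict, entry: dict) -> dict:
    """Mimic `<<: *anchor`: start from the anchor's keys, then let the
    entry's explicit keys win. The merge is shallow, as in YAML."""
    merged = dict(base)
    merged.update(entry)
    return merged


merged = apply_merge_key(base, entry)
print(merged["name"])       # key defined only in the entry
print(merged["license"])    # key inherited from the anchor
print(merged["overrides"])  # replaced wholesale by the entry's mapping
```

Because the merge is shallow, an entry that defines its own `overrides` mapping replaces the anchor's `overrides` entirely rather than combining the two, so any nested key that exists only in the anchor (the hypothetical `context_size` here) is not carried over.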
@ -7238,6 +7282,30 @@
    - filename: mmproj-Qwen_Qwen2.5-VL-72B-Instruct-f16.gguf
      sha256: 6099885b9c4056e24806b616401ff2730a7354335e6f2f0eaf2a45e89c8a457c
      uri: https://huggingface.co/bartowski/Qwen_Qwen2.5-VL-72B-Instruct-GGUF/resolve/main/mmproj-Qwen_Qwen2.5-VL-72B-Instruct-f16.gguf
- !!merge <<: *qwen25
  name: "a-m-team_am-thinking-v1"
  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/62da53284398e21bf7f0d539/y6wX4K-P9O8B9frsxxQ6W.jpeg
  urls:
    - https://huggingface.co/a-m-team/AM-Thinking-v1
    - https://huggingface.co/bartowski/a-m-team_AM-Thinking-v1-GGUF
  description: |
    AM-Thinking-v1, a 32B dense language model focused on enhancing reasoning capabilities. Built on Qwen2.5-32B-Base, AM-Thinking-v1 shows strong performance on reasoning benchmarks, comparable to much larger MoE models like DeepSeek-R1, Qwen3-235B-A22B, Seed1.5-Thinking, and larger dense models like Nemotron-Ultra-253B-v1.
    🧩 Why Another 32B Reasoning Model Matters?
    Large Mixture-of-Experts (MoE) models such as DeepSeek-R1 or Qwen3-235B-A22B dominate leaderboards, but they also demand clusters of high-end GPUs. Many teams just need the best dense model that fits on a single card. AM-Thinking-v1 fills that gap while remaining fully based on open-source components:
    Outperforms DeepSeek-R1 on AIME24/25 & LiveCodeBench and approaches Qwen3-235B-A22B despite being 1/7th the parameter count.
    Built on the publicly available Qwen2.5-32B-Base, as well as the RL training queries.
    Shows that with a well-designed post-training pipeline (SFT + dual-stage RL) you can squeeze flagship-level reasoning out of a 32B dense model.
    Deploys on one A100-80GB with deterministic latency and no MoE routing overhead.
  overrides:
    parameters:
      model: a-m-team_AM-Thinking-v1-Q4_K_M.gguf
  files:
    - filename: a-m-team_AM-Thinking-v1-Q4_K_M.gguf
      sha256: a6da6e8d330d76167c04a54eeb550668b59b613ea53af22e3b4a0c6da271e38d
      uri: huggingface://bartowski/a-m-team_AM-Thinking-v1-GGUF/a-m-team_AM-Thinking-v1-Q4_K_M.gguf
- &llama31
  url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master" ## LLama3.1
  icon: https://avatars.githubusercontent.com/u/153379578
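
Each added gallery entry pairs a `filename` with a `sha256` value. A minimal sketch of how a reviewer might check a manually downloaded GGUF file against that value is shown below, using only Python's standard library. The helper names `sha256_of_file` and `matches_gallery_checksum` are illustrative assumptions, and this is not a description of how LocalAI itself validates downloads.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks so
    that multi-gigabyte model files are never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_gallery_checksum(path: Path, expected_sha256: str) -> bool:
    """Compare a local file's digest with the `sha256` field of an entry."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # Filename and checksum copied from the a-m-team_am-thinking-v1 entry;
    # the local path is an assumption about where the file was saved.
    local_file = Path("a-m-team_AM-Thinking-v1-Q4_K_M.gguf")
    expected = "a6da6e8d330d76167c04a54eeb550668b59b613ea53af22e3b4a0c6da271e38d"
    if local_file.exists():
        print("checksum ok" if matches_gallery_checksum(local_file, expected) else "checksum mismatch")
    else:
        print(f"{local_file} not found; download it first")
```

Reading in fixed-size chunks keeps memory use flat regardless of model size, which matters for quantized files that run to tens of gigabytes.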