chore(model gallery): add nvidia_llama-3.1-nemotron-nano-4b-v1.1 (#5427)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Ettore Di Giacinto 2025-05-22 11:33:33 +02:00 committed by GitHub
parent 38c5d16b57
commit c587ac0aef
GPG key ID: B5690EEEBB952194

@@ -9722,6 +9722,33 @@
    - filename: Llama-3.3-MagicalGirl-2.5.i1-Q4_K_M.gguf
      sha256: 25db6d4ae5649e6d2084036d8f05ec1aca459126e2d4734d6c18f1e16147a4d3
      uri: huggingface://mradermacher/Llama-3.3-MagicalGirl-2.5-i1-GGUF/Llama-3.3-MagicalGirl-2.5.i1-Q4_K_M.gguf
- !!merge <<: *llama31
  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/1613114437487-60262a8e0703121c822a80b6.png
  name: "nvidia_llama-3.1-nemotron-nano-4b-v1.1"
  urls:
    - https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1
    - https://huggingface.co/bartowski/nvidia_Llama-3.1-Nemotron-Nano-4B-v1.1-GGUF
  description: |
    Llama-3.1-Nemotron-Nano-4B-v1.1 is a large language model (LLM) which is a derivative of nvidia/Llama-3.1-Minitron-4B-Width-Base, which is created from Llama 3.1 8B using our LLM compression technique and offers improvements in model accuracy and efficiency. It is a reasoning model that is post trained for reasoning, human chat preferences, and tasks, such as RAG and tool calling.
    Llama-3.1-Nemotron-Nano-4B-v1.1 is a model which offers a great tradeoff between model accuracy and efficiency. The model fits on a single RTX GPU and can be used locally. The model supports a context length of 128K.
    This model underwent a multi-phase post-training process to enhance both its reasoning and non-reasoning capabilities. This includes a supervised fine-tuning stage for Math, Code, Reasoning, and Tool Calling as well as multiple reinforcement learning (RL) stages using Reward-aware Preference Optimization (RPO) algorithms for both chat and instruction-following. The final model checkpoint is obtained after merging the final SFT and RPO checkpoints
    This model is part of the Llama Nemotron Collection. You can find the other model(s) in this family here:
    Llama-3.3-Nemotron-Ultra-253B-v1
    Llama-3.3-Nemotron-Super-49B-v1
    Llama-3.1-Nemotron-Nano-8B-v1
    This model is ready for commercial use.
  overrides:
    parameters:
      model: nvidia_Llama-3.1-Nemotron-Nano-4B-v1.1-Q4_K_M.gguf
  files:
    - filename: nvidia_Llama-3.1-Nemotron-Nano-4B-v1.1-Q4_K_M.gguf
      sha256: 530f0e0ade58d22d4b24d9378cf8a87161d22f33cae8f2f65876f3a1555819e6
      uri: huggingface://bartowski/nvidia_Llama-3.1-Nemotron-Nano-4B-v1.1-GGUF/nvidia_Llama-3.1-Nemotron-Nano-4B-v1.1-Q4_K_M.gguf
- &deepseek
  url: "github:mudler/LocalAI/gallery/deepseek.yaml@master" ## Deepseek
  name: "deepseek-coder-v2-lite-instruct"
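
The new gallery entry pins the GGUF file to a `sha256` digest. How LocalAI performs its own verification is not shown in this commit; the sketch below only illustrates one way a downloaded file could be checked by hand against that digest. The function names `sha256_of_file` and `matches_gallery_checksum` are hypothetical and are not part of LocalAI.

```python
import hashlib

# Digest copied from the `sha256` field of the gallery entry added in this commit.
EXPECTED_SHA256 = "530f0e0ade58d22d4b24d9378cf8a87161d22f33cae8f2f65876f3a1555819e6"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_gallery_checksum(path, expected=EXPECTED_SHA256):
    """Hypothetical helper: compare a local file's digest with the gallery value."""
    return sha256_of_file(path) == expected.strip().lower()
```

Assuming the model was saved locally under the filename from the entry, `matches_gallery_checksum("nvidia_Llama-3.1-Nemotron-Nano-4B-v1.1-Q4_K_M.gguf")` would return `True` only if the file's digest equals the pinned value.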