Releases · NVIDIA/NeMo
NVIDIA Neural Modules 2.3.2
This release addresses known security issues. For the latest NVIDIA Vulnerability Disclosure Information, visit https://www.nvidia.com/en-us/security/. For acknowledgement, please reach out to the NVIDIA PSIRT team at PSIRT@nvidia.com.
NVIDIA Neural Modules 2.4.0rc2
Prerelease: NVIDIA Neural Modules 2.4.0rc2 (2025-07-09)
NVIDIA Neural Modules 2.4.0rc1
Prerelease: NVIDIA Neural Modules 2.4.0rc1 (2025-07-02)
NVIDIA Neural Modules 2.4.0rc0
Prerelease: NVIDIA Neural Modules 2.4.0rc0 (2025-06-27)
NVIDIA Neural Modules 2.3.1
Highlights
- Collections
  - LLM
    - Llama 4: Fixed an accuracy issue caused by MoE probability normalization. Improved pre-training and fine-tuning performance.
- Export & Deploy
  - Updated vLLMExporter to use vLLM V1 to address a security vulnerability (see the usage sketch after this list).
- AutoModel
  - Improved chat-template handling.
- Fault Tolerance
  - Local checkpointing: Fixed support for auto-inserted metric names when resuming from local checkpoints.
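For context on the Export & Deploy fix, here is a minimal usage sketch of `vLLMExporter` from `nemo.export.vllm_exporter`. The paths are hypothetical and the keyword argument names are assumptions against the 2.3.x API; check the Export & Deploy docs for the exact signature.

```python
# Hedged sketch: export a NeMo checkpoint and smoke-test it through the
# vLLM-backed exporter updated in this release. Paths are hypothetical and
# the keyword argument names are assumptions, not a confirmed signature.
from nemo.export.vllm_exporter import vLLMExporter

exporter = vLLMExporter()
exporter.export(
    nemo_checkpoint="/checkpoints/llama4.nemo",  # hypothetical checkpoint path
    model_dir="/tmp/vllm_export",                # staging dir for the engine
    tensor_parallel_size=1,
)

# NeMo exporters expose forward() for quick sanity checks of an export.
print(exporter.forward(["Hello, world!"], max_output_len=32))
```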
Detailed Changelogs:
Export
Changelog
- Cherry-pick `Update vLLMExporter to use vLLM V1 (#13498)` into `r2.3.0` by @chtruong814 :: PR: #13631
Uncategorized:
Changelog
- Bump to 2.3.1 by @chtruong814 :: PR: #13507
- Cherry pick `Use explicitly cached canary-1b-flash in CI tests (13237)` into `r2.3.0` by @ko3n1g :: PR: #13508
- Cherry pick `[automodel] bump liger-kernel to 0.5.8 + fallback (13260)` into `r2.3.0` by @ko3n1g :: PR: #13308
- Cherry-pick `Add recipe and ci scripts for qwen2vl` to `r2.3.0` by @romanbrickie :: PR: #13336
- Cherry pick `Fix skipme handling (13244)` into `r2.3.0` by @ko3n1g :: PR: #13376
- Cherry pick `Allow fp8 param gather when using FSDP (13267)` into `r2.3.0` by @ko3n1g :: PR: #13383
- Cherry pick `Handle boolean args for performance scripts and log received config (13291)` into `r2.3.0` by @ko3n1g :: PR: #13416
- Cherry pick `new perf configs (13110)` into `r2.3.0` by @ko3n1g :: PR: #13431
- Cherry pick `Adding additional unit tests for the deploy module (13411)` into `r2.3.0` by @ko3n1g :: PR: #13449
- Cherry pick `Adding more export tests (13410)` into `r2.3.0` by @ko3n1g :: PR: #13450
- Cherry pick `[automodel] add FirstRankPerNode (13373)` into `r2.3.0` by @ko3n1g :: PR: #13559
- Cherry pick `[automodel] deprecate global_batch_size dataset argument (13137)` into `r2.3.0` by @ko3n1g :: PR: #13560
- Cherry-pick `[automodel] fallback FP8 + LCE -> FP8 + CE (#13349)` into `r2.3.0` by @chtruong814 :: PR: #13561
- Cherry pick `[automodel] add find_unused_parameters=True for DDP (13366)` into `r2.3.0` by @ko3n1g :: PR: #13601
- Cherry pick `Add CI test for local checkpointing (#13012)` into `r2.3.0` by @ananthsub :: PR: #13472
- Cherry pick `[automodel] fix --mbs/gbs dtype and chat-template (13598)` into `r2.3.0` by @akoumpa :: PR: #13613
- Cherry-pick `Update t5.py (#13082)` to `r2.3.0` and bump mcore to f98b1a0 by @chtruong814 :: PR: #13642
- [Automodel] Fix CP device_mesh issue, use PTL distsampler (#13473) by @akoumpa :: PR: #13636
- [Llama4] Fix the recipe bug - cherrypick #13649 by @gdengk :: PR: #13650
- build: Pin transformers (#13675) by @ko3n1g :: PR: #13692
NVIDIA Neural Modules 2.3.0
Highlights
- Export & Deploy
  - NeMo 2.0 export path for NIM
  - ONNX and TensorRT export for the NIM Embedding Container
  - In-framework deployment for HF models
  - TRT-LLM deployment for HF models in NeMo Framework
- Evaluation
  - Integrate nvidia-lm-eval into NeMo FW for evaluations with an OpenAI-API-compatible in-framework deployment
- AutoModel
  - VLM AutoModelForImageTextToText (see the fine-tuning sketch after this list)
  - FP8 for AutoModel
  - Support CP with FSDP2
  - Support TP with FSDP2
  - Performance optimization
    - Add support for Cut Cross Entropy and Liger Kernel
  - Gradient checkpointing
- Fault Tolerance
  - Integrate NVRx v0.3 local checkpointing
- Collections
  - LLM
    - Llama 4
    - Llama Nemotron Ultra
    - Llama Nemotron Super
    - Llama Nemotron Nano
    - Nemotron-h/5
    - DeepSeek V3 pretraining
    - Evo2
    - Qwen 2.5
    - LoRA for Qwen3-32B and Qwen3-30B-A3B
  - MultiModal
    - FLUX
    - Gemma 3
    - Qwen2-VL
  - ASR
    - NeMo Run support for ASR training
    - N-Gram LM on GPU for AED
    - N-Gram LM on GPU + Transducer greedy decoding (RNN-T, TDT)
    - Timestamps support for AED models
    - Migrate SpeechLM to NeMo 2.0
    - Canary-1.1
    - Replace ClassificationModels class with LabelModels
- Performance
  - Functional MXFP8 support for (G)B200
  - Current-scaling recipe with TP communication overlap and FP8 param gathers
  - Custom FSDP support that fully utilizes GB200 NVL72
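As an illustration of the AutoModel highlights, a minimal fine-tuning sketch. `HFAutoModelForCausalLM`, `HFDatasetDataModule`, and `FSDP2Strategy` all appear in this release's changelog, but the constructor arguments shown are assumptions; consult the AutoModel docs for the exact recipe.

```python
# Hedged sketch: fine-tune a Hugging Face model via the AutoModel path,
# i.e. without converting it to a NeMo checkpoint first. Argument names
# are assumptions against the NeMo 2.3 API, not a confirmed signature.
import nemo.lightning as nl
from nemo.collections import llm

model = llm.HFAutoModelForCausalLM(model_name="Qwen/Qwen2.5-0.5B")
data = llm.HFDatasetDataModule(path_or_dataset="rajpurkar/squad", split="train")

llm.finetune(
    model=model,
    data=data,
    trainer=nl.Trainer(
        devices=2,
        accelerator="gpu",
        max_steps=100,
        # FSDP2 with optional TP/CP is one of the AutoModel highlights above.
        strategy=nl.FSDP2Strategy(data_parallel_size=2, tensor_parallel_size=1),
    ),
)
```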
Detailed Changelogs:
ASR
Changelog
- Added model config params for Canary-1B-Flash, Canary-180M-Flash models by @KunalDhawan :: PR: #12588
- Canary tutorial by @ankitapasad :: PR: #12613
- Canary tutorial fix timestamp by @ankitapasad :: PR: #12677
- revert config by @nithinraok :: PR: #12689
- canary longform inference script with timestamps option by @krishnacpuvvada :: PR: #12653 (see the transcription sketch after this list)
- Fix default timestamps value for Hybrid ASR models by @artbataev :: PR: #12681
- Fix k2 installation with PyTorch 2.6.0 by @artbataev :: PR: #12686
- Improve time and RTFx report for ASR by @artbataev :: PR: #12680
- Modify train args by @ankitapasad :: PR: #12700
- Fix asr doc warnings by @nithinraok :: PR: #12720
- Rename `FastNGramLM` -> `NGramGPULanguageModel` by @artbataev :: PR: #12755
- transcribe fix for new hypotheses by @nune-tadevosyan :: PR: #12801
- Fix timestamps when cuda graphs enabled by @monica-sekoyan :: PR: #12808
- update streaming conformer by @stevehuang52 :: PR: #12846
- AED Decoding with N-Gram LM by @artbataev :: PR: #12730
- update notebook by @nithinraok :: PR: #13088
- bugfix ASR_Context_Biasing.ipynb by @lilithgrigoryan :: PR: #13109
- Change branch for installation from main to r2.3.0 by @ankitapasad :: PR: #13266
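A minimal transcription sketch with timestamps, tying together the Canary and timestamp entries above. The model name matches the Canary-1B-Flash entry; the output layout (`.text`, `.timestamp`) follows NeMo's transcribe API but should be treated as an assumption.

```python
# Hedged sketch: transcribe with Canary-1B-Flash and request timestamps.
# "sample.wav" is a hypothetical input file; the result layout is an
# assumption against NeMo's transcribe() output, not a confirmed schema.
from nemo.collections.asr.models import ASRModel

model = ASRModel.from_pretrained("nvidia/canary-1b-flash")
hyps = model.transcribe(["sample.wav"], timestamps=True)

print(hyps[0].text)
# Word-level stamps, if present in the returned hypothesis:
for stamp in hyps[0].timestamp.get("word", []):
    print(stamp["word"], stamp["start"], stamp["end"])
```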
TTS
Changelog
NLP / NMT
Changelog
- Remove old peft docs by @cuichenx :: PR: #12675
- Add code coverage for llm gpt models conversion tests by @suiyoubi :: PR: #12665
- Make BERT TransformerBlockWithPostLNSupport accept more inputs from Mcore by @suiyoubi :: PR: #12685
- remove gifs from documentation by @dimapihtar :: PR: #12732
- Rename `FastNGramLM` -> `NGramGPULanguageModel` by @artbataev :: PR: #12755
- fix NeMo documentation by @dimapihtar :: PR: #12754
- GPT Model/Data/Recipe Unit Test by @suiyoubi :: PR: #12757
- ci: Exclude nlp, mm, vision collections by @ko3n1g :: PR: #12816
- Add vocab size as attr to GPT and T5 Configs, use file name based logger in llm.gpt.data by @hemildesai :: PR: #12862
- Fix transformer layer api with megatron cbc89b3 by @yaoyu-33 :: PR: #12885
Text Normalization / Inverse Text Normalization
Changelog
- Rename `FastNGramLM` -> `NGramGPULanguageModel` by @artbataev :: PR: #12755
Export
Changelog
- GHA Conversion Test and Importer/Exporter Refactor by @suiyoubi :: PR: #12597
- Fix Llama Embedding Model Exporting keys by @suiyoubi :: PR: #12691
- build: Add trtllm by @ko3n1g :: PR: #12672
- Fix trt-llm install by @chtruong814 :: PR: #12827
- Update LLaVA's next HF exporter to load ViT checkpoint from YAML by @eagle705 :: PR: #12841
- Support huggingface export to tensorrtllm by @pthombre :: PR: #12889 (see the export sketch after this list)
- Adds a built stage for the trt-llm wheel to reduce the overall test image size by @chtruong814 :: PR: #12883
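Following the Hugging Face to TensorRT-LLM entry above, a minimal export sketch. `TensorRTLLM` lives in `nemo.export.tensorrt_llm`; the keyword names and paths below are assumptions, not a confirmed signature.

```python
# Hedged sketch: build a TensorRT-LLM engine from a checkpoint and run a
# quick generation to sanity-check it. Paths are hypothetical; keyword
# argument names are assumptions against the NeMo 2.3 export API.
from nemo.export.tensorrt_llm import TensorRTLLM

exporter = TensorRTLLM(model_dir="/tmp/trtllm_engine")  # engine output dir
exporter.export(
    nemo_checkpoint_path="/checkpoints/model.nemo",  # hypothetical path
    model_type="llama",
)
print(exporter.forward(["Hello!"], max_output_len=16))
```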
Uncategorized:
Changelog
- Update changelog-build.yml by @ko3n1g :: PR: #12584
- Update changelog for `r2.2.0` by @github-actions[bot] :: PR: #12585
- Add comments for requirements by @thomasdhc :: PR: #12603
- [automodel] FSDP2Strategy: move to device if using a single-device by @akoumpa :: PR: #12593
- build: Remove numba pin by @ko3n1g :: PR: #12604
- docs: Update installation guides by @ko3n1g :: PR: #12596
- Change Llama Scaling Factor type to Float by @suiyoubi :: PR: #12616
- ci: Test multiple python versions by @ko3n1g :: PR: #12619
- ci: Disable reformat by @ko3n1g :: PR: #12620
- Updating ModelOpt to 0.25.0 by @janekl :: PR: #12633
- [automodel] add additional hf_dataset tests by @akoumpa :: PR: #12646
- [automodel] add jit_transform tests by @akoumpa :: PR: #12645
- [automodel] init eos_token_id inside data module by @yuanzhedong :: PR: #12610
- [automodel] grad ckpt by @akoumpa :: PR: #12644
- bugfix(llm/LLaMa) - dropout_position can never be equal to extended string by @soluwalana :: PR: #12649
- Fix inference pipeline quality issue by @Victor49152 :: PR: #12639
- [automodel] switch to direct=True to propagate return codes in nemorun by @akoumpa :: PR: #12651
- add Auto Conf support for bert, t5, qwen, starcoder models by @dimapihtar :: PR: #12601
- ci: Upload coverage by @ko3n1g :: PR: #12668
- ci: Re-enable changed-files action by @ko3n1g :: PR: #12683
- build: Pin sox by @ko3n1g :: PR: #12701
- add neva quantization by @linnanwang :: PR: #12698
- Clip coverage by @abhinavg4 :: PR: #12696
- GHA CI test: Remove unnecessary directive by @pablo-garay :: PR: #12714
- minor perf fixes by @malay-nagda :: PR: #12656
- Add DeepSeek V2 Lite into llm init.py by @suiyoubi :: PR: #12664
- Add Llama-Nemotron Nano and 70B models by @suiyoubi :: PR: #12712
- Save batch norm running stats in PEFT checkpoints by @cuichenx :: PR: #12666
- Fix document Readme under nemo to add more information by @yaoyu-33 :: PR: #12699
- Fix ub_overlap_ag by @cuichenx :: PR: #12721
- Toggle fast tokenizer if error occurs by @cuichenx :: PR: #12722
- Update README.md for blackwell and AutoModel by @snowmanwwg :: PR: #12612
- Raise error on import_ckpt with overwrite=False plus README for checkpoint_converters by @janekl :: PR: #12693
- [automodel] fix validation_step by @soluwalana :: PR: #12659
- [automodel] vlm tests by @akoumpa :: PR: #12716
- Auto Configurator code coverage by @dimapihtar :: PR: #12694
- [automodel] fix automodel benchmark script by @yuanzhedong :: PR: #12605
- Remove unnecessary directives by @pablo-garay :: PR: #12743
- Add recipe tests for coverage by @cuichenx :: PR: #12737
- Add Qwen2.5 in NeMo2 by @suiyoubi :: PR: #12731
- add fallback_module to safe_import_from by @akoumpa :: PR: #12726
- Update quantization scripts & relax modelopt requirement specifier by @janekl :: PR: #12709
- Import guard fasttext by @thomasdhc :: PR: #12758
- [automodel] chunked cross entropy by @akoumpa :: PR: #12752
- Add fsdp automodel test by @BoxiangW :: PR: #12718
- [automodel] if peft move only adapters to cpu by @akoumpa :: PR: #12735
- [automodel] update hf mockdataset by @akoumpa :: PR: #12643
- [automodel] remove unused cell in multinode notebook by @yuanzhedong :: PR: #12624
- Yash/llava next coverage by @yashaswikarnati :: PR: #12745
- Tidy code: remove unneeded statements/lines by @pablo-garay :: PR: #12771
- Pass tensor instead of raw number in _mock_loss_function in PTQ by @janekl :: PR: #12769
- ci: Run on nightly schedule by @ko3n1g :: PR: #12775
- Add logs for checkpoint saving start and finalization by @lepan-google :: PR: #12697
- Alit/test coverage by @JRD971000 :: PR: #12762
- Fix loss mask with packed sequence by @ashors1 :: PR: #12642
- Add pruning recipe by @kevalmorabia97 :: PR: #12602
- Update qwen2-v1 to use NeMo quick_gelu by @thomasdhc :: PR: #12787
- [doc] Fixes for audio doc warnings by @anteju :: PR: #12736
- ci: Measure multiprocessing by @ko3n1g :: PR: #12778
- ci: Fix flaky LLM tests by @ko3n1g :: PR: #12807
- Add BERT/Qwen2.5 Unit test and Refactor all GHA Conversion Tests by @suiyoubi :: PR: #12785
- Fix TransformerBlock cuda_graphs compatibility with MCore by @buptzyb :: PR: #12779
- ci: Remove `--branch` by @ko3n1g :: PR: #12809
- ci: Move scripts fully down to files by @ko3n1g :: PR: #12802
- add init.py to make this a package by @akoumpa :: PR: #12814
- Update changelog for `r2.2.1` by @github-actions[bot] :: PR: #12818
- add finetune support for Auto Configurator by @dimapihtar :: PR: #12770
- [automodel] add cpu:gloo to backend by @akoumpa :: PR: #12832
- add missing call to _apply_liger_kernel_to_instance by @akoumpa :: PR: #12806
- Prune docker images in GHA older than 8hrs by @chtruong814 :: PR: #12838
- [audio] Adding tests for predictive models by @anteju :: PR: #12823
- Update resiliency example notebook readme and add links to the brev launchable by @ShriyaRishab :: PR: #12843
- [automodel] qlora peft by @yzhang123 :: PR: #12817
- ci: Increase prune time by @ko3n1g :: PR: #12860
- Update base container in `Dockerfile.speech` by @artbataev :: PR: #12859
- Fix qwen2.5 1.5b configuration inheritance bug by @Aprilistic :: PR: #12852
- Update modelopt upperbound to 0.27 by @thomasdhc :: PR: #12788
- Non-bloc...
NVIDIA Neural Modules 2.3.0rc4
Prerelease: NVIDIA Neural Modules 2.3.0rc4 (2025-04-21)
NVIDIA Neural Modules 2.3.0rc3
Prerelease: NVIDIA Neural Modules 2.3.0rc3 (2025-04-15)
NVIDIA Neural Modules 2.3.0rc2
Prerelease: NVIDIA Neural Modules 2.3.0rc2 (2025-04-07)
NVIDIA Neural Modules 2.2.1
Highlights
- Training
  - Fix MoE-based models training instability.
  - Fix bug in Llama exporter for Llama 3.2 1B and 3B.
  - Fix bug in LoRA `linear_fc1` adapter when different TP is used during saving and loading the adapter checkpoint (see the sketch after this list).
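The LoRA fix above concerns adapters saved and restored under different tensor-parallel sizes. For orientation, a sketch of attaching LoRA in NeMo 2.x; `llm.peft.LoRA` appears in NeMo's PEFT API, but the target-module list and wiring shown are assumptions.

```python
# Hedged sketch: LoRA configuration in NeMo 2.x. The linear_fc1 projection
# named in the fix above is one of the modules LoRA commonly targets;
# the target_modules values are assumptions against the NeMo 2.2 API.
from nemo.collections import llm

peft = llm.peft.LoRA(
    target_modules=["linear_qkv", "linear_proj", "linear_fc1", "linear_fc2"],
    dim=16,  # adapter rank
)
# Passed to llm.finetune(..., peft=peft) alongside model/data/trainer.
```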
Detailed Changelogs:
Uncategorized:
Changelog
- Re-add reverted commits after 2.2.0 and set next version to be 2.2.1 by @chtruong814 :: PR: #12587
- Cherry pick `Fix exporter for llama models with shared embed and output layers (12545)` into `r2.2.0` by @ko3n1g :: PR: #12608
- Cherry pick `Fix TP for LoRA adapter on linear_fc1 (12519)` into `r2.2.0` by @ko3n1g :: PR: #12607
- Bump mcore to use 0.11.1 by @chtruong814 :: PR: #12634