Releases · NVIDIA/NeMo
NVIDIA Neural Modules 2.3.2
This release addresses known security issues. For the latest NVIDIA Vulnerability Disclosure Information, visit https://www.nvidia.com/en-us/security/. For acknowledgement, please reach out to the NVIDIA PSIRT team at PSIRT@nvidia.com.
NVIDIA Neural Modules 2.4.0rc2
Prerelease: NVIDIA Neural Modules 2.4.0rc2 (2025-07-09)
NVIDIA Neural Modules 2.4.0rc1
Prerelease: NVIDIA Neural Modules 2.4.0rc1 (2025-07-02)
NVIDIA Neural Modules 2.4.0rc0
Prerelease: NVIDIA Neural Modules 2.4.0rc0 (2025-06-27)
NVIDIA Neural Modules 2.3.1
Highlights
- Collections
  - LLM
    - Llama 4: Fixed an accuracy issue caused by MoE probability normalization. Improved pre-training and fine-tuning performance.
- Export & Deploy
  - Updated vLLMExporter to use vLLM V1 to address a security vulnerability (see the usage sketch after this list).
- AutoModel
  - Improved chat-template handling.
- Fault Tolerance
  - Local checkpointing: Fixed support for auto-inserted metric names when resuming from local checkpoints.
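For context on the Export & Deploy fix, here is a minimal usage sketch of `vLLMExporter` from `nemo.export.vllm_exporter`. The paths are hypothetical and the keyword argument names are assumptions against the 2.3.x API; check the Export & Deploy docs for the exact signature.

```python
# Hedged sketch: export a NeMo checkpoint and smoke-test it through the
# vLLM-backed exporter updated in this release. Paths are hypothetical and
# the keyword argument names are assumptions, not a confirmed signature.
from nemo.export.vllm_exporter import vLLMExporter

exporter = vLLMExporter()
exporter.export(
    nemo_checkpoint="/checkpoints/llama4.nemo",  # hypothetical checkpoint path
    model_dir="/tmp/vllm_export",                # staging dir for the engine
    tensor_parallel_size=1,
)

# NeMo exporters expose forward() for quick sanity checks of an export.
print(exporter.forward(["Hello, world!"], max_output_len=32))
```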
Detailed Changelogs:
Export
Changelog
- Cherry-pick `Update vLLMExporter to use vLLM V1 (#13498)` into `r2.3.0` by @chtruong814 :: PR: #13631
Uncategorized:
Changelog
- Bump to 2.3.1 by @chtruong814 :: PR: #13507
- Cherry pick `Use explicitly cached canary-1b-flash in CI tests (13237)` into `r2.3.0` by @ko3n1g :: PR: #13508
- Cherry pick `[automodel] bump liger-kernel to 0.5.8 + fallback (13260)` into `r2.3.0` by @ko3n1g :: PR: #13308
- Cherry-pick `Add recipe and ci scripts for qwen2vl` to `r2.3.0` by @romanbrickie :: PR: #13336
- Cherry pick `Fix skipme handling (13244)` into `r2.3.0` by @ko3n1g :: PR: #13376
- Cherry pick `Allow fp8 param gather when using FSDP (13267)` into `r2.3.0` by @ko3n1g :: PR: #13383
- Cherry pick `Handle boolean args for performance scripts and log received config (13291)` into `r2.3.0` by @ko3n1g :: PR: #13416
- Cherry pick `new perf configs (13110)` into `r2.3.0` by @ko3n1g :: PR: #13431
- Cherry pick `Adding additional unit tests for the deploy module (13411)` into `r2.3.0` by @ko3n1g :: PR: #13449
- Cherry pick `Adding more export tests (13410)` into `r2.3.0` by @ko3n1g :: PR: #13450
- Cherry pick `[automodel] add FirstRankPerNode (13373)` into `r2.3.0` by @ko3n1g :: PR: #13559
- Cherry pick `[automodel] deprecate global_batch_size dataset argument (13137)` into `r2.3.0` by @ko3n1g :: PR: #13560
- Cherry-pick `[automodel] fallback FP8 + LCE -> FP8 + CE (#13349)` into `r2.3.0` by @chtruong814 :: PR: #13561
- Cherry pick `[automodel] add find_unused_parameters=True for DDP (13366)` into `r2.3.0` by @ko3n1g :: PR: #13601
- Cherry pick `Add CI test for local checkpointing (#13012)` into `r2.3.0` by @ananthsub :: PR: #13472
- Cherry pick `[automodel] fix --mbs/gbs dtype and chat-template (13598)` into `r2.3.0` by @akoumpa :: PR: #13613
- Cherry-pick `Update t5.py (#13082)` to `r2.3.0` and bump mcore to f98b1a0 by @chtruong814 :: PR: #13642
- [Automodel] Fix CP device_mesh issue, use PTL distsampler (#13473) by @akoumpa :: PR: #13636
- [Llama4] Fix the recipe bug - cherrypick #13649 by @gdengk :: PR: #13650
- build: Pin transformers (#13675) by @ko3n1g :: PR: #13692
NVIDIA Neural Modules 2.3.0
Highlights
- Export & Deploy
  - NeMo 2.0 export path for NIM
  - ONNX and TensorRT export for the NIM Embedding Container
  - In-framework deployment for HF models
  - TRT-LLM deployment for HF models in NeMo Framework
- Evaluation
  - Integrate nvidia-lm-eval into NeMo FW for evaluations with an OpenAI-API-compatible in-framework deployment
- AutoModel
  - VLM AutoModelForImageTextToText (see the fine-tuning sketch after this list)
  - FP8 for AutoModel
  - Support CP with FSDP2
  - Support TP with FSDP2
  - Performance optimization
    - Add support for Cut Cross Entropy and Liger Kernel
  - Gradient checkpointing
- Fault Tolerance
  - Integrate NVRx v0.3 local checkpointing
- Collections
  - LLM
    - Llama 4
    - Llama Nemotron Ultra
    - Llama Nemotron Super
    - Llama Nemotron Nano
    - Nemotron-h/5
    - DeepSeek V3 pretraining
    - Evo2
    - Qwen 2.5
    - LoRA for Qwen3-32B and Qwen3-30B-A3B
  - MultiModal
    - FLUX
    - Gemma 3
    - Qwen2-VL
  - ASR
    - NeMo Run support for ASR training
    - N-Gram LM on GPU for AED
    - N-Gram LM on GPU + Transducer greedy decoding (RNN-T, TDT)
    - Timestamps support for AED models
    - Migrate SpeechLM to NeMo 2.0
    - Canary-1.1
    - Replace ClassificationModels class with LabelModels
- Performance
  - Functional MXFP8 support for (G)B200
  - Current-scaling recipe with TP communication overlap and FP8 param gathers
  - Custom FSDP support that fully utilizes GB200 NVL72
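As an illustration of the AutoModel highlights, a minimal fine-tuning sketch. `HFAutoModelForCausalLM`, `HFDatasetDataModule`, and `FSDP2Strategy` all appear in this release's changelog, but the constructor arguments shown are assumptions; consult the AutoModel docs for the exact recipe.

```python
# Hedged sketch: fine-tune a Hugging Face model via the AutoModel path,
# i.e. without converting it to a NeMo checkpoint first. Argument names
# are assumptions against the NeMo 2.3 API, not a confirmed signature.
import nemo.lightning as nl
from nemo.collections import llm

model = llm.HFAutoModelForCausalLM(model_name="Qwen/Qwen2.5-0.5B")
data = llm.HFDatasetDataModule(path_or_dataset="rajpurkar/squad", split="train")

llm.finetune(
    model=model,
    data=data,
    trainer=nl.Trainer(
        devices=2,
        accelerator="gpu",
        max_steps=100,
        # FSDP2 with optional TP/CP is one of the AutoModel highlights above.
        strategy=nl.FSDP2Strategy(data_parallel_size=2, tensor_parallel_size=1),
    ),
)
```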
Detailed Changelogs:
ASR
Changelog
- Added model config params for Canary-1B-Flash, Canary-180M-Flash models by @KunalDhawan :: PR: #12588
- Canary tutorial by @ankitapasad :: PR: #12613
- Canary tutorial fix timestamp by @ankitapasad :: PR: #12677
- revert config by @nithinraok :: PR: #12689
- canary longform inference script with timestamps option by @krishnacpuvvada :: PR: #12653 (see the transcription sketch after this list)
- Fix default timestamps value for Hybrid ASR models by @artbataev :: PR: #12681
- Fix k2 installation with PyTorch 2.6.0 by @artbataev :: PR: #12686
- Improve time and RTFx report for ASR by @artbataev :: PR: #12680
- Modify train args by @ankitapasad :: PR: #12700
- Fix asr doc warnings by @nithinraok :: PR: #12720
- Rename `FastNGramLM` -> `NGramGPULanguageModel` by @artbataev :: PR: #12755
- transcribe fix for new hypotheses by @nune-tadevosyan :: PR: #12801
- Fix timestamps when cuda graphs enabled by @monica-sekoyan :: PR: #12808
- update streaming conformer by @stevehuang52 :: PR: #12846
- AED Decoding with N-Gram LM by @artbataev :: PR: #12730
- update notebook by @nithinraok :: PR: #13088
- bugfix ASR_Context_Biasing.ipynb by @lilithgrigoryan :: PR: #13109
- Change branch for installation from main to r2.3.0 by @ankitapasad :: PR: #13266
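A minimal transcription sketch with timestamps, tying together the Canary and timestamp entries above. The model name matches the Canary-1B-Flash entry; the output layout (`.text`, `.timestamp`) follows NeMo's transcribe API but should be treated as an assumption.

```python
# Hedged sketch: transcribe with Canary-1B-Flash and request timestamps.
# "sample.wav" is a hypothetical input file; the result layout is an
# assumption against NeMo's transcribe() output, not a confirmed schema.
from nemo.collections.asr.models import ASRModel

model = ASRModel.from_pretrained("nvidia/canary-1b-flash")
hyps = model.transcribe(["sample.wav"], timestamps=True)

print(hyps[0].text)
# Word-level stamps, if present in the returned hypothesis:
for stamp in hyps[0].timestamp.get("word", []):
    print(stamp["word"], stamp["start"], stamp["end"])
```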
TTS
Changelog
NLP / NMT
Changelog
- Remove old peft docs by @cuichenx :: PR: #12675
- Add code coverage for llm gpt models conversion tests by @suiyoubi :: PR: #12665
- Make BERT TransformerBlockWithPostLNSupport accept more inputs from Mcore by @suiyoubi :: PR: #12685
- remove gifs from documentation by @dimapihtar :: PR: #12732
- Rename `FastNGramLM` -> `NGramGPULanguageModel` by @artbataev :: PR: #12755
- fix NeMo documentation by @dimapihtar :: PR: #12754
- GPT Model/Data/Recipe Unit Test by @suiyoubi :: PR: #12757
- ci: Exclude nlp, mm, vision collections by @ko3n1g :: PR: #12816
- Add vocab size as attr to GPT and T5 Configs, use file name based logger in llm.gpt.data by @hemildesai :: PR: #12862
- Fix transformer layer api with megatron cbc89b3 by @yaoyu-33 :: PR: #12885
Text Normalization / Inverse Text Normalization
Changelog
- Rename `FastNGramLM` -> `NGramGPULanguageModel` by @artbataev :: PR: #12755
Export
Changelog
- GHA Conversion Test and Importer/Exporter Refactor by @suiyoubi :: PR: #12597
- Fix Llama Embedding Model Exporting keys by @suiyoubi :: PR: #12691
- build: Add trtllm by @ko3n1g :: PR: #12672
- Fix trt-llm install by @chtruong814 :: PR: #12827
- Update LLaVA's next HF exporter to load ViT checkpoint from YAML by @eagle705 :: PR: #12841
- Support huggingface export to tensorrtllm by @pthombre :: PR: #12889 (see the export sketch after this list)
- Adds a built stage for the trt-llm wheel to reduce the overall test image size by @chtruong814 :: PR: #12883
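Following the Hugging Face to TensorRT-LLM entry above, a minimal export sketch. `TensorRTLLM` lives in `nemo.export.tensorrt_llm`; the keyword names and paths below are assumptions, not a confirmed signature.

```python
# Hedged sketch: build a TensorRT-LLM engine from a checkpoint and run a
# quick generation to sanity-check it. Paths are hypothetical; keyword
# argument names are assumptions against the NeMo 2.3 export API.
from nemo.export.tensorrt_llm import TensorRTLLM

exporter = TensorRTLLM(model_dir="/tmp/trtllm_engine")  # engine output dir
exporter.export(
    nemo_checkpoint_path="/checkpoints/model.nemo",  # hypothetical path
    model_type="llama",
)
print(exporter.forward(["Hello!"], max_output_len=16))
```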
Uncategorized:
Changelog
- Update changelog-build.yml by @ko3n1g :: PR: #12584
- Update changelog for `r2.2.0` by @github-actions[bot] :: PR: #12585
- Add comments for requirements by @thomasdhc :: PR: #12603
- [automodel] FSDP2Strategy: move to device if using a single-device by @akoumpa :: PR: #12593
- build: Remove numba pin by @ko3n1g :: PR: #12604
- docs: Update installation guides by @ko3n1g :: PR: #12596
- Change Llama Scaling Factor type to Float by @suiyoubi :: PR: #12616
- ci: Test multiple python versions by @ko3n1g :: PR: #12619
- ci: Disable reformat by @ko3n1g :: PR: #12620
- Updating ModelOpt to 0.25.0 by @janekl :: PR: #12633
- [automodel] add additional hf_dataset tests by @akoumpa :: PR: #12646
- [automodel] add jit_transform tests by @akoumpa :: PR: #12645
- [automodel] init eos_token_id inside data module by @yuanzhedong :: PR: #12610
- [automodel] grad ckpt by @akoumpa :: PR: #12644
- bugfix(llm/LLaMa) - dropout_position can never be equal to extended string by @soluwalana :: PR: #12649
- Fix inference pipeline quality issue by @Victor49152 :: PR: #12639
- [automodel] switch to direct=True to propagate return codes in nemorun by @akoumpa :: PR: #12651
- add Auto Conf support for bert, t5, qwen, starcoder models by @dimapihtar :: PR: #12601
- ci: Upload coverage by @ko3n1g :: PR: #12668
- ci: Re-enable changed-files action by @ko3n1g :: PR: #12683
- build: Pin sox by @ko3n1g :: PR: #12701
- add neva quantization by @linnanwang :: PR: #12698
- Clip coverage by @abhinavg4 :: PR: #12696
- GHA CI test: Remove unnecessary directive by @pablo-garay :: PR: #12714
- minor perf fixes by @malay-nagda :: PR: #12656
- Add DeepSeek V2 Lite into llm init.py by @suiyoubi :: PR: #12664
- Add Llama-Nemotron Nano and 70B models by @suiyoubi :: PR: #12712
- Save batch norm running stats in PEFT checkpoints by @cuichenx :: PR: #12666
- Fix document Readme under nemo to add more information by @yaoyu-33 :: PR: #12699
- Fix ub_overlap_ag by @cuichenx :: PR: #12721
- Toggle fast tokenizer if error occurs by @cuichenx :: PR: #12722
- Update README.md for blackwell and AutoModel by @snowmanwwg :: PR: #12612
- Raise error on import_ckpt with overwrite=False plus README for checkpoint_converters by @janekl :: PR: #12693
- [automodel] fix validation_step by @soluwalana :: PR: #12659
- [automodel] vlm tests by @akoumpa :: PR: #12716
- Auto Configurator code coverage by @dimapihtar :: PR: #12694
- [automodel] fix automodel benchmark script by @yuanzhedong :: PR: #12605
- Remove unnecessary directives by @pablo-garay :: PR: #12743
- Add recipe tests for coverage by @cuichenx :: PR: #12737
- Add Qwen2.5 in NeMo2 by @suiyoubi :: PR: #12731
- add fallback_module to safe_import_from by @akoumpa :: PR: #12726
- Update quantization scripts & relax modelopt requirement specifier by @janekl :: PR: #12709
- Import guard fasttext by @thomasdhc :: PR: #12758
- [automodel] chunked cross entropy by @akoumpa :: PR: #12752
- Add fsdp automodel test by @BoxiangW :: PR: #12718
- [automodel] if peft move only adapters to cpu by @akoumpa :: PR: #12735
- [automodel] update hf mockdataset by @akoumpa :: PR: #12643
- [automodel] remove unused cell in multinode notebook by @yuanzhedong :: PR: #12624
- Yash/llava next coverage by @yashaswikarnati :: PR: #12745
- Tidy code: remove unneeded statements/lines by @pablo-garay :: PR: #12771
- Pass tensor instead of raw number in _mock_loss_function in PTQ by @janekl :: PR: #12769
- ci: Run on nightly schedule by @ko3n1g :: PR: #12775
- Add logs for checkpoint saving start and finalization by @lepan-google :: PR: #12697
- Alit/test coverage by @JRD971000 :: PR: #12762
- Fix loss mask with packed sequence by @ashors1 :: PR: #12642
- Add pruning recipe by @kevalmorabia97 :: PR: #12602
- Update qwen2-v1 to use NeMo quick_gelu by @thomasdhc :: PR: #12787
- [doc] Fixes for audio doc warnings by @anteju :: PR: #12736
- ci: Measure multiprocessing by @ko3n1g :: PR: #12778
- ci: Fix flaky LLM tests by @ko3n1g :: PR: #12807
- Add BERT/Qwen2.5 Unit test and Refactor all GHA Conversion Tests by @suiyoubi :: PR: #12785
- Fix TransformerBlock cuda_graphs compatibility with MCore by @buptzyb :: PR: #12779
- ci: Remove `--branch` by @ko3n1g :: PR: #12809
- ci: Move scripts fully down to files by @ko3n1g :: PR: #12802
- add init.py to make this a package by @akoumpa :: PR: #12814
- Update changelog for `r2.2.1` by @github-actions[bot] :: PR: #12818
- add finetune support for Auto Configurator by @dimapihtar :: PR: #12770
- [automodel] add cpu:gloo to backend by @akoumpa :: PR: #12832
- add missing call to _apply_liger_kernel_to_instance by @akoumpa :: PR: #12806
- Prune docker images in GHA older than 8hrs by @chtruong814 :: PR: #12838
- [audio] Adding tests for predictive models by @anteju :: PR: #12823
- Update resiliency example notebook readme and add links to the brev launchable by @ShriyaRishab :: PR: #12843
- [automodel] qlora peft by @yzhang123 :: PR: #12817
- ci: Increase prune time by @ko3n1g :: PR: #12860
- Update base container in `Dockerfile.speech` by @artbataev :: PR: #12859
- Fix qwen2.5 1.5b configuration inheritance bug by @Aprilistic :: PR: #12852
- Update modelopt upperbound to 0.27 by @thomasdhc :: PR: #12788
- Non-bloc...
NVIDIA Neural Modules 2.3.0rc4
Prerelease: NVIDIA Neural Modules 2.3.0rc4 (2025-04-21)
NVIDIA Neural Modules 2.3.0rc3
Prerelease: NVIDIA Neural Modules 2.3.0rc3 (2025-04-15)
NVIDIA Neural Modules 2.3.0rc2
Prerelease: NVIDIA Neural Modules 2.3.0rc2 (2025-04-07)
NVIDIA Neural Modules 2.2.1
Highlights
- Training
  - Fix MoE-based models training instability.
  - Fix bug in Llama exporter for Llama 3.2 1B and 3B.
  - Fix bug in LoRA `linear_fc1` adapter when different TP is used during saving and loading the adapter checkpoint (see the sketch after this list).
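The LoRA fix above concerns adapters saved and restored under different tensor-parallel sizes. For orientation, a sketch of attaching LoRA in NeMo 2.x; `llm.peft.LoRA` appears in NeMo's PEFT API, but the target-module list and wiring shown are assumptions.

```python
# Hedged sketch: LoRA configuration in NeMo 2.x. The linear_fc1 projection
# named in the fix above is one of the modules LoRA commonly targets;
# the target_modules values are assumptions against the NeMo 2.2 API.
from nemo.collections import llm

peft = llm.peft.LoRA(
    target_modules=["linear_qkv", "linear_proj", "linear_fc1", "linear_fc2"],
    dim=16,  # adapter rank
)
# Passed to llm.finetune(..., peft=peft) alongside model/data/trainer.
```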
Detailed Changelogs:
Uncategorized:
Changelog
- Re-add reverted commits after 2.2.0 and set next version to be 2.2.1 by @chtruong814 :: PR: #12587
- Cherry pick `Fix exporter for llama models with shared embed and output layers (12545)` into `r2.2.0` by @ko3n1g :: PR: #12608
- Cherry pick `Fix TP for LoRA adapter on linear_fc1 (12519)` into `r2.2.0` by @ko3n1g :: PR: #12607
- Bump mcore to use 0.11.1 by @chtruong814 :: PR: #12634