Stars
A native-PyTorch library for large-scale M-LLM (text/audio) training with tp/cp/dp/pp.
FlashInfer: Kernel Library for LLM Serving
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding
A curated list of awesome Multimodal studies.
Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.
Align Anything: Training All-modality Model with Feedback
AudioBench: A Universal Benchmark for Audio Large Language Models
A toolkit for knowledge distillation of large language models
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
Firefly: a large-model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models
Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
AudioTrust: Benchmarking the Multi-faceted Trustworthiness of Audio Large Language Models
A TTS model capable of generating ultra-realistic dialogue in one pass.
A generative speech model for daily dialogue.
Instant voice cloning by MIT and MyShell. Audio foundation model.
As little as 1 minute of voice data can be used to train a good TTS model (few-shot voice cloning).
Free, high-quality text-to-speech API endpoint to replace OpenAI, Azure, or ElevenLabs
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.