jpWang
  • South China University of Technology
  • Guangzhou, China

Organizations

@SCUT-DLVCLab


A native-PyTorch library for large-scale M-LLM (text/audio) training with TP/CP/DP/PP parallelism.

Python 110 8 Updated Jul 9, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 3,353 375 Updated Jul 11, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ 10,995 1,579 Updated Jul 12, 2025

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding

Python 134 6 Updated May 16, 2025

A curated list of awesome Multimodal studies.

224 19 Updated Jun 27, 2025

Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.

Jupyter Notebook 375 28 Updated Jul 12, 2025

Align Anything: Training All-modality Model with Feedback

Jupyter Notebook 4,230 500 Updated May 28, 2025

Audio Large Language Models

Python 607 34 Updated Jul 5, 2025

AudioBench: A Universal Benchmark for Audio Large Language Models

Python 234 9 Updated Jun 17, 2025

A toolkit for knowledge distillation of large language models

Python 109 5 Updated Jul 9, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 8,636 748 Updated Jul 11, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 54,139 6,628 Updated Jul 11, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…

Python 8,626 743 Updated Jul 12, 2025

Firefly: a large-model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models

Python 6,481 584 Updated Oct 24, 2024
Python 271 33 Updated Apr 11, 2025

Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons

Python 1,154 166 Updated Mar 13, 2025

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 9,232 732 Updated May 27, 2025

AudioTrust: Benchmarking the Multi-faceted Trustworthiness of Audio Large Language Models

Python 197 23 Updated May 29, 2025

Spark-TTS Inference Code

Python 10,016 1,056 Updated Apr 9, 2025

SOTA Open Source TTS

Python 22,314 1,827 Updated Jul 2, 2025

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 17,444 1,446 Updated Jul 6, 2025

A generative speech model for daily dialogue.

Python 37,119 4,020 Updated Jul 6, 2025

Instant voice cloning by MIT and MyShell. Audio foundation model.

Python 32,947 3,476 Updated Apr 19, 2025

One minute of voice data is enough to train a good TTS model (few-shot voice cloning).

Python 48,678 5,355 Updated Jul 11, 2025

Free, high-quality text-to-speech API endpoint to replace OpenAI, Azure, or ElevenLabs

Python 977 154 Updated Jul 1, 2025

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…

777 74 Updated Jul 8, 2025

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

1,966 241 Updated Jun 6, 2024
Python 4,408 358 Updated Jun 12, 2025

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output.

Python 3,363 285 Updated Nov 5, 2024