Starred repositories


MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 759 72 Updated Sep 20, 2025

Long-form streaming TTS system for multi-speaker dialogue generation

Python 766 94 Updated Oct 10, 2025

[ICML 2025] Official PyTorch Implementation of "History-Guided Video Diffusion"

Python 516 26 Updated Jul 1, 2025

Legacy-Mess Detector: assess the “legacy-mess level” of your code and output a beautiful report

Go 5,341 255 Updated Oct 2, 2025

VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

Python 1,679 173 Updated Oct 9, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 3,415 246 Updated Oct 10, 2025

[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer

Python 63 3 Updated Nov 1, 2024

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 15,692 1,170 Updated Oct 5, 2025

ComfyUI custom node for FunAudioLLM, including CosyVoice and SenseVoice

Python 88 11 Updated Nov 27, 2024

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 7,881 704 Updated May 31, 2024

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 2,554 133 Updated Oct 9, 2025

A must-read paper for speech separation based on neural networks

TypeScript 842 139 Updated Aug 11, 2025

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Python 16,470 4,941 Updated Aug 1, 2024

Scalable toolkit for efficient model reinforcement

Python 921 152 Updated Oct 10, 2025

An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought"

Python 87 2 Updated Sep 28, 2025

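Interview notes compiled by Datawhale members, covering machine learning, CV, NLP, recommendation systems, development, and more. Stars are welcome.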

3,218 478 Updated Aug 27, 2025

PyTorch Implementation of StyleSinger(AAAI 2024): Style Transfer for Out-of-Domain Singing Voice Synthesis

Python 409 26 Updated Aug 15, 2025

MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows

Python 94 7 Updated Sep 2, 2025

Efficient audio understanding with general audio captions

Jupyter Notebook 365 37 Updated Oct 9, 2025

Inworld TTS

Python 505 43 Updated Sep 19, 2025
C++ 303 26 Updated Oct 1, 2025

[ACMMM'2024] Generative Expressive Conversational Speech Synthesis

39 2 Updated Oct 28, 2024

Code for BLT research paper

Python 1,989 178 Updated May 22, 2025
428 9 Updated Aug 10, 2025

[ICCV 2025] Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Python 194 12 Updated Jun 26, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 9,891 1,000 Updated Sep 19, 2025

🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

Python 2 Updated Jun 30, 2025

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,142 81 Updated Sep 22, 2025

Text-audio foundation model from Boson AI

Python 7,420 537 Updated Sep 15, 2025

Repo for counting stars and contributing. Press F to pay respect to glorious developers.

274,522 21,045 Updated Aug 22, 2025