- National Taiwan University
- Taipei, Taiwan
- xjchen.tech
- @xjchen_ntu
- in/jun-ntu
Starred repositories
Fully open reproduction of DeepSeek-R1
Implementation of the paper "Improved DeepFake Detection Using Whisper Features"
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Zero-shot voice conversion & singing voice conversion, with real-time support
A category-wise collection of 200+ LLM survey papers.
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions
[ACL 2025 Main] ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (refreshed every 12 hours)
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
A TTS model capable of generating ultra-realistic dialogue in one pass.
Implementation of Voicebox, a new SOTA text-to-speech network from Meta AI, in PyTorch
Research- and production-oriented toolkit for speaker verification, recognition, and diarization
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
[INTERSPEECH 2024] Official pytorch code for the paper "Disentangled Representation Learning for Environment-agnostic Speaker Recognition"
[INTERSPEECH 2025] Official code for "SEED: Speaker Embedding Enhancement Diffusion Model"
The official repo for the Qwen2-Audio chat and pretrained large audio language models proposed by Alibaba Cloud.
Implementation for the paper "Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Model"
Mini-Omni-Reasoner: a real-time speech reasoning framework that interleaves silent reasoning tokens with spoken response tokens (“thinking-in-speaking”), exploiting the LLM–audio throughput gap to …
This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey".