xjchenGit
🤔 Focusing

Starred repositories

Official Repository For VoxBlink2

Python 84 5 Updated Aug 13, 2024

Fully open reproduction of DeepSeek-R1

Python 25,561 2,396 Updated Sep 8, 2025

Implementation of the paper "Improved DeepFake Detection Using Whisper Features"

Python 105 10 Updated Apr 9, 2025
Jupyter Notebook 25 Updated Sep 11, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 2,693 144 Updated Oct 9, 2025

Zero-shot voice conversion and singing voice conversion, with real-time support

Python 3,323 389 Updated Apr 20, 2025

SOTA Open Source TTS

Python 23,244 1,921 Updated Oct 20, 2025

A category-wise collection of 200+ LLM survey papers.

179 27 Updated Apr 7, 2025

PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions

Python 86 5 Updated Oct 11, 2024

[ACL 2025 Main] ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

Python 262 14 Updated Nov 22, 2024

Automatically Update Text-to-speech (TTS) Papers Daily using GitHub Actions (Update Every 12 Hours)

Python 532 32 Updated Oct 20, 2025

Spark-TTS Inference Code

Python 10,618 1,129 Updated Apr 9, 2025

VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling

Python 90 5 Updated Nov 9, 2024

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 969 73 Updated Dec 23, 2024

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 18,644 1,610 Updated Jul 6, 2025

Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch

Python 666 53 Updated Oct 1, 2024

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

Python 1,047 156 Updated Oct 13, 2025

An Audio Language model for Audio Tasks

Python 316 15 Updated Apr 19, 2024
Python 4,533 362 Updated Jun 12, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 4,300 306 Updated Jun 21, 2025

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Python 458 40 Updated Apr 24, 2024

[INTERSPEECH 2024] Official pytorch code for the paper "Disentangled Representation Learning for Environment-agnostic Speaker Recognition"

Python 16 3 Updated Jul 23, 2024

[INTERSPEECH 2025] Official code for "SEED: Speaker Embedding Enhancement Diffusion Model"

Python 50 2 Updated Sep 22, 2025

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,904 147 Updated Apr 21, 2025

Official repository for U-SAM (Interspeech 2025)

Python 21 3 Updated Jun 3, 2025

Implementation for paper "Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Model"

Python 94 14 Updated Dec 16, 2024

Mini-Omni-Reasoner: a real-time speech reasoning framework that interleaves silent reasoning tokens with spoken response tokens (“thinking-in-speaking”), exploiting the LLM–audio throughput gap to …

159 19 Updated Aug 26, 2025

This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey".

175 10 Updated Oct 20, 2025