- National Taiwan University
- Taipei, Taiwan
- xjchen.tech
- @xjchen_ntu
- in/jun-ntu
Starred repositories
Fully open reproduction of DeepSeek-R1
Implementation of the paper "Improved DeepFake Detection Using Whisper Features"
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Zero-shot voice conversion & singing voice conversion, with real-time support
A category-wise collection of 200+ LLM survey papers.
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions
[ACL 2025 Main] ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (refreshed every 12 hours)
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
A TTS model capable of generating ultra-realistic dialogue in one pass.
Implementation of Voicebox, a new SOTA text-to-speech network from Meta AI, in PyTorch
Research- and production-oriented toolkit for speaker verification, recognition, and diarization
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
[INTERSPEECH 2024] Official pytorch code for the paper "Disentangled Representation Learning for Environment-agnostic Speaker Recognition"
[INTERSPEECH 2025] Official code for "SEED: Speaker Embedding Enhancement Diffusion Model"
The official repo for the Qwen2-Audio chat and pretrained large audio language models proposed by Alibaba Cloud.
Implementation for the paper "Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Model"
Mini-Omni-Reasoner: a real-time speech reasoning framework that interleaves silent reasoning tokens with spoken response tokens (“thinking-in-speaking”), exploiting the LLM–audio throughput gap to …
This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey".