- Academia Sinica
- Taipei, Taiwan
- blueburnband
Stars
Text2midi is the first end-to-end model for generating MIDI files from textual descriptions. By leveraging pretrained large language models and a powerful autoregressive transformer decoder, text2m…
An open-source AI agent that brings the power of Gemini directly into your terminal.
SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning
A simple library for Fréchet Audio Distance (FAD) calculation
Official Repository for "Training-Free Multi-Step Audio Source Separation"
Official Repository for "Music Source Restoration"
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Code required to reproduce the experiments from our paper "Beyond Spectrograms: Rethinking Audio Classification from EnCodec’s Latent Space"
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Open-source Multi-agent Poster Generation from Papers
c/ua is the Docker Container for Computer-Use AI Agents.
Gemma open-weight LLM library, from Google DeepMind
State-of-the-art pretrained music models for training, evaluation, inference
Source code and complementary material for "Keep what you need : extracting efficient subnetworks from large audio representation models".
SALMONN family: A suite of advanced multi-modal LLMs
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
DSPy: The framework for programming—not prompting—language models
ACE-Step: A Step Towards Music Generation Foundation Model
Use any LLMs (Large Language Models) for Deep Research. Support SSE API and MCP server.
Codes for ISMIR 2022 paper: Beat Transformer: Demixed Beat and Downbeat Tracking with Dilated Self-Attention
Chordify Annotator Subjectivity Dataset - A chord-label harmony dataset with multiple reference annotations per song