AFTER: Audio Features Transfer and Exploration in Real-time

AFTER is a diffusion-based generative model that creates new audio by blending two sources: one audio stream to set the style or timbre, and another input (either audio or MIDI) to shape the structure over time.

This repository is a real-time implementation of the research paper Combining audio control and style transfer using latent diffusion (read it here) by Nils Demerlé, P. Esling, G. Doras, and D. Genova. Some transfer examples can be found on the project webpage. This real-time version integrates with MaxMSP and Ableton Live through nn_tilde, an external that embeds PyTorch models into MaxMSP.

You can find pretrained models and Max patches for real-time inference in the last section of this page.

Installation

git clone https://github.com/acids-ircam/AFTER.git
cd AFTER/
pip install -e .

If you want to use the model in MaxMSP or PureData for real-time generation, please refer to the nn_tilde external documentation and follow the installation steps.
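
As a quick sanity check after installation, you can confirm that PyTorch and the package import correctly. This is a minimal sketch; the package name follows the repository layout (e.g. ./after/dataset/parsers.py):

import torch
import after  # installed in editable mode by pip install -e .

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())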

Model Training

Training AFTER involves three separate steps: autoencoder training, diffusion model training, and model export.

Neural audio codec

If you already have a streamable audio codec such as a pretrained RAVE model, you can skip directly to the next section. We also provide four audio codecs already trained on different datasets here.

Before training the autoencoder, you need to preprocess your audio files into an LMDB database:

after prepare_dataset --input_path /audio/folder --output_path /dataset/path --save_waveform True --waveform_augmentation none 

Then, you can start the autoencoder training

after train_autoencoder --name AE_model_name --db_path /dataset/path --config baseAE --gpu 0

where db_path refers to the prepared dataset location. The tensorboard logs and checkpoints are saved by default to ./autoencoder_runs/.

After training, the model has to be exported to a torchscript file using

after export_autoencoder  --model_path autoencoder_runs/AE_model_name --step 1000000

This will save two .ts files in the run folder, one for streaming and one for offline inference (export_stream.ts and export.ts respectively).
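
For a quick offline check outside of Max, the exported autoencoder can be loaded directly in Python. This is a minimal sketch: the encode/decode method names and tensor shapes are assumptions based on typical nn_tilde exports, and the audio file names are placeholders, so adapt them to your setup.

import torch
import torchaudio

# Load the offline export (not the streaming one) for a reconstruction test.
ae = torch.jit.load("autoencoder_runs/AE_model_name/export.ts").eval()

audio, sr = torchaudio.load("example.wav")       # (channels, samples), ideally at the training sample rate
x = audio.mean(0, keepdim=True).unsqueeze(0)     # mono batch of shape (1, 1, samples)

with torch.no_grad():
    z = ae.encode(x)       # latent sequence, e.g. (1, latent_dim, frames)
    x_hat = ae.decode(z)   # reconstructed audio

print("latent shape:", tuple(z.shape))
torchaudio.save("reconstruction.wav", x_hat.squeeze(0), sr)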

AFTER training

First, you need to prepare your dataset. Since our diffusion model works in the latent space of the autoencoder, we pre-compute the latent embeddings to speed up training:

after prepare_dataset --input_path /audio/folder --output_path /dataset/path --emb_model_path AE_model_run_path/export.ts
  • The num_signal flag sets the duration of the training audio chunks in samples; it must be a power of 2 (default: 524288, roughly 11.9 seconds at 44.1 kHz).
  • The sample_rate flag sets the resampling rate (default: 44100).
  • The gpu flag selects the device used to compute the embeddings; use -1 for CPU (default: 0).

To train a MIDI-to-audio AFTER model, you need to either use the flag --basic_pitch_midi to transcribe MIDI from the audio files or define your own file parsing function in ./after/dataset/parsers.py.
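
If you go the custom-parser route, the sketch below shows the general idea of pairing each audio file with a MIDI file sharing its name. It is hypothetical: the exact function signature expected by prepare_dataset is defined by the existing parsers in ./after/dataset/parsers.py, so adapt it to that interface.

from pathlib import Path

# Hypothetical parser sketch: pair each .wav file with a .mid file of the same stem.
# The real interface lives in ./after/dataset/parsers.py and may differ.
def paired_audio_midi(audio_root: str):
    pairs = []
    for wav in sorted(Path(audio_root).rglob("*.wav")):
        midi = wav.with_suffix(".mid")
        if midi.exists():
            pairs.append((str(wav), str(midi)))
    return pairs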

For more advanced use of the models, please refer to the command-line help for the full list of arguments.

Then, a training run is started with

after train  --name diff_model_name --db_path /dataset/path --emb_model_path AE_model_run_path/export.ts --config CONFIG_NAME

Different configurations are available in diffusion/configs and can be combined:

Category     Config   Description
Model        base     Default audio-to-audio timbre and structure separation model.
             midi     Uses MIDI as input for the structure encoder.
Additional   tiny     Reduces the model's capacity for faster inference. Useful for testing and low-resource environments.
             cycle    Experimental: adds a cycle consistency phase during training, which can improve timbre and structure disentanglement.
The tensorboard logs and checkpoints are saved to ./diffusion/runs/model_name, and you can experiment with your trained model using the notebooks notebooks/audio_to_audio_demo.ipynb and notebooks/midi_to_audio_demo.ipynb.

Export

Once the training is complete, you can export the model to an nn_tilde torchscript file for inference in MaxMSP and PureData.

For an audio-to-audio model:

after export --model_path diff_model_name --emb_model_path AE_model_run_path/export_stream.ts --step 800000

For a MIDI-to-audio model:

after export_midi --model_path diff_model_name --emb_model_path AE_model_run_path/export_stream.ts --npoly 4 --step 800000

where npoly sets the number of voices for polyphony. Make sure to use the streaming version of the exported autoencoder (denoted by _stream.ts).
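
Before wiring the exported file into a Max patch, a quick check that it deserializes as a TorchScript module can save a round trip. This is a minimal sketch; replace the file name with the path of the .ts file written by the export command.

import torch

# Load the exported model on CPU just to confirm it is valid TorchScript.
model = torch.jit.load("diff_model_name.ts", map_location="cpu")
print(type(model))  # should print a TorchScript RecursiveScriptModule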

Inference in MaxMSP

You can experiment with inference in MaxMSP using the patches in ./patchs and the pretrained models available here.

Artistic Applications

AFTER has been applied in several projects:

  • The Call by Holly Herndon and Mat Dryhurst, an interactive sound installation with singing voice transfer, at Serpentine Gallery in London until February 2, 2025.
  • A live performance by French electronic artist Canblaster for Forum Studio Session at IRCAM. The full concert is available on YouTube.
  • Nature Manifesto, an immersive sound installation by Björk and Robin Meier, at Centre Pompidou in Paris from November 20 to December 9, 2024.

We look forward to seeing new projects and creative uses of AFTER.
