POP2PIANO
Pop2Piano is a Transformer network that generates piano covers from the waveforms of pop music.
Model Details
Pop2Piano was proposed in the paper Pop2Piano : Pop Audio-based Piano Cover Generation by Jongho Choi and Kyogu Lee.
Piano covers of pop music are widely enjoyed, but generating them from music is not a trivial task: it requires great expertise in playing the piano as well as knowledge of a song's characteristics and melodies. With Pop2Piano, you can generate a cover directly from a song's audio waveform. It is the first model to directly generate a piano cover from pop audio without melody and chord extraction modules.
Pop2Piano is an encoder-decoder Transformer model based on T5. The input audio is converted to a waveform and passed to the encoder, which maps it to a latent representation. The decoder uses this latent representation to generate token ids autoregressively. Each token id corresponds to one of four token types: time, velocity, note and "special". The token ids are then decoded into the equivalent MIDI file.
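To make the four token types more concrete, here is a minimal sketch of how a stream of time, velocity, note and special tokens could be turned into notes. The encoding is a hypothetical simplification for illustration only: tokens are `(type, value)` pairs, time is measured in fixed steps of an assumed 0.01 s, a velocity of 0 means note-off, and any special token ends the sequence. It is not the actual Pop2Piano vocabulary; in practice decoding is done by `Pop2PianoProcessor.batch_decode`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy encoding, NOT the real Pop2Piano vocabulary:
# each token is a (type, value) pair.
Token = Tuple[str, int]
TIME_STEP = 0.01  # assumed seconds per "time" step


@dataclass
class Note:
    pitch: int     # MIDI pitch number
    velocity: int  # MIDI velocity (1-127)
    start: float   # seconds
    end: float     # seconds


def decode_tokens(tokens: Iterable[Token]) -> List[Note]:
    """Turn a toy token stream into notes.

    ("time", t)     -> move the clock to absolute step t
    ("velocity", v) -> velocity for the following notes; 0 means note-off
    ("note", p)     -> note-on (or note-off when velocity is 0) for pitch p
    ("special", _)  -> treated as end-of-sequence in this sketch
    """
    now = 0.0
    velocity = 0
    active: Dict[int, Tuple[float, int]] = {}  # pitch -> (start, velocity)
    notes: List[Note] = []

    for kind, value in tokens:
        if kind == "time":
            now = value * TIME_STEP
        elif kind == "velocity":
            velocity = value
        elif kind == "note":
            if value in active:
                # Note-off, or a re-struck pitch: finish the sounding note.
                start, vel = active.pop(value)
                notes.append(Note(value, vel, start, now))
            if velocity > 0:
                active[value] = (now, velocity)
        elif kind == "special":
            break
        else:
            raise ValueError(f"unknown token type: {kind!r}")

    # Notes still sounding when the stream ends are closed at the last time.
    for pitch, (start, vel) in active.items():
        notes.append(Note(pitch, vel, start, now))
    return sorted(notes, key=lambda n: (n.start, n.pitch))


example = [("time", 0), ("velocity", 80), ("note", 60), ("note", 64),
           ("time", 50), ("velocity", 0), ("note", 60),
           ("time", 100), ("special", 0)]
for note in decode_tokens(example):
    print(note)
```

In this example, pitch 60 ends at the note-off at step 50, while pitch 64 is still sounding when the special token arrives and is closed at step 100.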
Model Sources
Usage
To use Pop2Piano, you will need to install the 🤗 Transformers library, as well as the following third-party modules:
pip install git+https://github.com/huggingface/transformers.git
pip install pretty-midi==0.2.9 essentia==2.1b6.dev1034 librosa scipy
Please note that you may need to restart your runtime after installation.
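After installing (and restarting the runtime if needed), a quick import check can confirm that everything is available. This is a sketch using only the standard library's `importlib.util.find_spec`; the helper name `missing_modules` is hypothetical, and the module list is an assumption derived from the pip commands above, with the `pretty-midi` package imported as `pretty_midi`.

```python
import importlib.util

# Import names for the packages installed above (assumed; note that the
# `pretty-midi` distribution is imported as `pretty_midi`).
REQUIRED_MODULES = ("transformers", "pretty_midi", "essentia", "librosa", "scipy")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Report what is missing without importing the (heavy) packages themselves.
missing = missing_modules()
print("Missing modules:", ", ".join(missing) if missing else "none")
```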
Pop music to Piano
Code Example
- Using your own Audio
>>> import librosa
>>> from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor
>>> audio, sr = librosa.load("<your_audio_file_here>", sr=44100) # feel free to change the sr to a suitable value.
>>> model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano")
>>> processor = Pop2PianoProcessor.from_pretrained("sweetcocoa/pop2piano")
>>> inputs = processor(audio=audio, sampling_rate=sr, return_tensors="pt")
>>> model_output = model.generate(input_features=inputs["input_features"], composer="composer1")
>>> tokenizer_output = processor.batch_decode(
... token_ids=model_output, feature_extractor_output=inputs
... )["pretty_midi_objects"][0]
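>>> import os
>>> os.makedirs("./Outputs", exist_ok=True)  # pretty_midi's write() does not create missing folders
>>> tokenizer_output.write("./Outputs/midi_output.mid")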
- Audio from the Hugging Face Hub
>>> from datasets import load_dataset
>>> from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor
>>> model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano")
>>> processor = Pop2PianoProcessor.from_pretrained("sweetcocoa/pop2piano")
>>> ds = load_dataset("sweetcocoa/pop2piano_ci", split="test")
>>> inputs = processor(
... audio=ds["audio"][0]["array"], sampling_rate=ds["audio"][0]["sampling_rate"], return_tensors="pt"
... )
>>> model_output = model.generate(input_features=inputs["input_features"], composer="composer1")
>>> tokenizer_output = processor.batch_decode(
... token_ids=model_output, feature_extractor_output=inputs
... )["pretty_midi_objects"][0]
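>>> import os
>>> os.makedirs("./Outputs", exist_ok=True)  # pretty_midi's write() does not create missing folders
>>> tokenizer_output.write("./Outputs/midi_output.mid")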
Example
Here we present an example of generated MIDI.
- Actual Pop Music
- Generated MIDI
Tips
- Pop2Piano is an Encoder-Decoder based model like T5.
- Pop2Piano can be used to generate MIDI files for a given audio sequence.
- Choosing different composers in Pop2PianoForConditionalGeneration.generate() can lead to a variety of different results.
- Setting the sampling rate to 44.1 kHz when loading the audio file can give good performance.
- Though Pop2Piano was mainly trained on Korean pop music, it also does well on Western pop and hip-hop songs.
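The composer tip can be explored with a small loop that generates one cover per composer from the same processed song. This is a sketch, not part of the library: the helper name generate_covers and the output file naming are hypothetical, composer names other than "composer1" are assumptions you should verify against the model's generation config, and the loop reuses only the generate() and batch_decode() calls shown in the usage examples.

```python
from pathlib import Path


def generate_covers(model, processor, inputs, composers, out_dir="./Outputs"):
    """Generate one MIDI file per composer and return the written paths.

    Hypothetical helper. `model` and `processor` are expected to behave like
    Pop2PianoForConditionalGeneration and Pop2PianoProcessor; `inputs` is the
    processor output for a single song.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # pretty_midi won't create folders
    written = []
    for composer in composers:
        token_ids = model.generate(
            input_features=inputs["input_features"], composer=composer
        )
        midi = processor.batch_decode(
            token_ids=token_ids, feature_extractor_output=inputs
        )["pretty_midi_objects"][0]
        path = out_dir / f"midi_output_{composer}.mid"
        midi.write(str(path))
        written.append(path)
    return written
```

With the model, processor and inputs from the examples above, a call such as generate_covers(model, processor, inputs, ["composer1", "composer2", "composer3"]) would write one file per composer into ./Outputs so the results can be compared side by side.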
Citation
BibTeX:
@misc{choi2023pop2piano,
title={Pop2Piano : Pop Audio-based Piano Cover Generation},
author={Jongho Choi and Kyogu Lee},
year={2023},
eprint={2211.00895},
archivePrefix={arXiv},
primaryClass={cs.SD}
}