--- license: mit datasets: - seungheondoh/LP-MusicCaps-MSD - seungheondoh/LP-MusicCaps-MC language: - en metrics: - bleu - bertscore tags: - music - music-captioning --- - **Repository:** [LP-MusicCaps repository](https://github.com/seungheondoh/lp-music-caps) - **Paper:** [ArXiv](https://arxiv.org/abs/2307.16372) # :sound: LP-MusicCaps: LLM-Based Pseudo Music Captioning [![Demo Video](https://i.imgur.com/cgi8NsD.jpg)](https://youtu.be/ezwYVaiC-AM) This is a implementation of [LP-MusicCaps: LLM-Based Pseudo Music Captioning](#). This project aims to generate captions for music. 1) Tag-to-Caption: Using existing tags, We leverage the power of OpenAI's GPT-3.5 Turbo API to generate high-quality and contextually relevant captions based on music tag. 2) Audio-to-Caption: Using music-audio and pseudo caption pairs, we train a cross-model encoder-decoder model for end-to-end music captioning > [**LP-MusicCaps: LLM-Based Pseudo Music Captioning**](#) > SeungHeon Doh, Keunwoo Choi, Jongpil Lee, Juhan Nam > To appear ISMIR 2023 ## TL;DR

- **[1.Tag-to-Caption: LLM Captioning](https://github.com/seungheondoh/lp-music-caps/tree/main/lpmc/llm_captioning)**: Generate caption from given tag input. - **[2.Pretrain Music Captioning Model](https://github.com/seungheondoh/lp-music-caps/tree/main/lpmc/music_captioning)**: Generate pseudo caption from given audio. - **[3.Transfer Music Captioning Model](https://github.com/seungheondoh/lp-music-caps/tree/main/lpmc/music_captioning/transfer.py)**: Generate human level caption from given audio. ## Open Source Material - [pre-trained models](https://huggingface.co./seungheondoh/lp-music-caps) - [music-pseudo caption dataset](https://huggingface.co./datasets/seungheondoh/LP-MusicCaps-MSD) - [demo](https://huggingface.co./spaces/seungheondoh/LP-Music-Caps-demo) are available online for future research. example of dataset in [notebook](https://github.com/seungheondoh/lp-music-caps/blob/main/notebook/Dataset.ipynb)