---
license: mit
datasets:
- seungheondoh/LP-MusicCaps-MSD
- seungheondoh/LP-MusicCaps-MC
language:
- en
metrics:
- bleu
- bertscore
tags:
- music
- music-captioning
---

# LP-MusicCaps-HF

This is the LP-MusicCaps model, repackaged so it can be loaded directly with the Hugging Face `transformers` library.
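A minimal loading sketch. The repo id below is an assumption taken from this card's title; adjust it to the checkpoint you actually want, and note that `trust_remote_code=True` is only needed if the model ships custom modeling code.

```python
def load_captioner(repo_id="seungheondoh/lp-music-caps-hf"):
    """Load the LP-MusicCaps checkpoint through the Hugging Face hub.

    NOTE: the default repo_id is a guess based on this card's title,
    not a confirmed hub path.
    """
    from transformers import AutoModel  # imported lazily to keep the sketch light

    return AutoModel.from_pretrained(repo_id, trust_remote_code=True)
```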

# Original Model Card

- **Repository:** [LP-MusicCaps repository](https://github.com/seungheondoh/lp-music-caps)
- **Paper:** [ArXiv](https://arxiv.org/abs/2307.16372)

# :sound: LP-MusicCaps: LLM-Based Pseudo Music Captioning

[![Demo Video](https://i.imgur.com/cgi8NsD.jpg)](https://youtu.be/ezwYVaiC-AM)

This is an implementation of [LP-MusicCaps: LLM-Based Pseudo Music Captioning](#). This project aims to generate captions for music. 1) Tag-to-Caption: using existing tags, we leverage OpenAI's GPT-3.5 Turbo API to generate high-quality, contextually relevant captions from music tags. 2) Audio-to-Caption: using music-audio and pseudo-caption pairs, we train a cross-modal encoder-decoder model for end-to-end music captioning.
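The Tag-to-Caption step above can be sketched as a single LLM call. The prompt wording here is illustrative only, not the paper's exact template, and running it requires the `openai` package plus an API key.

```python
def tags_to_caption(tags, client=None, model="gpt-3.5-turbo"):
    """Turn a list of music tags into a pseudo caption via an LLM.

    NOTE: the prompt below is a hypothetical stand-in for the paper's
    actual prompt template.
    """
    from openai import OpenAI  # lazy import; only needed when called

    client = client or OpenAI()
    prompt = (
        "Write a one-sentence description of a piece of music "
        "with these tags: " + ", ".join(tags)
    )
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content
```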

> [**LP-MusicCaps: LLM-Based Pseudo Music Captioning**](#)   
> SeungHeon Doh, Keunwoo Choi, Jongpil Lee, Juhan Nam   
> To appear at ISMIR 2023   


## TL;DR


<p align = "center">
<img src = "https://i.imgur.com/2LC0nT1.png">
</p>

- **[1. Tag-to-Caption: LLM Captioning](https://github.com/seungheondoh/lp-music-caps/tree/main/lpmc/llm_captioning)**: Generate a caption from given tags.
- **[2. Pretrain Music Captioning Model](https://github.com/seungheondoh/lp-music-caps/tree/main/lpmc/music_captioning)**: Generate a pseudo caption from given audio.
- **[3. Transfer Music Captioning Model](https://github.com/seungheondoh/lp-music-caps/tree/main/lpmc/music_captioning/transfer.py)**: Generate a human-level caption from given audio.

## Open Source Material

- [pre-trained models](https://huggingface.co./seungheondoh/lp-music-caps) 
- [music-pseudo caption dataset](https://huggingface.co./datasets/seungheondoh/LP-MusicCaps-MSD)
- [demo](https://huggingface.co./spaces/seungheondoh/LP-Music-Caps-demo) 

are available online for future research. A dataset example is shown in this [notebook](https://github.com/seungheondoh/lp-music-caps/blob/main/notebook/Dataset.ipynb).