Descript Audio Codec
With Descript Audio Codec, you can compress 44.1 kHz audio into discrete codes at a low 8 kbps bitrate.
That's approximately 90x compression while maintaining exceptional fidelity and minimizing artifacts.
Our universal model works on all domains (speech, environment, music, etc.), making it widely applicable to generative modeling of all audio.
It can be used as a drop-in replacement for EnCodec for all audio language modeling applications (such as AudioLMs, MusicLMs, MusicGen, etc.).
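The "approximately 90x" figure can be checked with simple arithmetic. The sketch below assumes the uncompressed reference is 16-bit mono PCM at 44.1 kHz; the model card does not state the reference format, so the bit depth and channel count here are assumptions, and the helper name is illustrative only.

```python
# Back-of-the-envelope check of the compression ratio quoted above.
# ASSUMPTION: the uncompressed baseline is 16-bit mono PCM; the model card
# gives only the sample rate (44.1 kHz) and the codec bitrate (8 kbps).

SAMPLE_RATE_HZ = 44_100      # from the model card
CODEC_BITRATE_KBPS = 8.0     # from the model card
BITS_PER_SAMPLE = 16         # assumed bit depth
CHANNELS = 1                 # assumed mono


def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bitrate of raw PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


raw_kbps = pcm_bitrate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
ratio = raw_kbps / CODEC_BITRATE_KBPS

print(f"Raw PCM bitrate: {raw_kbps:.1f} kbps")
print(f"Compression ratio: {ratio:.1f}x")
```

Under these assumptions the raw stream is 705.6 kbps, which gives a ratio of about 88x, consistent with the rounded 90x claim. A stereo or 24-bit baseline would give a proportionally larger ratio.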
Model Details
Model Description
- License: MIT
Model Sources
- Repository: GitHub Repo
- Paper: arXiv Paper: High-Fidelity Audio Compression with Improved RVQGAN
- Demo: Demo Site
Uses
The model is intended for compressing audio files containing speech, music and environmental sounds.
Out-of-Scope Use
It is not intended for compressing non-audio data such as text or images.
Bias, Risks, and Limitations
Our model has difficulty reconstructing some challenging audio. It performs best on speech and has more issues with environmental sounds. It does not perfectly model some musical instruments, such as the glockenspiel, or synthesizer sounds.
How to Get Started with the Model
This model is meant to be used with our official repo linked above. We release the model here for redundancy. Our code can pull the weights from their original location on GitHub. Please refer to the official README for usage instructions.
Citation
BibTeX:
@misc{kumar2023highfidelity,
title={High-Fidelity Audio Compression with Improved RVQGAN},
author={Rithesh Kumar and Prem Seetharaman and Alejandro Luebs and Ishaan Kumar and Kundan Kumar},
year={2023},
eprint={2306.06546},
archivePrefix={arXiv},
primaryClass={cs.SD}
}