Descript Audio Codec

👉 With Descript Audio Codec, you can compress 44.1 kHz audio into discrete codes at a bitrate of just 8 kbps.
🤌 That's approximately 90x compression while maintaining exceptional fidelity and minimizing artifacts.
💪 Our universal model works across audio domains (speech, environmental sounds, music, etc.), making it broadly applicable to generative modeling of any kind of audio.
👌 It can be used as a drop-in replacement for EnCodec in audio language modeling applications (such as AudioLM, MusicLM, MusicGen, etc.).
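The 8 kbps and ~90x figures above can be sanity-checked with a little arithmetic. The sketch below is a back-of-the-envelope calculation, not part of the codec. It rests on assumptions: the uncompressed baseline is taken to be 16-bit mono PCM, and the codec constants (a hop length of 512 samples, 9 residual codebooks, 1024 entries per codebook) are assumed to match the 44.1 kHz configuration described in the paper. Check the repository for the exact values.

```python
import math

# Assumed baseline: uncompressed 16-bit mono PCM at 44.1 kHz.
SAMPLE_RATE_HZ = 44_100
PCM_BITS_PER_SAMPLE = 16

# Assumed codec configuration (illustrative constants; verify against the repo).
HOP_LENGTH = 512       # audio samples per latent frame
N_CODEBOOKS = 9        # residual quantizer stages
CODEBOOK_SIZE = 1024   # entries per codebook


def pcm_bitrate_bps(sample_rate: int, bits_per_sample: int, channels: int = 1) -> float:
    """Bitrate of raw PCM audio in bits per second."""
    return sample_rate * bits_per_sample * channels


def rvq_bitrate_bps(sample_rate: int, hop_length: int,
                    n_codebooks: int, codebook_size: int) -> float:
    """Bitrate of the discrete codes: frames/s * codebooks * bits per code index."""
    frame_rate = sample_rate / hop_length      # latent frames per second
    bits_per_code = math.log2(codebook_size)   # bits needed to index one entry
    return frame_rate * n_codebooks * bits_per_code


raw = pcm_bitrate_bps(SAMPLE_RATE_HZ, PCM_BITS_PER_SAMPLE)
codes = rvq_bitrate_bps(SAMPLE_RATE_HZ, HOP_LENGTH, N_CODEBOOKS, CODEBOOK_SIZE)

print(f"raw PCM: {raw / 1000:.1f} kbps")            # → 705.6 kbps
print(f"codes:   {codes / 1000:.2f} kbps")          # → 7.75 kbps
print(f"ratio:   {raw / codes:.1f}x")               # → 91.0x
print(f"nominal: {raw / 8000:.1f}x vs. 8 kbps")     # → 88.2x
```

Under these assumptions the codes come to about 7.75 kbps, which rounds to the quoted 8 kbps, and the compression ratio is about 91x (88.2x if you divide by the nominal 8 kbps instead), consistent with "approximately 90x".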

Model Details

Model Description

License: MIT

Model Sources

Repository: GitHub repository
Paper: High-Fidelity Audio Compression with Improved RVQGAN (arXiv:2306.06546)
Demo: demo site

Uses

The model is intended for compressing audio files containing speech, music and environmental sounds.

Out-of-Scope Use

It is not intended for compressing other kinds of data, such as text or images.

Bias, Risks, and Limitations

Our model has difficulty reconstructing some challenging audio. It performs best on speech and has more trouble with environmental sounds. It also does not perfectly model some musical instruments, such as the glockenspiel, or some synthesizer sounds.

How to Get Started with the Model

This model is meant to be used with our official repository linked above. The weights are mirrored here for redundancy; our code downloads them from their original location on GitHub. Please refer to the official README for usage instructions.

Citation

BibTeX:

@misc{kumar2023highfidelity,
      title={High-Fidelity Audio Compression with Improved RVQGAN}, 
      author={Rithesh Kumar and Prem Seetharaman and Alejandro Luebs and Ishaan Kumar and Kundan Kumar},
      year={2023},
      eprint={2306.06546},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}