Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2403.14438

Papers - Encoders - Audio

A Multimodal Approach to Device-Directed Speech Detection with Large Language Models

Paper • 2403.14438 • Published Mar 21 • 2
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Paper • 1712.05884 • Published Dec 16, 2017 • 2

Papers - Audio - Whisper vs Clap - Whisper wins with ASR

A Multimodal Approach to Device-Directed Speech Detection with Large Language Models

Paper • 2403.14438 • Published Mar 21 • 2

Papers - Multimodal - Audio

A Multimodal Approach to Device-Directed Speech Detection with Large Language Models

Paper • 2403.14438 • Published Mar 21 • 2

Papers - Audio - Training

A Multimodal Approach to Device-Directed Speech Detection with Large Language Models

Paper • 2403.14438 • Published Mar 21 • 2
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Paper • 2403.17694 • Published Mar 26 • 10
FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23 • 29

Papers - Automatic Speech Recognition

Streaming Transformer ASR with Blockwise Synchronous Beam Search

Paper • 2006.14941 • Published Jun 25, 2020 • 2
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models

Paper • 2403.14438 • Published Mar 21 • 2

Papers - Encoders

Functional Interpolation for Relative Positions Improves Long Context Transformers

Paper • 2310.04418 • Published Oct 6, 2023 • 4
SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs

Paper • 2106.09997 • Published Jun 18, 2021 • 2
Neural Machine Translation of Rare Words with Subword Units

Paper • 1508.07909 • Published Aug 31, 2015 • 4
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models

Paper • 2403.14438 • Published Mar 21 • 2

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Paper • 2310.00704 • Published Oct 1, 2023 • 19
Structural Similarities Between Language Models and Neural Response Measurements

Paper • 2306.01930 • Published Jun 2, 2023 • 2
Streaming Transformer ASR with Blockwise Synchronous Beam Search

Paper • 2006.14941 • Published Jun 25, 2020 • 2
NU-GAN: High resolution neural upsampling with GAN

Paper • 2010.11362 • Published Oct 22, 2020 • 2

Papers - Multimodal

TinyLLaVA: A Framework of Small-scale Large Multimodal Models

Paper • 2402.14289 • Published Feb 22 • 19
ImageBind: One Embedding Space To Bind Them All

Paper • 2305.05665 • Published May 9, 2023 • 3
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 181
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts

Paper • 2206.02770 • Published Jun 6, 2022 • 3

there's many more on arxiv if you search for CLAP

Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

Paper • 2211.06687 • Published Nov 12, 2022 • 3
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning

Paper • 2401.17690 • Published Jan 31 • 5
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Paper • 2312.09911 • Published Dec 15, 2023 • 53
Audiobox: Unified Audio Generation with Natural Language Prompts

Paper • 2312.15821 • Published Dec 25, 2023 • 12

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs