arxiv:2305.05084

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

Published on May 8, 2023

Abstract

Conformer-based models have become the dominant end-to-end architecture for speech processing tasks. In this work, we propose a carefully redesigned Conformer with a new down-sampling schema. The proposed model, named Fast Conformer, is 2.8x faster than the original Conformer while preserving state-of-the-art accuracy on Automatic Speech Recognition benchmarks. We also replace the original Conformer's global attention with limited context attention after training, which enables transcription of hour-long audio. We further improve long-form speech transcription by adding a global token. Fast Conformer combined with a Transformer decoder also outperforms the original Conformer in both accuracy and speed on Speech Translation and Spoken Language Understanding.
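
To make the idea of limited context attention with a global token more concrete, here is a minimal sketch of the kind of attention mask such a scheme implies. It is not the paper's implementation. The function names, the symmetric window, and the choice of position 0 as the global token are all assumptions made for illustration.

```python
def limited_context_mask(seq_len, window, use_global_token=True):
    """Build a boolean attention mask as a list of lists.

    mask[i][j] is True when query position i may attend to key position j.
    Each position sees neighbours within `window` steps on either side.
    Assumption for this sketch: position 0 plays the role of the global token.
    """
    mask = [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]
    if use_global_token:
        for k in range(seq_len):
            mask[0][k] = True  # the global token attends to every position
            mask[k][0] = True  # every position attends to the global token
    return mask


def allowed_pairs(mask):
    """Count allowed (query, key) pairs, a rough proxy for attention cost."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    for n in (100, 200, 400):
        local = allowed_pairs(limited_context_mask(n, window=4))
        print(n, local, n * n)  # local cost grows roughly linearly, full cost quadratically
```

With a fixed window, the number of allowed pairs grows roughly in proportion to the sequence length, whereas full global attention grows with its square. That difference is what makes very long inputs tractable, and the global token gives distant frames a shared path for exchanging information.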
