arxiv:2308.13494

Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers

Published on Aug 25, 2023 · Submitted by akhaliq on Aug 28, 2023
Abstract

Vision Transformers achieve impressive accuracy across a range of visual recognition tasks. Unfortunately, their accuracy frequently comes with high computational costs. This is a particular issue in video recognition, where models are often applied repeatedly across frames or temporal chunks. In this work, we exploit temporal redundancy between subsequent inputs to reduce the cost of Transformers for video processing. We describe a method for identifying and re-processing only those tokens that have changed significantly over time. Our proposed family of models, Eventful Transformers, can be converted from existing Transformers (often without any re-training) and give adaptive control over the compute cost at runtime. We evaluate our method on large-scale datasets for video object detection (ImageNet VID) and action recognition (EPIC-Kitchens 100). Our approach leads to significant computational savings (on the order of 2-4x) with only minor reductions in accuracy.
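To make the core idea concrete, below is a minimal, hypothetical sketch of token gating in PyTorch: cache the previous frame's tokens, re-process only the tokens whose change exceeds a threshold, and reuse cached outputs for the rest. This is not the authors' implementation (see the linked GitHub repository for that); the `TokenGate` class, the `threshold` parameter, and the toy block are illustrative assumptions.

```python
# Illustrative sketch only -- not the Eventful Transformers codebase.
# Idea: cache per-token inputs/outputs from the previous frame and
# recompute an expensive block only for tokens that changed noticeably.

import torch
import torch.nn as nn


class TokenGate(nn.Module):
    def __init__(self, expensive_block: nn.Module, threshold: float = 0.1):
        super().__init__()
        self.block = expensive_block     # e.g. a Transformer sub-block (hypothetical placement)
        self.threshold = threshold       # per-token change magnitude that triggers recompute
        self.prev_tokens = None          # cached inputs from the previous frame
        self.prev_outputs = None         # cached outputs from the previous frame

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (num_tokens, dim) for a single frame
        if self.prev_tokens is None:
            out = self.block(tokens)                            # first frame: process everything
        else:
            delta = (tokens - self.prev_tokens).norm(dim=-1)    # change magnitude per token
            changed = delta > self.threshold                    # boolean mask of "events"
            out = self.prev_outputs.clone()
            if changed.any():
                out[changed] = self.block(tokens[changed])      # recompute only changed tokens
        self.prev_tokens = tokens.detach()
        self.prev_outputs = out.detach()
        return out


if __name__ == "__main__":
    block = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))
    gate = TokenGate(block, threshold=0.05)
    frame1 = torch.randn(196, 64)
    frame2 = frame1.clone()
    frame2[:10] += 0.5                   # only the first 10 tokens change between frames
    _ = gate(frame1)                     # full compute on the first frame
    _ = gate(frame2)                     # only ~10 tokens are re-processed
```

Raising or lowering the threshold trades accuracy for compute, which mirrors the adaptive runtime control described in the abstract.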

Community

What kind of resources (GPUs, etc.) are needed for minimal training for learning purposes? Can I see some instructions?


Code here: https://github.com/WISION-Lab/eventful-transformer/

For the most part, our method doesn't require any re-training. You can generally just use pre-trained weights (links on GitHub).

Fine-tuning the temporal component in Section 5.2 took less than 2 days on a single RTX 3090 (if I remember correctly).

