arXiv:2503.07274

Efficient Distillation of Classifier-Free Guidance using Adapters

Published on Mar 10 · Submitted by msadat97 on Mar 11

Abstract

While classifier-free guidance (CFG) is essential for conditional diffusion models, it doubles the number of neural function evaluations (NFEs) per inference step. To mitigate this inefficiency, we introduce adapter guidance distillation (AGD), a novel approach that simulates CFG in a single forward pass. AGD leverages lightweight adapters to approximate CFG, effectively doubling the sampling speed while maintaining or even improving sample quality. Unlike prior guidance distillation methods that tune the entire model, AGD keeps the base model frozen and only trains minimal additional parameters (∼2%) to significantly reduce the resource requirement of the distillation phase. Additionally, this approach preserves the original model weights and enables the adapters to be seamlessly combined with other checkpoints derived from the same base model. We also address a key mismatch between training and inference in existing guidance distillation methods by training on CFG-guided trajectories instead of standard diffusion trajectories. Through extensive experiments, we show that AGD achieves comparable or superior FID to CFG across multiple architectures with only half the NFEs. Notably, our method enables the distillation of large models (∼2.6B parameters) on a single consumer GPU with 24 GB of VRAM, making it more accessible than previous approaches that require multiple high-end GPUs. We will publicly release the implementation of our method.
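To make the setup in the abstract concrete, here is a minimal PyTorch sketch of the distillation idea: a frozen base denoiser, a standard two-pass CFG teacher, and a single-pass student that adds a small trainable adapter trained to match the CFG output. The toy denoiser, the LoRA-style adapter parameterization, and all shapes are illustrative assumptions, not the paper's actual architecture.

```python
# Sketch of adapter-based guidance distillation; everything here is a toy
# stand-in for the paper's method, kept small enough to run on CPU.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyDenoiser(nn.Module):
    """Stand-in for a conditional diffusion backbone eps_theta(x_t, t, c)."""

    def __init__(self, dim=64, cond_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + cond_dim + 1, 128), nn.SiLU(), nn.Linear(128, dim)
        )

    def forward(self, x, t, c):
        return self.net(torch.cat([x, c, t], dim=-1))


def cfg_prediction(model, x, t, c, null_c, w):
    """Standard CFG: two NFEs per step (conditional + unconditional)."""
    eps_cond = model(x, t, c)
    eps_uncond = model(x, t, null_c)
    return eps_uncond + w * (eps_cond - eps_uncond)


class AdapterGuidedDenoiser(nn.Module):
    """Frozen base plus a low-rank adapter; one NFE per step."""

    def __init__(self, base, dim=64, cond_dim=16, rank=4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # base weights stay intact
        self.down = nn.Linear(dim + cond_dim + 1, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # adapter starts as a no-op

    def forward(self, x, t, c):
        h = torch.cat([x, c, t], dim=-1)
        return self.base(x, t, c) + self.up(self.down(h))


base = ToyDenoiser()
student = AdapterGuidedDenoiser(base)
opt = torch.optim.AdamW(
    [p for p in student.parameters() if p.requires_grad], lr=1e-4
)

# One distillation step: regress the single-pass student onto the two-pass
# CFG teacher. Per the abstract, x_t should come from CFG-guided sampling
# trajectories rather than forward-diffused data; random tensors stand in here.
x_t = torch.randn(8, 64)
t = torch.rand(8, 1)
c, null_c = torch.randn(8, 16), torch.zeros(8, 16)
w = 7.5  # guidance scale

with torch.no_grad():
    target = cfg_prediction(base, x_t, t, c, null_c, w)
loss = F.mse_loss(student(x_t, t, c), target)
loss.backward()
opt.step()
opt.zero_grad()
```

Because only the adapter receives gradients, the frozen base never changes, which is what allows the trained adapter to be reused with other checkpoints derived from the same base model.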

Community

Paper author · Paper submitter

TL;DR: We propose an efficient distillation method, called AGD, that doubles the sampling speed of classifier-free guidance (CFG) in diffusion models. AGD can be trained on consumer GPUs, and it integrates easily into existing diffusion pipelines (see the usage sketch below). The implementation of AGD will be publicly released.

Edit: please check the following thread on X for a more in-depth discussion of the method.
https://x.com/Msadat97/status/1899421342652399780
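As a hypothetical illustration of the "integrates easily into existing diffusion pipelines" point: if the released AGD adapters turn out to be LoRA-compatible, wiring them into a diffusers pipeline could look like the sketch below. The adapter path is a placeholder, and whether `load_lora_weights` applies to the actual release is an assumption; in diffusers, `guidance_scale=1.0` disables CFG, so each step costs a single NFE and the adapter would supply the guidance effect instead.

```python
# Hypothetical usage sketch; model id and adapter path are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("path/to/agd-adapter")  # placeholder adapter location

image = pipe(
    "a photo of an astronaut riding a horse",
    guidance_scale=1.0,  # CFG off: one forward pass per step
    num_inference_steps=30,
).images[0]
image.save("agd_sample.png")
```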
