Abstract
Diffusion transformers (DiT) have achieved appealing synthesis and scaling properties in content creation, e.g., image and video generation. However, the scaling laws of DiT, which usually offer precise predictions of the optimal model size and data requirements given a specific compute budget, remain less explored. Therefore, we conduct experiments across a broad range of compute budgets, from 1e17 to 6e18 FLOPs, to confirm the existence of scaling laws in DiT for the first time. Concretely, the loss of pretraining DiT also follows a power-law relationship with the compute involved. Based on the scaling law, we can not only determine the optimal model size and required data but also accurately predict the text-to-image generation loss for a model with 1B parameters and a compute budget of 1e21 FLOPs. Additionally, we demonstrate that the trend of the pre-training loss matches the generation performance (e.g., FID), even across various datasets, which complements the mapping from compute to synthesis quality and thus provides a predictable benchmark that assesses model performance and data quality at a reduced cost.
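As a minimal sketch of the power-law relationship described above, the snippet below fits a curve of the form L(C) = a · C^b to (compute, loss) pairs and extrapolates to a larger budget. The numerical values and the exact parameterization are illustrative assumptions, not results or code from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (compute, loss) measurements from small-scale pretraining runs.
# These values are placeholders for illustration, not data from the paper.
compute = np.array([1e17, 3e17, 1e18, 3e18, 6e18])   # training FLOPs
loss    = np.array([0.62, 0.55, 0.48, 0.43, 0.41])   # pre-training loss

# Power law L(C) = a * C**b, fitted in log-log space for numerical stability.
def log_power_law(log_c, log_a, b):
    return log_a + b * log_c

(log_a, b), _ = curve_fit(log_power_law, np.log(compute), np.log(loss))
a = np.exp(log_a)

# Extrapolate to a larger compute budget, e.g., 1e21 FLOPs.
predicted_loss = a * (1e21 ** b)
print(f"fitted: L(C) = {a:.3g} * C^{b:.3f}; "
      f"predicted loss at 1e21 FLOPs: {predicted_loss:.3f}")
```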