arxiv:2503.04725

L^2M: Mutual Information Scaling Law for Long-Context Language Modeling

Published on Mar 6 · Submitted by zhuoc3 on Mar 7

Abstract

We rigorously establish a bipartite mutual information scaling law in natural language that governs long-range dependencies. This scaling law, which we show is distinct from and scales independently of the conventional two-point mutual information, is the key to understanding long-context language modeling. Using this scaling law, we formulate the Long-context Language Modeling (L^2M) condition, which relates a model's capacity for effective long-context modeling to the scaling of its latent state size for storing past information. Our results are validated through experiments on both transformers and state space models. This work establishes a theoretical foundation that guides the development of large language models toward longer context lengths.
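
Informally, and in illustrative notation rather than the paper's exact symbols, the two central statements of the abstract can be sketched as follows:

```latex
% Informal sketch; the notation below is illustrative, not taken verbatim from the paper.
% Bipartite mutual information between two adjacent length-L blocks of text is
% claimed to grow as a power law in L:
\[
  I_{\mathrm{bp}}(L) \;=\; I\bigl(X_{1:L};\, X_{L+1:2L}\bigr) \;\propto\; L^{\beta},
\]
% which is distinct from the two-point mutual information I(X_1; X_L) between
% individual tokens a distance L apart. The L^2M condition then says, roughly,
% that the latent (history) state h_L a model uses to summarize the first L
% tokens must grow at least as fast as this bipartite quantity:
\[
  \dim\bigl(h_L\bigr) \;\gtrsim\; L^{\beta}.
\]
```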

Community

Paper author · Paper submitter

This paper establishes a fundamental bipartite mutual information scaling law in natural language that follows power-law growth (L^β). The authors show this scaling is distinct from that of conventional two-point mutual information and is the key to understanding long-context language modeling. Based on this insight, they formulate the Long-context Language Modeling (L²M) condition, which relates a model's ability to handle long contexts to how its history state dimensions must scale. Their empirical validation confirms the theoretical predictions across different architectures, demonstrating how the scaling behavior of history states affects performance on long-range dependencies. These findings provide a theoretical foundation for understanding long-range dependencies in language models and guiding architecture development. Code is available at https://github.com/LSquaredM/mutual_info_scaling_law.
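
As a purely illustrative companion to this summary (not the authors' code; their experiments are in the repository linked above), the sketch below shows how a power-law exponent β could be read off from mutual-information estimates at several context lengths by a straight-line fit in log-log space. The mutual-information values here are synthetic stand-ins, and the helper function is hypothetical.

```python
# Illustrative sketch only: fit I(L) ~ C * L**beta from (length, MI-estimate) pairs.
# Real MI estimates would come from a separate estimator run on a corpus; here
# they are synthetic stand-ins generated with a known exponent.
import numpy as np

def fit_power_law(lengths, mi_estimates):
    """Return (beta, C) for I(L) = C * L**beta via least squares in log-log space."""
    log_L = np.log(np.asarray(lengths, dtype=float))
    log_I = np.log(np.asarray(mi_estimates, dtype=float))
    beta, log_C = np.polyfit(log_L, log_I, deg=1)  # slope = beta, intercept = log C
    return beta, float(np.exp(log_C))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lengths = np.array([64, 128, 256, 512, 1024, 2048])
    # Synthetic "estimates" generated with beta = 0.5 plus multiplicative noise.
    mi = 3.0 * lengths ** 0.5 * np.exp(rng.normal(0.0, 0.05, size=lengths.shape))
    beta, C = fit_power_law(lengths, mi)
    print(f"fitted beta ~= {beta:.3f}, C ~= {C:.3f}")
```

A log-log fit is used because a power law I(L) = C·L^β becomes the straight line log I = β·log L + log C, so β is simply the slope.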
