Theorical - a tmarechaux Collection

updated Oct 8

Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 83
Small-scale proxies for large-scale Transformer training instabilities

Paper • 2309.14322 • Published Sep 25, 2023 • 19
Evaluating Cognitive Maps and Planning in Large Language Models with CogEval

Paper • 2309.15129 • Published Sep 25, 2023 • 6
Vision Transformers Need Registers

Paper • 2309.16588 • Published Sep 28, 2023 • 78
The Consensus Game: Language Model Generation via Equilibrium Search

Paper • 2310.09139 • Published Oct 13, 2023 • 12
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise

Paper • 2212.11685 • Published Dec 22, 2022 • 2
Levels of AGI: Operationalizing Progress on the Path to AGI

Paper • 2311.02462 • Published Nov 4, 2023 • 34
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 603
Scaling Instructable Agents Across Many Simulated Worlds

Paper • 2404.10179 • Published Mar 13, 2024 • 27
Your Transformer is Secretly Linear

Paper • 2405.12250 • Published May 19, 2024 • 149
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

Paper • 2407.01392 • Published Jul 1, 2024 • 39
softmax is not enough (for sharp out-of-distribution)

Paper • 2410.01104 • Published Oct 1, 2024 • 1
Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 168
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

Paper • 2410.02707 • Published Oct 3, 2024 • 47