GHOST 2.0: generative high-fidelity one shot transfer of heads Paper • 2502.18417 • Published 12 days ago • 63
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published 17 days ago • 160
You Do Not Fully Utilize Transformer's Representation Capacity Paper • 2502.09245 • Published 25 days ago • 34
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models Paper • 2502.03032 • Published Feb 5 • 58
The Differences Between Direct Alignment Algorithms are a Blur Paper • 2502.01237 • Published Feb 3 • 112
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28 • 108
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Paper • 2412.09604 • Published Dec 12, 2024 • 35
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Paper • 2412.09604 • Published Dec 12, 2024 • 35 • 4