Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding Paper • 2501.00712 • Published 11 days ago • 5
Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing Paper • 2501.00658 • Published 12 days ago • 7
Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights? Paper • 2302.12480 • Published Feb 24, 2023
H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models Paper • 2306.14048 • Published Jun 24, 2023 • 12
Robust Mixture-of-Expert Training for Convolutional Neural Networks Paper • 2308.10110 • Published Aug 19, 2023 • 2
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design Paper • 2410.19123 • Published Oct 24, 2024 • 15
Compact Language Models via Pruning and Knowledge Distillation Paper • 2407.14679 • Published Jul 19, 2024 • 39