TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (arXiv 2410.23168, published Oct 30, 2024)
nGPT: Normalized Transformer with Representation Learning on the Hypersphere (arXiv 2410.01131, published Oct 1, 2024)
Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling (arXiv 2410.07145, published Oct 9, 2024)
Round and Round We Go! What makes Rotary Positional Encodings useful? (arXiv 2410.06205, published Oct 8, 2024)
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices (arXiv 2410.00531, published Oct 1, 2024)
Aria: An Open Multimodal Native Mixture-of-Experts Model (arXiv 2410.05993, published Oct 8, 2024)