GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6, 2024 • 183
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling Paper • 2304.01373 • Published Apr 3, 2023 • 9