Cached Transformers: Improving Transformers with Differentiable Memory Cache • Paper • 2312.12742 • Published Dec 20, 2023 • 12
Weight subcloning: direct initialization of transformers using larger pretrained ones • Paper • 2312.09299 • Published Dec 14, 2023 • 17
Self-Evaluation Improves Selective Generation in Large Language Models • Paper • 2312.09300 • Published Dec 14, 2023 • 14
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT) • Paper • 2309.08968 • Published Sep 16, 2023 • 22
Llama 2: Open Foundation and Fine-Tuned Chat Models • Paper • 2307.09288 • Published Jul 18, 2023 • 242
Measuring Faithfulness in Chain-of-Thought Reasoning • Paper • 2307.13702 • Published Jul 17, 2023 • 27
Optimized Network Architectures for Large Language Model Training with Billions of Parameters • Paper • 2307.12169 • Published Jul 22, 2023 • 9
Challenges and Applications of Large Language Models • Paper • 2307.10169 • Published Jul 19, 2023 • 47
PolyLM: An Open Source Polyglot Large Language Model • Paper • 2307.06018 • Published Jul 12, 2023 • 25