Scalify: scale propagation for efficient low-precision LLM training Paper • 2407.17353 • Published Jul 24, 2024 • 12
Training and inference of large language models using 8-bit floating point Paper • 2309.17224 • Published Sep 29, 2023 • 1
SparQ Attention: Bandwidth-Efficient LLM Inference Paper • 2312.04985 • Published Dec 8, 2023 • 38
GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures Paper • 2106.05822 • Published Jun 10, 2021