PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation Paper • 2403.09192 • Published Mar 14, 2024
Scaffold-BPE: Enhancing Byte Pair Encoding with Simple and Effective Scaffold Token Removal Paper • 2404.17808 • Published Apr 27, 2024
MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts Paper • 2407.09816 • Published Jul 13, 2024 • 1
LBPE: Long-token-first Tokenization to Improve Large Language Models Paper • 2411.05504 • Published Nov 8, 2024 • 1
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts Paper • 2410.16077 • Published Oct 21, 2024 • 1
Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models Paper • 2412.07171 • Published Dec 10, 2024 • 1
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey Paper • 2412.18619 • Published 27 days ago • 52
PY007/slimpajama_llama_tokenized_upsample_4096_chunk_1M • Updated Apr 19, 2024 • 5.04k • 90 • 2
PY007/slimpajama_llama_tokenized_upsample_4096_chunk_256K • Updated Apr 19, 2024 • 3.94k • 62 • 1