Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling • Paper • 2501.16975 • Published 8 days ago • 21
ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers • Paper • 2401.02072 • Published Jan 4, 2024 • 11