Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published 9 days ago • 61
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques Paper • 2501.14492 • Published 5 days ago • 29
iFormer: Integrating ConvNet and Transformer for Mobile Application Paper • 2501.15369 • Published 4 days ago • 9
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published Dec 24, 2024 • 72
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics Paper • 2501.04686 • Published 21 days ago • 50
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens Paper • 2501.07730 • Published 16 days ago • 16
PokerBench: Training Large Language Models to become Professional Poker Players Paper • 2501.08328 • Published 15 days ago • 14