Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 13 days ago • 75
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published 17 days ago • 68
view article Article Llama-3.1-Storm-8B: Improved SLM with Self-Curation + Model Merging By akjindal53244 • Aug 19 • 75
view article Article ∞🧙🏼♂️AnyClassifier - Generating Synthetic Data For Text Classification By kenhktsui • Aug 19 • 8
view article Article ZebraLogic: Benchmarking the Logical Reasoning Ability of Language Models By yuchenlin • Jul 27 • 27
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18 • 53
view article Article Low Latency CPU Based Educational Value Classifier With Generic Educational Value By kenhktsui • Jun 12 • 8
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4 • 60
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Paper • 2404.19427 • Published Apr 30 • 71
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30 • 41
Tiny Series Collection Tiny datasets that empower the foundation of Small Language Model! • 11 items • Updated Jan 26 • 36