To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published Sep 18, 2024 • 37
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems Paper • 2402.12875 • Published Feb 20, 2024 • 13
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Paper • 2410.00531 • Published Oct 1, 2024 • 31
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs Paper • 2410.12405 • Published Oct 16, 2024 • 13
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective Paper • 2410.23743 • Published Oct 31, 2024 • 62
BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments Paper • 2410.23918 • Published Oct 31, 2024 • 20
ATM: Improving Model Merging by Alternating Tuning and Merging Paper • 2411.03055 • Published Nov 5, 2024 • 1
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published Nov 7, 2024 • 115
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model Paper • 2411.04496 • Published Nov 7, 2024 • 23
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding Paper • 2411.04282 • Published Nov 6, 2024 • 34
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens Paper • 2411.17691 • Published Nov 26, 2024 • 13
MALT: Improving Reasoning with Multi-Agent LLM Training Paper • 2412.01928 • Published Dec 2, 2024 • 44
Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture Paper • 2412.11834 • Published Dec 16, 2024 • 7
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published Dec 20, 2024 • 38
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing Paper • 2412.14711 • Published Dec 19, 2024 • 16
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response Paper • 2412.14922 • Published Dec 19, 2024 • 86
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models Paper • 2502.03032 • Published Feb 5, 2025 • 58
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published about 1 month ago • 122
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published 25 days ago • 143
CoT-Valve: Length-Compressible Chain-of-Thought Tuning Paper • 2502.09601 • Published 24 days ago • 14
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models Paper • 2502.09390 • Published 25 days ago • 16
Dyve: Thinking Fast and Slow for Dynamic Process Verification Paper • 2502.11157 • Published 22 days ago • 6
Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity Paper • 2502.11901 • Published 21 days ago • 6
FoNE: Precise Single-Token Number Embeddings via Fourier Features Paper • 2502.09741 • Published 24 days ago • 11
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment Paper • 2502.16894 • Published 14 days ago • 26
MeshPad: Interactive Sketch Conditioned Artistic-designed Mesh Generation and Editing Paper • 2503.01425 • Published 7 days ago • 8