LLM Technical Report
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 341
Note: Qwen2.5 (2024.12)
https://github.com/QwenLM/Qwen/blob/main/README_CN.md
https://qwen.readthedocs.io/zh-cn/latest/
https://qwenlm.github.io/zh/blog/qwen2.5/
Qwen2.5-Coder Technical Report
Paper • 2409.12186 • Published • 140
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
Paper • 2409.12122 • Published • 3
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 42
Note: DeepSeek-V1 (2024.1)
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 14
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 79
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper • 2401.14196 • Published • 51
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Paper • 2412.10302 • Published • 11
DeepSeek-V3 Technical Report
Paper • 2412.19437 • Published • 31
Note: Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. An innovative distillation method extracts the reasoning capability of long chain-of-thought (CoT) models (specifically the DeepSeek-R1 series) and injects it into the standard DeepSeek-V3 model.
See also: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
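As a rough illustration of why only a fraction of a MoE model's total parameters are used per token, here is a minimal top-k routing sketch. The expert count, top_k, and dimensions are illustrative placeholders, not DeepSeek-V3's actual configuration.

```python
# Minimal sketch of top-k expert routing in a Mixture-of-Experts layer.
# NOTE: n_experts, top_k, and d_model are illustrative placeholders,
# not DeepSeek-V3's real configuration (671B total / 37B active params).
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2
router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top_k experts; only those experts' weights are touched."""
    logits = x @ router_w                                   # (n_tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]           # indices of the selected experts
    sel = np.take_along_axis(logits, top, axis=-1)
    gate = np.exp(sel - sel.max(axis=-1, keepdims=True))    # softmax over selected experts only
    gate /= gate.sum(axis=-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                             # per-token dispatch
        for k in range(top_k):
            out[t] += gate[t, k] * (x[t] @ experts[top[t, k]])
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)   # (4, 64); each token used only 2 of the 8 experts
```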
The Llama 3 Herd of Models
Paper • 2407.21783 • Published • 110
DataComp-LM: In search of the next generation of training sets for language models
Paper • 2406.11794 • Published • 51
Note: Apple DCLM
Mixtral of Experts
Paper • 2401.04088 • Published • 158
Note: Mistral's MoE model
Mistral 7B
Paper • 2310.06825 • Published • 47
Note: Mistral's 7B model
Gemma: Open Models Based on Gemini Research and Technology
Paper • 2403.08295 • Published • 48
Note: Google DeepMind Gemma Team
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Paper • 2403.05530 • Published • 62
Note: Google Gemini 1.5
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Paper • 2112.11446 • Published • 1
Note: DeepMind Gopher model
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 12
Note: OpenAI GPT-3
LLaMA: Open and Efficient Foundation Language Models
Paper • 2302.13971 • Published • 13
Note: Meta LLaMA
Evaluating Large Language Models Trained on Code
Paper • 2107.03374 • Published • 8
Note: OpenAI Codex
Pixtral 12B
Paper • 2410.07073 • Published • 63
Training Compute-Optimal Large Language Models
Paper • 2203.15556 • Published • 10
Note: Chinchilla (DeepMind, 2022.3). "We find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled."
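A back-of-the-envelope reading of that equal-scaling rule is sketched below; the roughly 20-tokens-per-parameter ratio is the approximation implied by Chinchilla's 70B-parameter / 1.4T-token operating point, not an exact prescription.

```python
# Rough Chinchilla-style estimate: compute-optimal training tokens scale
# linearly with parameter count (doubling params ~ doubling tokens).
# The ~20 tokens/parameter ratio is an approximation taken from the paper's
# 70B-parameter / 1.4T-token result.
def compute_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    return n_params * tokens_per_param

for n_params in (7e9, 14e9, 70e9):
    print(f"{n_params / 1e9:>4.0f}B params -> ~{compute_optimal_tokens(n_params) / 1e12:.2f}T tokens")
```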
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 16
Note: OpenAI InstructGPT/ChatGPT (2022.3)
GPT-4o System Card
Paper • 2410.21276 • Published • 83
OLMoE: Open Mixture-of-Experts Language Models
Paper • 2409.02060 • Published • 78
Note: Phi-4 Technical Report (2024.12)
https://www.microsoft.com/en-us/research/uploads/prod/2024/12/P4TechReport.pdf
Foundations of Large Language Models
Paper • 2501.09223 • Published • 2
LLM360 K2: Building a 65B 360-Open-Source Large Language Model from Scratch
Paper • 2501.07124 • Published
Note: A 360° open-source model