Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs Paper • 2411.08719 • Published Nov 10, 2024
Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs Paper • 2412.14471 • Published Dec 19, 2024
Granite Vision: a lightweight, open-source multimodal model for enterprise intelligence Paper • 2502.09927 • Published Feb 2025
Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search Paper • 2503.04412 • Published Mar 2025 • 1
CodeArena: A Collective Evaluation Platform for LLM Code Generation Paper • 2503.01295 • Published Mar 2025 • 7
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 53
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps Paper • 2412.15035 • Published Dec 19, 2024 • 4
Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs Paper • 2502.19413 • Published Feb 2025 • 19
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization Paper • 2502.19261 • Published Feb 2025 • 6
Kanana: Compute-efficient Bilingual Language Models Paper • 2502.18934 • Published Feb 2025 • 60
Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping Paper • 2501.06589 • Published Jan 11, 2025
MMTEB: Massive Multilingual Text Embedding Benchmark Paper • 2502.13595 • Published Feb 2025 • 31
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024 • 9
One Thousand and One Pairs: A "novel" challenge for long-context language models Paper • 2406.16264 • Published Jun 24, 2024
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published Jan 13, 2025 • 50
Post • Mini-QwQ: an edge-device-friendly reasoning model distilled from QwQ-32B. 🤗: kz919/QwQ-0.5B-Distilled-SFT • GGUF: kz919/QwQ-0.5B-Distilled-SFT-gguf • 🤖: kz919/Mini-QwQ
What's the Meaning of Superhuman Performance in Today's NLU? Paper • 2305.08414 • Published May 15, 2023 • 1
Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-OASIS Paper • 2411.19655 • Published Nov 29, 2024 • 20