Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs Paper • 2411.08719 • Published Nov 10, 2024
Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs Paper • 2412.14471 • Published Dec 19, 2024
Granite Vision: a lightweight, open-source multimodal model for enterprise intelligence Paper • 2502.09927 • Published Feb 2025
Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search Paper • 2503.04412 • Published Mar 2025 • 1
CodeArena: A Collective Evaluation Platform for LLM Code Generation Paper • 2503.01295 • Published Mar 2025 • 7
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 53
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps Paper • 2412.15035 • Published Dec 19, 2024 • 4
Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs Paper • 2502.19413 • Published Feb 2025 • 19
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization Paper • 2502.19261 • Published Feb 2025 • 6
Kanana: Compute-efficient Bilingual Language Models Paper • 2502.18934 • Published Feb 2025 • 60
Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping Paper • 2501.06589 • Published Jan 11, 2025
MMTEB: Massive Multilingual Text Embedding Benchmark Paper • 2502.13595 • Published Feb 2025 • 31
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024 • 9
One Thousand and One Pairs: A "novel" challenge for long-context language models Paper • 2406.16264 • Published Jun 24, 2024
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published Jan 13, 2025 • 50
Post • Mini-QwQ: an edge-device-friendly reasoning model distilled from QwQ-32B. 🤗: kz919/QwQ-0.5B-Distilled-SFT • GGUF: kz919/QwQ-0.5B-Distilled-SFT-gguf • 🤖: kz919/Mini-QwQ
What's the Meaning of Superhuman Performance in Today's NLU? Paper • 2305.08414 • Published May 15, 2023 • 1
Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-OASIS Paper • 2411.19655 • Published Nov 29, 2024 • 20