Article • Yay! Organizations can now publish blog Articles • By huggingface • 7 days ago • 30
Article • MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era • By MiniMax-AI • 13 days ago • 40
SmolLM2 Collection • State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 15 items • Updated Dec 22, 2024 • 207
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 20 days ago • 249
DolphinLabeled Datasets Collection • Eric Hartford has added labels to help you filter datasets. • 5 items • Updated 21 days ago • 9
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 49
Balancing Pipeline Parallelism with Vocabulary Parallelism Paper • 2411.05288 • Published Nov 8, 2024 • 19
LoLCATS Collection • Linearizing LLMs with high quality and efficiency. We linearize the full Llama 3.1 model family (8B, 70B, 405B) for the first time! • 4 items • Updated Oct 14, 2024 • 15
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published Sep 18, 2024 • 37
Enhancing Training Efficiency Using Packing with Flash Attention Paper • 2407.09105 • Published Jul 12, 2024 • 15
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23, 2024 • 23
Power-LM Collection • Dense & MoE LLMs trained with the Power learning rate scheduler. • 4 items • Updated Oct 17, 2024 • 15