Loubna Ben Allal

loubnabnl

https://loubnabnl.github.io/

AI & ML interests

SmolLMs, ML for code, data

Recent Activity

new activity 2 days ago

HuggingFaceTB/finemath:Why did you use CC rather than FineWeb to create FineMath?

updated a dataset 4 days ago

loubnabnl/mmlu-evals-smollm-360m

updated a dataset 4 days ago

loubnabnl/code_data

Articles

SmolLM - blazingly fast and remarkably powerful

Jul 16

• 292

CodeGemma - an official Google release for code LLMs

Apr 9

• 99

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

Mar 20

• 69

loubnabnl's activity

New activity in HuggingFaceTB/finemath 2 days ago

Why did you use CC rather than FineWeb to create FineMath?

#3 opened 2 days ago by

CryptAL

updated 3 datasets 4 days ago

loubnabnl/mmlu-evals-smollm-360m

Viewer • Updated 4 days ago • 1 • 13

loubnabnl/code_data

Viewer • Updated 4 days ago • 1k • 41

loubnabnl/english-web-100k

Viewer • Updated 4 days ago • 100k • 27

reacted to ginipick's post with 🔥 4 days ago

Post

4120

🌟 Digital Odyssey: AI Image & Video Generation Platform 🎨
Welcome to our all-in-one AI platform for image and video generation! 🚀
✨ Key Features

🎨 High-quality image generation from text
🎥 Video creation from still images
🌐 Multi-language support with automatic translation
🛠️ Advanced customization options

💫 Unique Advantages

⚡ Fast and accurate results using FLUX.1-dev and Hyper-SD models
🔒 Robust content safety filtering system
🎯 Intuitive user interface
🛠️ Extended toolkit including image upscaling and logo generation

🎮 How to Use

1. Enter your image or video description
2. Adjust settings as needed
3. Click generate
4. Save and share your results automatically

🔧 Tech Stack

FluxPipeline
Gradio
PyTorch
OpenCV

link: ginigen/Dokdo

Turn your imagination into reality with AI! ✨
#AI #ImageGeneration #VideoGeneration #MachineLearning #CreativeTech

7 replies

updated a Space 5 days ago

Running

👁

README

reacted to anton-l's post with 🚀🔥 6 days ago

Post

1965

Introducing 📐𝐅𝐢𝐧𝐞𝐌𝐚𝐭𝐡: the best public math pre-training dataset with 50B+ tokens!
HuggingFaceTB/finemath

Math remains challenging for LLMs, and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH.

We build the dataset by:
🛠️ carefully extracting math data from Common Crawl;
🔎 iteratively filtering and recalling high quality math pages using a classifier trained on synthetic annotations to identify math reasoning and deduction.
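
To make the two steps above concrete, here is a minimal toy sketch of a filter-then-recall pass. It is not the FineMath pipeline: the `Page` type, the cue-word scorer `score_math_quality` (a stand-in for a trained classifier), the domain-based recall rule, and both thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Page:
    url: str
    text: str


# Hypothetical cue strings; a real system would use a trained classifier instead.
MATH_CUES = ("theorem", "proof", "equation", "solve", "therefore", "=")


def score_math_quality(text: str) -> int:
    """Toy score in [0, 5]: number of distinct cue strings present, capped at 5."""
    lowered = text.lower()
    return min(5, sum(cue in lowered for cue in MATH_CUES))


def filter_and_recall(pages, keep_threshold=3, recall_threshold=2):
    """Keep high-scoring pages, then recall borderline pages from the same domains."""
    # Pass 1 (filter): keep pages that the scorer rates highly.
    kept = [p for p in pages if score_math_quality(p.text) >= keep_threshold]
    kept_urls = {p.url for p in kept}

    # Domains that produced at least one high-scoring page (assumed recall signal).
    good_domains = {urlparse(p.url).netloc for p in kept}

    # Pass 2 (recall): revisit the remaining pages from those domains
    # with a looser threshold.
    recalled = [
        p for p in pages
        if p.url not in kept_urls
        and urlparse(p.url).netloc in good_domains
        and score_math_quality(p.text) >= recall_threshold
    ]
    return kept + recalled
```

In this sketch, a borderline page is recalled only if it shares a domain with a page that already passed the stricter threshold. That is one simple way to trade precision for recall, and other recall signals would work just as well.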

We conducted a series of ablations comparing the performance of Llama-3.2-3B-Base after continued pre-training on FineMath and observed notable gains compared to the baseline model and other public math datasets.

We hope this helps advance the performance of LLMs on math and reasoning! 🚀
We’re also releasing all the ablation models as well as the evaluation code.

HuggingFaceTB/finemath-6763fb8f71b6439b653482c2