
Daniel van Strien PRO

davanstrien

https://danielvanstrien.xyz/

AI & ML interests

Machine Learning Librarian

Recent Activity

updated a dataset about 1 hour ago

data-is-better-together/fineweb-c-progress

reacted to fdaudens's post with ❤️ about 2 hours ago

Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:

- Original release: 8 models, 540K downloads. Just the beginning...
- The community turned those open-weight models into +550 NEW models on Hugging Face. Total downloads? 2.5M, nearly 5X the originals.

The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interesting to note that the community focused on quantized versions for better efficiency & accessibility. They want models that use less memory, run faster, and are more energy-efficient.

When you empower builders, innovation explodes. For everyone. 🚀

The most popular community model? @bartowski's DeepSeek-R1-Distill-Qwen-32B-GGUF version, with 1M downloads alone.

liked a model about 3 hours ago

deepseek-ai/Janus-Pro-7B

Articles

Introducing Synthetic Data Workshop: Your Gateway to Easy Synthetic Dataset Creation

Jun 20, 2024

• 12

Data Is Better Together: A Look Back and Forward

Jun 20, 2024

• 19

Synthetic dataset generation techniques: generating custom sentence similarity data

May 23, 2024

• 16

Synthetic dataset generation techniques: Self-Instruct

May 15, 2024

• 14

Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia?

May 7, 2024

• 7

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

Mar 20, 2024

• 74

Extracting Insights from Model Cards Using Open Large Language Models

Nov 27, 2023

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Aug 22, 2023

• 29

Huggy Lingo: Using Machine Learning to Improve Language Metadata on the Hugging Face Hub

Aug 2, 2023

• 1

The Hugging Face Hub for Galleries, Libraries, Archives and Museums

Jun 12, 2023

• 1

Introducing BERTopic Integration with Hugging Face Hub

May 31, 2023

• 7
