Stas Bekman

stas

https://stasosphere.com/machine-learning/

AI & ML interests

Toolmaker. Software creator, optimizer and harmonizer. Makes things work and fly at Contextual.AI. Training LLM/RAG/Generative AI/Machine Learning/Scalability.

Recent Activity

posted an update 16 days ago

Do you want ArcticTraining at @SnowflakeDB to add the ability to post-train DeepSeek V3/R1 models with DPO using just a few GPU nodes? Please vote here and tell others about it: https://github.com/snowflakedb/ArcticTraining/discussions/58 ArcticTraining is an open-source, easy-to-use post-training framework for NVIDIA GPUs built on top of DeepSpeed.

updated a model about 2 months ago

stas/ml-engineering-book

Posts 8

Do you want ArcticTraining at @SnowflakeDB to add the ability to post-train DeepSeek V3/R1 models with DPO using just a few GPU nodes?

Please vote here and tell others about it: https://github.com/snowflakedb/ArcticTraining/discussions/58

ArcticTraining is an open-source, easy-to-use post-training framework for NVIDIA GPUs built on top of DeepSpeed.

Articles 6

From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate

Papers 6

arxiv:2406.18820

arxiv:2401.14489

arxiv:2306.16527

arxiv:2211.05100

Models 9

stas/ml-engineering-book

Updated Jan 22 • 16

stas/tiny-random-llama-2

Text Generation • Updated Nov 14, 2023 • 1.21k • 39

stas/tiny-m2m_100

Text2Text Generation • Updated Apr 29, 2022 • 2.52k

stas/tr8b-104B-debug3

Updated Nov 29, 2021

stas/pegasus-cnn_dailymail-tiny-random

Text2Text Generation • Updated Jul 1, 2021 • 279

stas/mt5-tiny-random

Text2Text Generation • Updated Jun 23, 2021 • 41.1k • 2

stas/tiny-wmt19-en-de

Text2Text Generation • Updated May 3, 2021 • 281

stas/tiny-wmt19-en-ru

Text2Text Generation • Updated May 3, 2021 • 3.92k

stas/t5-very-small-random

Text2Text Generation • Updated Apr 21, 2021 • 107

Datasets 8

stas/openwebtext-synthetic-testing

Updated Nov 14, 2023 • 68 • 4

stas/oscar-en-10k

Viewer • Updated Oct 19, 2022 • 10k • 258 • 2

stas/c4-en-10k

Viewer • Updated Oct 19, 2022 • 10k • 518 • 4

stas/general-pmd-synthetic-testing

Updated Oct 18, 2022 • 51

stas/cm4-synthetic-testing

Updated Oct 18, 2022 • 59

stas/openwebtext-10k

Viewer • Updated Sep 15, 2021 • 10k • 7.12k • 27

stas/wmt14-en-de-pre-processed

Viewer • Updated Feb 16, 2021 • 4.55M • 201 • 3

stas/wmt16-en-ro-pre-processed

Viewer • Updated Feb 16, 2021 • 614k • 195