Stas Bekman

stas

AI & ML interests

Toolmaker. Software creator, optimizer and harmonizer. Makes things work and fly at Contextual.AI Training LLM/RAG/Generative AI/Machine Learning/Scalability

Recent Activity

Articles

Organizations

BigScience Workshop's profile picture Social Post Explorers's profile picture

Posts 7

view post
Post
1153
If you remember my work on MAMF - to find the realistic TFLOPS achievable ceiling - the Intel AI team has shared their measurements and they scored ...

an incredible 99.4% TFLOPS efficiency for Gaudi 2!

That's quite amazing! Your ROI on these accelerators will be very high.

The full table is here: https://github.com/stas00/ml-engineering/tree/master/compute/accelerator#maximum-achievable-matmul-flops-comparison-table

As we have seen the competitors get their achievable efficiency worse with each new generation, I'm looking forward to see if Gaudi 3 will keep the high bar!

Thanks to Avi Rubin, Lakshman Chari, Imtiaz Sajwani, Ramy J and Zhiqi Tao for helping to get these numbers to the community.
view post
Post
1114
The Universal Checkpointing paper is out! https://arxiv.org/abs/2406.18820

If you remember the Bigscience BLOOM-176B training, Tunji Ruwase and I co-invented this technology for Megatron-Deepspeed in order to enable to quickly scale up and down node topology while continuing training.

Since then the DeepSpeed team continued improving on that and it has now been fully integrated into Deepspeed.

The blog post is here: https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-ucp/README.md