LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
Paper: arXiv:2310.03294
We are passionate about designing strong, efficient, and secure machine learning models and algorithms, and about building scalable, practical distributed systems that support real-world machine learning workloads.