- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 17
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 37
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 46
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 22
Jason Wolosonovich
wolosonovich
AI & ML interests: None yet
Recent Activity
- updated a collection 3 days ago: Research
- upvoted a paper 3 days ago: SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
- upvoted an article 11 days ago: Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference
Collections: 2
Models: None public yet
Datasets: None public yet