Collections
Collections including paper arxiv:1909.08053
- PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
  Paper • 2304.11277 • Published • 1
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
  Paper • 1909.08053 • Published • 2
- Reducing Activation Recomputation in Large Transformer Models
  Paper • 2205.05198 • Published
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
  Paper • 1811.06965 • Published

- Attention Is All You Need
  Paper • 1706.03762 • Published • 44
- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 13
- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 16
- MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
  Paper • 2407.21770 • Published • 22

- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 10
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
  Paper • 1909.08053 • Published • 2
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Paper • 1910.10683 • Published • 8
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
  Paper • 2304.01373 • Published • 8