LLM training - a pszemraj Collection

LLM training

updated Oct 27, 2024

small-scale pretraining experiments of mine

BEE-spoke-data/smol_llama-101M-GQA

Text Generation • Updated Dec 25, 2023 • 4.91k • 28
BEE-spoke-data/smol_llama-220M-GQA

Text Generation • Updated Jun 28, 2024 • 4.88k • 12
BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu

Text Generation • Updated Jul 18, 2024 • 172 • 1

Note: smol_llama-220M-GQA with continued pretraining (CPT) on fineweb-edu for 10 billion tokens
BEE-spoke-data/smol_llama-81M-tied

Text Generation • Updated Nov 20, 2023 • 2.33k • 6
BEE-spoke-data/mega-ar-126m-4k

Text Generation • Updated Jan 28, 2024 • 4.97k • 4
BEE-spoke-data/verysmol_llama-v11-KIx2

Text Generation • Updated Jan 10, 2024 • 2.16k • 4
pszemraj/pythia-31m-KI_v1-2048-scratch

Text Generation • Updated Nov 18, 2023 • 2.05k
BEE-spoke-data/bert-plus-L8-4096-v1.0

Fill-Mask • Updated Feb 14, 2024 • 26
BEE-spoke-data/mega-encoder-small-16k-v1

Fill-Mask • Updated Mar 17, 2024 • 21 • 4
BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI

Text Generation • Updated Mar 4, 2024 • 30 • 2

Note: this is a mid-training checkpoint of what is now smol_llama-220M
pszemraj/jamba-900M-v0.13-KIx2

Text Generation • Updated May 18, 2024 • 34 • 4