Model2Vec: Distill a Small Fast Model from any Sentence Transformer • Article by Pringled • Published Oct 14
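For readers who want to try the technique the article describes, below is a minimal, hedged sketch of distilling a static embedding model with the model2vec library; the source model name and the pca_dims value are illustrative assumptions, not recommendations from the article.

```python
# Hedged sketch: distill a small static embedding model from a Sentence Transformer
# using model2vec. The source model and pca_dims below are illustrative placeholders.
from model2vec.distill import distill

# Distill the transformer into a static (lookup-based) embedding model.
m2v_model = distill(model_name="BAAI/bge-base-en-v1.5", pca_dims=256)

# Encode text without running the original transformer at inference time.
embeddings = m2v_model.encode(["Static embeddings trade a little accuracy for a lot of speed."])

# Save locally (or push to the Hub) for reuse.
m2v_model.save_pretrained("m2v-distilled-model")
```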
IrokoBench Collection • A human-translated benchmark dataset for 16 African languages covering three tasks: NLI, MMLU, and MGSM • 6 items • Updated May 31
Arcee's MergeKit: A Toolkit for Merging Large Language Models • Paper • 2403.13257 • Published Mar 20
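As a rough illustration of how MergeKit is typically driven in practice, the sketch below writes a minimal linear-merge YAML config and invokes the mergekit-yaml CLI; the checkpoint names and weights are placeholder assumptions, not values taken from the paper.

```python
# Hedged sketch: a linear merge of two checkpoints via MergeKit's YAML config
# and the mergekit-yaml CLI. Model names and weights are placeholder assumptions.
import subprocess
from pathlib import Path

config = """\
merge_method: linear
dtype: float16
models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      weight: 0.5
  - model: HuggingFaceH4/zephyr-7b-beta
    parameters:
      weight: 0.5
"""

Path("merge-config.yaml").write_text(config)

# mergekit-yaml <config> <output-dir> writes the merged checkpoint to disk.
subprocess.run(["mergekit-yaml", "merge-config.yaml", "./merged-model"], check=True)
```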
Pretrained Text-Generation Models Below 250M Parameters Collection • Great candidates for fine-tuning targeting Transformers.js, ordered by number of parameters • 9 items
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation • Paper • 2401.08417 • Published Jan 16
Open LLM Leaderboard best models ❤️🔥 Collection • A daily-updated list of the best-evaluated models on the LLM leaderboard • 62 items
Trained Models 🏋️ Collection • They may be small, but they're training like giants! • 8 items
EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation • Paper • 2310.08185 • Published Oct 12, 2023
TinyGSM: achieving >80% on GSM8k with small language models • Paper • 2312.09241 • Published Dec 14, 2023
ChatGPT-Mini Collection • A collection of fine-tuned GPT-2 models, each designed to deploy a ChatGPT-like model at home; they can also run on an old computer • 8 items • Updated Nov 16, 2023
smol llama Collection • 🚧 "raw" pretrained smol_llama checkpoints, work in progress 🚧 • 4 items • Updated Apr 29
Indic language fine-tunes Collection • On hold: an attempt to create acceptable-quality fine-tunes of different models • 1 item • Updated Nov 23, 2023
PIC (Partner-in-Crime) project Collection • Empathetic, small, really useful personalised models • 3 items • Updated Dec 10, 2023
Cramp(ed) Models Collection • Smaller models trained locally on my 2xA6000 Lambda Vector • 3 items • Updated Oct 10, 2023
Shrink Llama - V1 Collection • Parts of Meta's LlamaV2 models, chopped up and trained; "CoreX" means the first X layers were kept • 2 items • Updated Sep 12, 2023