COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs • Paper • 2502.17410 • Published 16 days ago
Hymba: A Hybrid-head Architecture for Small Language Models • Paper • 2411.13676 • Published Nov 20, 2024 • 42
MoDeGPT: Modular Decomposition for Large Language Model Compression • Paper • 2408.09632 • Published Aug 19, 2024
GeorgiaTech/0.0005_llama_nodpo_3iters_bs128_531lr_oldtrl_iter_3 • Text Generation • Updated May 13, 2024 • 6
GeorgiaTech/0.0005_zephyr_withdpo_5551_4iters_bs256_newtrl_iter_3 • Text Generation • Updated May 12, 2024 • 9
GeorgiaTech/0.0005_llama_nodpo_3iters_bs128_531lr_oldtrl_iter_2 • Text Generation • Updated May 12, 2024 • 94
GeorgiaTech/0.0005_llama_nodpo_3iters_bs128_531lr_oldtrl_iter_1 • Text Generation • Updated May 12, 2024 • 91
ToolQA: A Dataset for LLM Question Answering with External Tools • Paper • 2306.13304 • Published Jun 23, 2023
SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security • Paper • 2312.15838 • Published Dec 26, 2023
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models • Paper • 2309.15701 • Published Sep 27, 2023 • 2