TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation Paper • 2003.11963 • Published Mar 26, 2020
BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model Paper • 2212.04960 • Published Dec 9, 2022
Continuous Learning in a Hierarchical Multiscale Neural Network Paper • 1805.05758 • Published May 15, 2018
HuggingFace's Transformers: State-of-the-art Natural Language Processing Paper • 1910.03771 • Published Oct 9, 2019
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements Paper • 2210.01970 • Published Sep 30, 2022
TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents Paper • 1901.08149 • Published Jan 23, 2019
Datasets: A Community Library for Natural Language Processing Paper • 2109.02846 • Published Sep 7, 2021
Large Language Models Can Self-Improve in Long-context Reasoning Paper • 2411.08147 • Published Nov 12, 2024
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time Paper • 2203.05482 • Published Mar 10, 2022
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark Paper • 2410.19168 • Published Oct 24, 2024
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Paper • 2409.02813 • Published Sep 4, 2024
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation Paper • 2412.11919 • Published Dec 2024