Post 1223
Wow, impressive 340B model by NVIDIA with a nice permissive license! The technical report is full of insights and seems to use a different learning rate schedule than cosine, probably a variant of WSD. Hope to get more info on that!
nvidia/nemotron-4-340b-666b7ebaf1b3867caf2f1911
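For readers unfamiliar with the term, WSD (warmup-stable-decay) replaces the smooth cosine curve with three phases: a linear warmup, a long plateau at the peak learning rate, and a short decay at the end. Below is a minimal sketch of that general shape next to a cosine schedule. It is not the schedule from the Nemotron report; the function names, the linear decay shape, and every default value (`peak_lr`, `warmup_steps`, `decay_frac`, `min_lr`) are assumptions chosen for illustration.

```python
import math


def wsd_lr(step, total_steps, peak_lr=3e-4, warmup_steps=100,
           decay_frac=0.2, min_lr=0.0):
    """Warmup-stable-decay schedule (illustrative, hypothetical defaults)."""
    decay_steps = int(total_steps * decay_frac)
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Phase 1: linear warmup up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Phase 2: hold the peak learning rate constant.
        return peak_lr
    # Phase 3: linear decay from peak_lr to min_lr (assumed decay shape).
    progress = min(1.0, (step - decay_start) / max(1, decay_steps))
    return peak_lr + (min_lr - peak_lr) * progress


def cosine_lr(step, total_steps, peak_lr=3e-4, warmup_steps=100, min_lr=0.0):
    """Cosine schedule with linear warmup, for comparison."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000
    for s in (0, 99, 500, 800, 900, 1000):
        print(s, wsd_lr(s, total), cosine_lr(s, total))
```

One practical difference: the plateau does not depend on the total step count, so a run can be extended or branched into a decay phase from any plateau checkpoint, whereas a cosine curve is tied to a fixed training length.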
Cool papers
Efficient Streaming Language Models with Attention Sinks Paper • 2309.17453 • Published Sep 29, 2023 • 13
Simple and Controllable Music Generation Paper • 2306.05284 • Published Jun 8, 2023 • 147
FinGPT: Large Generative Models for a Small Language Paper • 2311.05640 • Published Nov 3, 2023 • 28
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers Paper • 2305.07185 • Published May 12, 2023 • 9
LLM.C Fineweb vs Edu-Fineweb
eliebak/wsd_124M_150B_edu Text Generation • Updated Jun 11, 2024 • 119
eliebak/wsd_124M_150B_fw Text Generation • Updated Jun 11, 2024 • 119
eliebak/wsd_124M_300B_edu Text Generation • Updated Jun 11, 2024 • 118
eliebak/wsd_124M_300B_fw Text Generation • Updated Jun 11, 2024 • 119