Datasets: NeurIPS LLM Challenge 2023

Datasets that I considered using in my submission to the 2023 NeurIPS Large Language Model Efficiency Challenge.

mosaicml/instruct-v3 • Updated Oct 2, 2023 • 63k • 434 • 32
databricks/databricks-dolly-15k • Updated Jun 30, 2023 • 15k • 11.1k • 755
hendrycks/competition_math • Updated Jun 8, 2023 • 25k • 125
kaist-ai/CoT-Collection • Updated Oct 14, 2023 • 1.84M • 1.08k • 117
Papers

Detecting Pretraining Data from Large Language Models • Paper • 2310.16789 • Published Oct 25, 2023 • 10
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models • Paper • 2310.13671 • Published Oct 20, 2023 • 18
AutoMix: Automatically Mixing Language Models • Paper • 2310.12963 • Published Oct 19, 2023 • 14
An Emulator for Fine-Tuning Large Language Models using Small Language Models • Paper • 2310.12962 • Published Oct 19, 2023 • 14