arxiv:2411.02939

A Post-Training Enhanced Optimization Approach for Small Language Models

Published on Nov 5

Authors:

Abstract

This paper delves into the continuous post-training optimization methods for small language models, and proposes a continuous post-training alignment data construction method for small language models. The core of this method is based on the data guidance of large models, optimizing the diversity and accuracy of alignment data. In addition, to verify the effectiveness of the methods in this paper, we used Qwen2-0.5B-Instruct model as the baseline model for small language models, using the alignment dataset constructed by our proposed method, we trained and compared several groups of experiments, including SFT (Supervised Fine Tuning) post-training experiment and KTO (Kahneman Tversky optimization) post-training experiment, as well as SFT-KTO two-stage post-training experiment and model weight fusion experiment. Finally, we evaluated and analyzed the performance of post-training models, and confirmed that the continuous post-training optimization method proposed by us can significantly improve the performance of small language models.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2411.02939 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2411.02939 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2411.02939 in a Space README.md to link it from this page.