arxiv:2412.16686

NILE: Internal Consistency Alignment in Large Language Models

Published on Dec 21 · Submitted by DonJoey on Dec 24

Abstract

As a crucial step toward aligning LLMs with human intentions, Instruction Fine-Tuning (IFT) places high demands on dataset quality. However, existing IFT datasets often contain knowledge that is inconsistent with the internal knowledge LLMs acquired during pre-training, which can greatly reduce the efficacy of IFT. To address this issue, we introduce the NILE (iNternal consIstency aLignmEnt) framework, aimed at optimizing IFT datasets to further unlock LLMs' capabilities. NILE operates by eliciting the target pre-trained LLM's internal knowledge corresponding to instruction data and leveraging that knowledge to revise the answers in IFT datasets. Additionally, we propose a novel Internal Consistency Filtering (ICF) method that filters training samples to ensure their high consistency with the LLM's internal knowledge. Our experiments demonstrate that NILE-aligned IFT datasets sharply boost LLM performance across multiple ability evaluation benchmarks, achieving gains of up to 66.6% on Arena-Hard and 68.5% on Alpaca-Eval V2. Further analysis confirms that each component of the NILE framework contributes to these substantial performance improvements, and provides compelling evidence that dataset consistency with pre-trained internal knowledge is pivotal for maximizing LLM potential.

Community

Paper submitter

Instruction fine-tuning has proven to be a crucial method for enhancing the capabilities of LLMs. But how does instruction fine-tuning differ from traditional fine-tuning in deep learning, and can this distinction help make it more effective? Some studies suggest that fine-tuning should focus not on teaching pre-trained LLMs new knowledge but on helping them understand tasks, and they emphasize the importance of maintaining consistency with the LLMs' internal knowledge during fine-tuning. This has emerged as a promising strategy for optimizing instruction fine-tuning (IFT) datasets to further unlock the potential of LLMs. Inspired by these findings, we propose a novel framework called NILE (iNternal consIstency aLignmEnt), which generates and selects better IFT data by considering the consistency between the internal parametric knowledge of LLMs and the world knowledge in IFT datasets. NILE works by eliciting the target pre-trained LLM's internal knowledge corresponding to instruction data; this internal knowledge is then used to revise the answers in the IFT datasets. Our experiments demonstrate that NILE-aligned IFT datasets significantly enhance LLM performance across multiple evaluation benchmarks, achieving improvements of up to 66.6% on Arena-Hard and 68.5% on Alpaca-Eval V2. Further analysis confirms that each component of the NILE framework contributes to these remarkable performance gains, providing compelling evidence that ensuring dataset consistency with the internal knowledge of pre-trained LLMs is pivotal for maximizing their potential.
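
To make the pipeline concrete, here is a minimal sketch of how a NILE-style revise-and-filter loop could look, covering the three stages the paper describes: knowledge elicitation, answer revision, and Internal Consistency Filtering (ICF). This is an illustrative reconstruction, not the authors' released code: the prompts, the `generate` callable, and the ICF scoring heuristic are all assumptions, and the paper's actual elicitation and filtering criteria may differ.

```python
# Illustrative sketch of a NILE-style alignment loop (hypothetical names,
# not the authors' released code). `generate` stands in for any
# pretrained-LLM completion call; wire it to your own model client.

from typing import Callable, Iterable, List, Tuple

def elicit_internal_knowledge(generate: Callable[[str], str], instruction: str) -> str:
    """Step 1: prompt the target pre-trained LLM for what it already
    knows about the instruction (knowledge elicitation)."""
    prompt = (
        "List the facts you already know that are relevant to answering "
        f"the following instruction.\n\nInstruction: {instruction}\n\nKnown facts:"
    )
    return generate(prompt)

def revise_answer(generate: Callable[[str], str], instruction: str,
                  answer: str, knowledge: str) -> str:
    """Step 2: rewrite the dataset answer so it agrees with the
    model's internal knowledge."""
    prompt = (
        f"Instruction: {instruction}\n"
        f"Original answer: {answer}\n"
        f"Internal knowledge: {knowledge}\n"
        "Rewrite the answer so it is consistent with the internal knowledge:"
    )
    return generate(prompt)

def consistency_score(generate: Callable[[str], str],
                      answer: str, knowledge: str) -> float:
    """Step 3 (ICF, approximated): ask the model to rate agreement on
    [0, 1]. The paper's actual filtering criterion may differ."""
    prompt = (
        f"Knowledge: {knowledge}\nAnswer: {answer}\n"
        "Rate the consistency of the answer with the knowledge as a single "
        "number between 0 and 1:"
    )
    try:
        return float(generate(prompt).strip())
    except ValueError:
        return 0.0  # unparseable rating -> treat as inconsistent

def nile_align(generate: Callable[[str], str],
               dataset: Iterable[Tuple[str, str]],
               threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Revise every (instruction, answer) pair, then keep only samples
    whose revised answer is sufficiently consistent (ICF)."""
    aligned = []
    for instruction, answer in dataset:
        knowledge = elicit_internal_knowledge(generate, instruction)
        revised = revise_answer(generate, instruction, answer, knowledge)
        if consistency_score(generate, revised, knowledge) >= threshold:
            aligned.append((instruction, revised))
    return aligned
```

In practice the elicitation, revision, and scoring calls would be batched for throughput, and the filtering threshold tuned on held-out data rather than fixed at 0.5.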

