DISLab/SummLlama3-8B · Hugging Face

SummLlama3-8B

Are you looking for a summarizer that can generate more human-preferred summaries across multiple domains?

Our SummLlama3-8B could be exactly what you need!

SummLlama3 is initialized from Llama3-8B-Instruct, with additional training using Direct Preference Optimization (DPO) based on large-scale (over 100K) summarization feedback.
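
For readers unfamiliar with DPO, the per-pair objective it optimizes can be sketched in a few lines. This is a generic illustration of the standard DPO loss, not the training code used for SummLlama3; the function name, the `beta` default, and the log-probability inputs are all assumptions made for the example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each argument is the summed log-probability of a whole summary under the
    policy being trained or under the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals softplus(-m); branch to stay numerically stable.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss equals log 2 when the policy and the reference agree, and it falls as the policy assigns relatively more probability to the preferred summary than the reference does.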

The feedback covers a wide range of input documents, from short to lengthy texts, in both dialogue and non-dialogue formats, and spans seven distinct domains:

Four non-dialogue domains: News, Lifestyle, Report, Medical
Three dialogue domains: Daily Life, Interview, Meeting

Surprisingly, it outperforms the nearly 10x larger Llama3-70B-Instruct, as well as GPT-4o, on the average automated evaluation score, while offering much faster inference.

These are the automated evaluation results:

Config.	Faithfulness	Completeness	Conciseness	Average
Llama3-8B-Instruct	0.864	0.583	0.450	0.632
Llama3-70B-Instruct	0.931	0.596	0.487	0.671
GPT-4o	0.940	0.657	0.437	0.678
SummLlama3-8B	0.931	0.614	0.659	0.735
SummLlama3-70B	0.950	0.632	0.754	0.779

These are the human evaluation results:

Config.	Faithfulness	Completeness	Conciseness	Average
Llama3-8B-Instruct	0.902	0.636	0.784	0.774
Llama3-70B-Instruct	0.953	0.659	0.792	0.801
SummLlama3-8B	0.980	0.697	0.959	0.879

Please refer to our paper to learn how to exploit LLM-generated feedback for text summarization.

Here are other versions:

SummLlama3-70B

https://huggingface.co./DISLab/SummLlama3-70B

SummLlama3.1-Series

https://huggingface.co./DISLab/SummLlama3.1-8B

https://huggingface.co./DISLab/SummLlama3.1-70B

SummLlama3.2-Series

https://huggingface.co./DISLab/SummLlama3.2-3B

Recommended Prompt for Text Summarization:

We recommend using the prompt below to generate summaries, since the model was trained with it.

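from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("DISLab/SummLlama3-8B")

def format_chat_template(document):
    # The instruction string is kept exactly as given (spelling included), since the model was trained with this prompt.
    instruction = "Please summarize the input documnet."
    row_json = [{"role": "user", "content": f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{document}\n\n### Response:\n"}]
    return tokenizer.apply_chat_template(row_json, tokenize=False)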
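
A minimal end-to-end sketch of using the recommended prompt for inference might look like the following. It assumes the `transformers` and `torch` packages (plus `accelerate` for `device_map="auto"`) and enough memory to load the 8B model. The helper names `build_messages` and `summarize`, the `max_new_tokens` default, greedy decoding, and the use of `add_generation_prompt=True` are choices made for this illustration, not settings stated by the authors.

```python
MODEL_ID = "DISLab/SummLlama3-8B"


def build_messages(document: str) -> list:
    """Wrap a document in the recommended instruction prompt as one user turn."""
    # Instruction string copied verbatim from the recommended prompt, spelling included.
    instruction = "Please summarize the input documnet."
    content = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{document}\n\n"
        "### Response:\n"
    )
    return [{"role": "user", "content": content}]


def summarize(document: str, max_new_tokens: int = 256) -> str:
    """Generate a summary; loads the model on every call, so cache it in real use."""
    # Heavy imports stay local so build_messages works without them installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Assumption: append the assistant header so the model starts its reply.
    prompt = tokenizer.apply_chat_template(
        build_messages(document), tokenize=False, add_generation_prompt=True
    )
    # The chat template already emits special tokens, so none are added here.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling `summarize(text)` on a news article or a dialogue transcript returns the generated summary as a plain string.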

Here is a brief overview of our summarizer:

Rather than relying on expensive human feedback, we utilize high-quality, multi-dimensional, and fine-grained feedback generated by large language models (LLMs).

This model excels at faithfulness, completeness, and conciseness, the three aspects by which humans judge what makes a good summarizer.

Faithfulness: a summarizer does not alter the information in the input text or add any information not directly inferable from it.
Completeness: a summarizer ensures the inclusion of all key information from the input text in the output summary.
Conciseness: a summarizer refrains from incorporating information outside the key information in the output, maintaining a succinct and focused summary.
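
To make the three definitions above concrete, here is one way they could be turned into scores between 0 and 1, assuming sentence-level error labels and key-fact alignments are already available from an annotator or an LLM judge. The `SummaryAnnotation` structure and the ratio formulas are assumptions made for this sketch; they follow the spirit of the definitions but are not taken from the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass
class SummaryAnnotation:
    # One flag per summary sentence: True if the sentence contains a factual error.
    sentence_has_error: list
    # One entry per key fact of the source: indices of summary sentences expressing it.
    keyfact_to_sentences: list


def score_summary(ann: SummaryAnnotation) -> dict:
    n_sentences = len(ann.sentence_has_error)
    n_keyfacts = len(ann.keyfact_to_sentences)
    if n_sentences == 0 or n_keyfacts == 0:
        raise ValueError("need at least one summary sentence and one key fact")

    # Faithfulness: share of summary sentences without a factual error.
    faithfulness = sum(not err for err in ann.sentence_has_error) / n_sentences
    # Completeness: share of key facts expressed by at least one summary sentence.
    completeness = sum(1 for sents in ann.keyfact_to_sentences if sents) / n_keyfacts
    # Conciseness: share of summary sentences that express at least one key fact.
    aligned_sentences = set().union(*ann.keyfact_to_sentences)
    conciseness = len(aligned_sentences) / n_sentences

    return {
        "faithfulness": faithfulness,
        "completeness": completeness,
        "conciseness": conciseness,
    }
```

Under this scheme, a summary that pads itself with off-topic sentences lowers its conciseness without affecting completeness, which is the distinction the definitions draw.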

Based on our comprehensive evaluation, which included both human and automated assessments of summary quality, SummLlama3 demonstrated significant improvements over the original Llama3 series.

Here are the results:

Human Evaluation

Config.	Faithfulness	Completeness	Conciseness	Average
Llama3-8B-Instruct	0.902	0.636	0.784	0.774
Llama3-70B-Instruct	0.953	0.659	0.792	0.801
SummLlama3-8B	0.980	0.697	0.959	0.879

Automated Evaluation using FineSurE

Config.	Faithfulness	Completeness	Conciseness	Average
Llama3-8B-Instruct	0.864	0.583	0.450	0.632
Llama3-70B-Instruct	0.931	0.596	0.487	0.671
SummLlama3-8B	0.931	0.614	0.659	0.735
SummLlama3-70B	0.950	0.632	0.754	0.779
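
The Average column in both tables appears to be the unweighted mean of the three metric scores. The card does not state this explicitly, so treat it as an assumption; the short check below, using the values copied from the tables above, is consistent with it to three decimal places.

```python
# (faithfulness, completeness, conciseness, reported average), copied from the tables above.
HUMAN = {
    "Llama3-8B-Instruct": (0.902, 0.636, 0.784, 0.774),
    "Llama3-70B-Instruct": (0.953, 0.659, 0.792, 0.801),
    "SummLlama3-8B": (0.980, 0.697, 0.959, 0.879),
}
AUTOMATED = {
    "Llama3-8B-Instruct": (0.864, 0.583, 0.450, 0.632),
    "Llama3-70B-Instruct": (0.931, 0.596, 0.487, 0.671),
    "SummLlama3-8B": (0.931, 0.614, 0.659, 0.735),
    "SummLlama3-70B": (0.950, 0.632, 0.754, 0.779),
}


def average_matches(table: dict, tolerance: float = 0.0005) -> dict:
    """For each model, check whether the reported average equals the mean of the
    three metrics, up to rounding at the third decimal place."""
    result = {}
    for model, (faith, comp, conc, reported) in table.items():
        mean = (faith + comp + conc) / 3
        # Small epsilon guards against floating-point noise at the rounding boundary.
        result[model] = abs(mean - reported) <= tolerance + 1e-9
    return result


if __name__ == "__main__":
    for name, table in (("human", HUMAN), ("automated", AUTOMATED)):
        print(name, average_matches(table))
```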

Example

The example below shows how SummLlama3-8B improves the summary over Llama3-8B-Instruct and Llama3-70B-Instruct for the following document:

Speaker	Dialogue
Person 1	Hey, Paul, you're still having Thanksgiving dinner at my house on Thursday, right?
Person 2	Yeah, thanks again for the invitation. I was worried I'd have to spend it alone after my parents announced they'd be in Finland.
Person 1	I think it's strange that you're not going with them. Is it because of the cold weather in Northern Europe?
Person 2	Not exactly. I have a lot of work to do next week.
Person 1	Well, I'm glad you won't be alone on a holiday. It'll be great to have you celebrate with us. The more the merrier!
Person 2	Exactly, how many people will be there?
Person 1	Let's see, my parents, my grandpa, my sister, my brother-in-law, my niece and my two cousins, plus us. So, ten people altogether.
Person 2	Should I bring anything?
Person 1	Nope, we've got it covered.
Person 2	Come on, I hate to show up empty-handed. You are being so generous to have me. I know! I'll bring pie. I'll make a great pumpkin pie.
Person 1	Sounds delicious, but my sister has already volunteered for dessert. Look, don't worry about it. There will be tons of food. Turkey, stuffing, green beans, sweet potatoes.
Person 2	Wow, I'm getting hungry just thinking about it. How about this, I'll bring a nice bottle of wine. You can never have too much of that.
Person 1	Well, isn't that the truth? Sounds good to me.

Model	Summary
Llama3-8B-Instruct	Person1 invites Person2 to join them for Thanksgiving dinner at their house. Person2 accepts the invitation, citing that they were worried they would have to spend the holiday alone after their parents announced they would be in Finland. Person1 reassures Person2 that they are welcome and that the more the merrier. Person2 asks how many people will be attending and is told that there will be 10 people, including their family and cousins. Person2 offers to bring a gift, but Person1 declines, saying that they have everything covered. Person2 persists and offers to bring a bottle of wine or a pie, but Person1 suggests that their sister has already volunteered to bring dessert.
Llama3-70B-Instruct	Person 1 invites Person 2 to their house for Thanksgiving dinner on Thursday. Person 2 is grateful for the invitation and explains they can't go to Finland with their parents due to work commitments. Person 1 is happy to have them celebrate with their family, which will consist of 10 people. When Person 2 offers to bring something, Person 1 declines, but eventually agrees to let them bring a bottle of wine.
SummLlama3-8B	The conversation is about Thanksgiving dinner arrangements, with Person2 (Paul) confirming his attendance at Person1's (host's) house on Thursday, discussing what to bring and what's already been taken care of, and finalizing the offer to bring wine instead of pie since someone else is handling desserts.

The SummLlama3-8B summary can be considered much more human-preferred for the following reasons:

Core Focus: The summary accurately captures the main theme of the conversation, which revolves around the Thanksgiving dinner arrangements. It highlights how the two people confirm plans, discuss what to bring, and finalize the decision for Person2 to bring wine instead of pie. This maintains the core context.

Inclusion of Key Facts: The summary covers the important details of the conversation, including Person2's initial offer to bring dessert (pumpkin pie) and the shift to bringing wine because another family member is handling dessert. The other summaries tend to overlook or simplify this progression, while SummLlama3-8B fully captures the interaction’s key events.

Clarity and Conciseness: The summary is structured in a straightforward, concise manner, effectively summarizing the conversation without unnecessary details. It presents the flow and outcome of the discussion clearly, making it easy for readers to understand. The logical order of events is maintained, ensuring a smooth narrative.

Accurate Role Depiction: The summary clearly identifies Person1 as the host and Paul (Person2) as the guest, which helps clarify their relationship and the nature of the conversation. This distinction is more explicit in SummLlama3-8B compared to other summaries, which might leave these roles more ambiguous.
