long-t5-tglobal-base-sci-simplify


Exploring how well long-document models trained on "lay summaries" of scientific papers generalize.

A lay summary is a plain-language summary of a research paper or scientific study, written without technical jargon so that non-experts can easily understand it.

Model description

This model is a fine-tuned version of google/long-t5-tglobal-base on the pszemraj/scientific_lay_summarisation-plos-norm dataset for two epochs.

  • The variant trained on the ELIFE subset can be found here

Usage

This model works best with beam search decoding. Alternatively, the textsum utility package abstracts most of this away for you:

Install with pip:

pip install -U textsum

Use in python:

from textsum.summarize import Summarizer

summarizer = Summarizer('pszemraj/long-t5-tglobal-base-sci-simplify')
text = "put the text you don't want to read here"
summary = summarizer.summarize_string(text)
print(summary)
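The card recommends beam search but does not list decoding settings. A minimal sketch of running the model with plain transformers (`AutoTokenizer`, `AutoModelForSeq2SeqLM`, `generate()`) instead of textsum follows. The helper names `beam_search_kwargs` and `summarize` are hypothetical, and the values `num_beams=4`, `no_repeat_ngram_size=3`, `max_new_tokens=512` and the 16384-token input limit are assumptions for illustration, not the author's settings.

```python
from functools import lru_cache

# Hub ID from this card; every decoding value below is an illustrative assumption.
MODEL_ID = "pszemraj/long-t5-tglobal-base-sci-simplify"


def beam_search_kwargs(num_beams: int = 4, max_new_tokens: int = 512) -> dict:
    """Hypothetical helper: keyword arguments that make generate() use beam search."""
    if num_beams < 2:
        raise ValueError("beam search needs num_beams >= 2")
    return {
        "num_beams": num_beams,        # more than one beam switches generate() to beam search
        "early_stopping": True,        # stop once num_beams finished candidates exist
        "no_repeat_ngram_size": 3,     # assumed value: forbid repeating any trigram
        "max_new_tokens": max_new_tokens,
    }


@lru_cache(maxsize=1)
def _load():
    # Imported lazily so this module can be imported without transformers/torch installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID).eval()
    return tokenizer, model


def summarize(text: str, max_input_tokens: int = 16384, **overrides) -> str:
    """Hypothetical helper: summarize one document with beam search."""
    import torch

    tokenizer, model = _load()  # loaded once, then reused on later calls
    # Truncate very long inputs; 16384 is an assumed limit, not stated on the card.
    inputs = tokenizer(text, truncation=True, max_length=max_input_tokens, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, **{**beam_search_kwargs(), **overrides})
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `summarize(text, num_beams=6)` widens the beam; any other keyword accepted by `generate()` can be passed the same way.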

Intended uses & limitations

  • The model's ability to generalize beyond the training domain (PubMed/bioscience-type papers) has not yet been evaluated.

Training procedure

Eval results

It achieves the following results on the evaluation set:

  • Loss: 1.6778
  • Rouge1: 49.1475
  • Rouge2: 18.9281
  • RougeL: 26.9893
  • RougeLsum: 45.0973
  • Gen Len (mean generated length, in tokens): 399.4125
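Rouge1 and Rouge2 are ROUGE-N F-measures, scaled to 0 to 100, on unigram and bigram overlap between generated and reference lay summaries. The sketch below computes ROUGE-N from scratch to show what these numbers measure. It assumes lowercase whitespace tokenization; the card does not state which ROUGE implementation (tokenizer, stemming) produced the reported scores, so this will not reproduce them exactly. RougeL and RougeLsum are based on longest common subsequence and are not covered. The function names are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Multiset of n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N F1 on a 0-100 scale; tokens are lowercased and split on whitespace."""
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    # Counter & Counter keeps the per-n-gram minimum count, i.e. clipped matches.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())  # share of generated n-grams found in the reference
    recall = overlap / sum(ref.values())      # share of reference n-grams recovered
    return 100 * 2 * precision * recall / (precision + recall)


generated = "the cat sat on the mat"
reference = "the cat lay on the mat"
print(round(rouge_n(generated, reference, n=1), 2))  # 83.33
print(round(rouge_n(generated, reference, n=2), 2))  # 60.0
```

Bigram scores are typically much lower than unigram scores because a single substituted word breaks two bigrams, which is consistent with the gap between Rouge1 and Rouge2 reported here.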

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0004
  • train_batch_size: 4
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.01
  • num_epochs: 2.0
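Two entries in this list are derived quantities. The total_train_batch_size of 64 equals train_batch_size × gradient_accumulation_steps (4 × 16), the number of sequences per optimizer update. The `cosine` scheduler with a warmup ratio of 0.01 ramps the learning rate linearly from 0 to 0.0004 over the first 1% of updates, then decays it along a half cosine to 0. A minimal sketch of both follows, assuming Hugging Face Trainer conventions (warmup steps = ceil(total_steps × warmup_ratio)); the function names are hypothetical and the 1000-step total is made up for illustration, not taken from this training run.

```python
import math


def effective_batch_size(per_device: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Sequences that contribute to a single optimizer update."""
    return per_device * grad_accum_steps * num_devices


def cosine_with_warmup(step: int, total_steps: int, peak_lr: float = 4e-4,
                       warmup_ratio: float = 0.01) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear ramp from 0 to peak_lr
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(4, 16))  # 64
total = 1000  # illustrative step count, not the real run's
for s in (0, 5, 10, 505, 1000):
    print(s, f"{cosine_with_warmup(s, total):.2e}")
```

With a 1% warmup ratio, almost the entire run is spent in the cosine decay phase.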

Training results

Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Gen Len
1.966 | 0.52 | 200 | 1.7171 | 48.6521 | 18.427 | 26.7726 | 44.3947 | 376.335
1.877 | 1.03 | 400 | 1.6909 | 49.3263 | 18.7945 | 27.0741 | 45.1737 | 382.205
1.9007 | 1.55 | 600 | 1.6778 | 49.1475 | 18.9281 | 26.9893 | 45.0973 | 399.4125

Model size: 248M params (Safetensors, tensor type F32)