Metadata Might Make Language Models Better
Abstract
This paper discusses the benefits of including metadata when training language models on historical collections. Using 19th-century newspapers as a case study, we extend the time-masking approach proposed by Rosin et al. (2022) and compare different strategies for inserting temporal, political and geographical information into a Masked Language Model. After fine-tuning several DistilBERT models on the enhanced input data, we provide a systematic evaluation of these models on a set of evaluation tasks: pseudo-perplexity, metadata mask-filling and supervised classification. We find that showing relevant metadata to a language model has a beneficial impact and may even produce more robust and fairer models.
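The paper itself defines the exact input format; as a rough illustration only, the sketch below shows one common way to prepend metadata (year, political leaning, place of publication) as extra tokens to newspaper text before masked-language-model fine-tuning of DistilBERT with the Hugging Face `transformers` library. The bracketed token scheme, the example metadata values and the masking probability are assumptions for illustration, not the authors' setup.

```python
# Illustrative sketch only (not the authors' code): prepend metadata tokens to
# each newspaper snippet before masked-language-model fine-tuning of DistilBERT.
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
)

MODEL_NAME = "distilbert-base-uncased"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)

# Hypothetical metadata tokens for year, political leaning and place of
# publication; added as ordinary vocabulary entries so the MLM collator can
# also mask them, loosely echoing the time-masking idea of Rosin et al. (2022).
tokenizer.add_tokens(["[1855]", "[liberal]", "[london]"])
model.resize_token_embeddings(len(tokenizer))

def with_metadata(text: str, year: str, politics: str, place: str) -> str:
    """Prefix the raw newspaper text with its metadata tokens."""
    return f"[{year}] [{politics.lower()}] [{place.lower()}] {text}"

example = with_metadata(
    "The corn laws were debated at great length in the Commons.",
    year="1855", politics="Liberal", place="London",
)

# mlm_probability is raised above the usual 0.15 only so this one-sentence
# demo almost always masks at least one token.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.3)
encoding = tokenizer(example, return_tensors="pt", truncation=True)
batch = collator([{k: v.squeeze(0) for k, v in encoding.items()}])

loss = model(**batch).loss  # one step's loss; a Trainer would be used in practice
print("MLM loss on one metadata-enhanced example:", loss.item())
```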
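Pseudo-perplexity, one of the evaluation tasks listed in the abstract, can be computed by masking each token in turn and averaging the negative log-likelihood of the original token, in the spirit of Salazar et al. (2020). The unoptimised sketch below uses an off-the-shelf DistilBERT checkpoint and a made-up sentence purely to show the mechanics; the paper's evaluation would score fine-tuned models on held-out 19th-century text.

```python
# Illustrative pseudo-perplexity scoring: mask each position in turn and
# average the negative log-likelihood of the original token.
import math

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased").eval()

@torch.no_grad()
def pseudo_perplexity(text: str) -> float:
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    positions = range(1, input_ids.size(0) - 1)  # skip [CLS] and [SEP]
    total_nll = 0.0
    for i in positions:
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        logits = model(masked.unsqueeze(0)).logits[0, i]
        total_nll -= torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
    return math.exp(total_nll / len(positions))

# With a fine-tuned checkpoint, comparing scores on text with and without a
# metadata prefix is one way to probe whether the prefix helps.
print(pseudo_perplexity("[1855] [liberal] [london] The corn laws were debated at length."))
```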
Community
A paper that takes a similar approach. Glad to see more people working on this topic!
Ah nice! Thanks for the suggestion :-)
This is an automated message from the Librarian Bot. I found the following papers, recommended by the Semantic Scholar API, that are similar to the one you just shared:
- Improving Domain-Specific Retrieval by NLI Fine-Tuning (2023)
- Leveraging Contextual Information for Effective Entity Salience Detection (2023)
- MultiSChuBERT: Effective Multimodal Fusion for Scholarly Document Quality Prediction (2023)
- AlbNER: A Corpus for Named Entity Recognition in Albanian (2023)
- ToddlerBERTa: Exploiting BabyBERTa for Grammar Learning and Language Understanding (2023)
Models citing this paper 4
Datasets citing this paper 0