arxiv:2211.10086

Metadata Might Make Language Models Better

Published on Nov 18, 2022

Abstract

This paper discusses the benefits of including metadata when training language models on historical collections. Using 19th-century newspapers as a case study, we extend the time-masking approach proposed by Rosin et al. (2022) and compare different strategies for inserting temporal, political and geographical information into a Masked Language Model. After fine-tuning several DistilBERT models on the enhanced input data, we provide a systematic evaluation of these models on a set of tasks: pseudo-perplexity, metadata mask-filling and supervised classification. We find that showing relevant metadata to a language model has a beneficial impact and may even produce more robust and fairer models.
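The abstract doesn't spell out the insertion mechanics, but one common strategy is to serialize metadata as a plain-text prefix on each training example before masked-language-model fine-tuning. The sketch below is a hypothetical illustration of that idea using the Hugging Face transformers and datasets libraries, not the authors' released code: the record fields ("year", "city", "politics"), the prefix format, and the toy records are all assumptions.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Toy records standing in for 19th-century newspaper articles; the schema
# is illustrative, not the paper's actual data format.
records = [
    {"year": 1855, "city": "London", "politics": "liberal",
     "text": "The Crimean dispatches arrived by steamer this morning."},
    {"year": 1872, "city": "Manchester", "politics": "conservative",
     "text": "The cotton exchange closed firm despite the rains."},
]

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")

def add_metadata_prefix(record):
    # Prepend temporal, geographical and political metadata as ordinary
    # tokens so the model can attend to them (and be masked on them).
    prefix = f"{record['year']} {record['city']} {record['politics']} :"
    return {"full_text": f"{prefix} {record['text']}"}

dataset = Dataset.from_list(records).map(add_metadata_prefix)

def tokenize(batch):
    return tokenizer(batch["full_text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

# Standard 15% random masking; metadata tokens are masked like any other
# token, which is what later enables metadata mask-filling as a probe.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilbert-metadata-mlm",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

Of the three evaluation tasks, pseudo-perplexity is the least standard: since an MLM has no left-to-right factorization, each token is masked in turn and scored, and the mean negative log-likelihood is exponentiated (the scoring approach popularized by Salazar et al., 2020). A minimal sketch, reusing the model and tokenizer from above:

```python
import math
import torch

def pseudo_perplexity(text, model, tokenizer):
    # Mask each position in turn, score the true token under the MLM,
    # and exponentiate the mean negative log-likelihood.
    enc = tokenizer(text, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    nlls = []
    for i in range(1, len(input_ids) - 1):  # skip [CLS] and [SEP]
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        nlls.append(-log_probs[input_ids[i]].item())
    return math.exp(sum(nlls) / len(nlls))

print(pseudo_perplexity("1855 london liberal : the dispatches arrived today .",
                        model, tokenizer))
```

Lower pseudo-perplexity on metadata-prefixed text than on bare text would suggest the model has learned to exploit the added context.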

Community

Paper author

Here's a paper that takes a similar approach. Glad to see more people working on this topic!

Paper author

Ah, nice! Thanks for the suggestion :-)

This is an automated message from the Librarian Bot. I found the following papers similar to the one you just shared:

The following papers were recommended by the Semantic Scholar API.

Please give a thumbs up to this comment if you found it helpful!



Models citing this paper: 4

Datasets citing this paper: 0


Spaces citing this paper: 2

Collections including this paper: 2