Introduction

This model is a fine-tuned version of xlm-roberta-large for Named-Entity Recognition (NER) in the tourism domain, specifically texts related to the Way of Saint James (Camino de Santiago). It recognizes four entity types: location (LOC), organization (ORG), person (PER) and miscellaneous (MISC).

Usage

You can use this model with the Transformers pipeline for NER:

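from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Model identifier as given in this card; when loading from the Hub it may need the owner namespace as a prefix.
tokenizer = AutoTokenizer.from_pretrained("es_trf_ner_cds_xlm-large")
model = AutoModelForTokenClassification.from_pretrained("es_trf_ner_cds_xlm-large")

example = "Fue antes de llegar a Sigüeiro, en el Camino de Santiago. Si te metes en el Franco desde la Alameda, vas hacia la Catedral. Y allí precisamente es Santiago el patrón del pueblo."
# aggregation_strategy="simple" merges sub-word tokens into whole entity spans.
ner_pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Each result is a dict with the entity group, confidence score, text and character offsets.
for ent in ner_pipe(example):
    print(ent)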
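
The pipeline above returns a list of dicts, one per detected span. As a minimal sketch of post-processing that output, the helper below groups spans by label and drops low-confidence ones. The function name group_by_label, the min_score threshold and the sample entries are assumptions made for illustration; the sample is hand-written and is not real output from this model. Real results also carry start and end character offsets, which are omitted here.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group aggregated NER results by label, keeping spans with score >= min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hypothetical, hand-written entries for illustration only; NOT real model output.
sample = [
    {"entity_group": "LOC", "score": 0.99, "word": "Sigüeiro"},
    {"entity_group": "MISC", "score": 0.80, "word": "Camino de Santiago"},
    {"entity_group": "PER", "score": 0.40, "word": "Santiago"},
]

print(group_by_label(sample, min_score=0.5))
```

With the real pipeline, the same helper could be called as group_by_label(ner_pipe(example), min_score=0.5); a suitable threshold depends on the application.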

Dataset

ToDo

Model performance

entity          precision   recall   f1
LOC             0.973       0.983    0.978
MISC            0.760       0.788    0.773
ORG             0.885       0.701    0.783
PER             0.937       0.878    0.906
micro avg       0.953       0.958    0.955
macro avg       0.889       0.838    0.860
weighted avg    0.953       0.958    0.955
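
As a sanity check on how these columns relate, the sketch below recomputes F1 as the harmonic mean of precision and recall, and the macro average as the unweighted mean over the four entity types. It assumes the standard definitions and uses only the rounded figures reported above, so recomputed values can differ from the reported ones in the third decimal place. The micro and weighted averages cannot be reproduced here because the per-class support counts are not given.

```python
# Reported per-class scores: (precision, recall, f1)
rows = {
    "LOC": (0.973, 0.983, 0.978),
    "MISC": (0.760, 0.788, 0.773),
    "ORG": (0.885, 0.701, 0.783),
    "PER": (0.937, 0.878, 0.906),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for label, (p, r, f1) in rows.items():
    print(f"{label}: reported f1={f1:.3f}, recomputed f1={f1_score(p, r):.3f}")

# Macro average: unweighted mean of each column across the four classes.
macro = tuple(sum(col) / len(rows) for col in zip(*rows.values()))
print("macro avg (precision, recall, f1):", tuple(round(v, 3) for v in macro))
```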

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
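
To illustrate what the linear scheduler type implies for the learning rate of 5e-05, here is a minimal pure-Python sketch of a linear decay schedule. It is not the training code used for this model. It assumes zero warmup steps, which the list above does not state, and the total step count of 300 is a hypothetical value, since the dataset size is not given. The function name linear_lr is also an illustrative choice.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Learning rate at a given step: linear warmup (if any), then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length: 100 optimizer steps per epoch over 3 epochs.
total_steps = 300
for step in (0, 100, 200, 300):
    print(step, linear_lr(step, total_steps))
```

Under these assumptions the rate starts at the configured value and reaches zero at the final step, so later epochs make progressively smaller updates.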

Framework versions

  • Transformers 4.28.1
  • PyTorch 2.0.1+cu117
  • Datasets 2.12.0
  • Tokenizers 0.13.3