BERTislav

Baseline fill-mask model based on ruBERT and fine-tuned on a 10M-word corpus of mixed Old Church Slavonic, (Later) Church Slavonic, Old East Slavic, Middle Russian, and Medieval Serbian texts.

Overview

  • Model Name: BERTislav
  • Task: Fill-mask
  • Base Model: ai-forever/ruBert-base
  • Languages: orv (Old East Slavic, Middle Russian), cu (Old Church Slavonic, Church Slavonic)
  • Developed by: Nilo Pedrazzini

Input Format

A string (str) in which the tokens to be predicted are replaced by [MASK].

Output Format

The predicted tokens, each with its confidence score.
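
The input and output formats above can be illustrated with a minimal sketch, assuming the model is loaded through the Hugging Face `transformers` fill-mask pipeline. `MODEL_ID` is a placeholder, not a confirmed repository path, and `load_fill_mask` and `top_predictions` are hypothetical helper names introduced only for this illustration.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"  # mask token used by BERT-style tokenizers

# Placeholder (assumption): replace with the model's actual path on the Hub.
MODEL_ID = "<namespace>/BERTislav"


def load_fill_mask(model_id: str = MODEL_ID):
    """Build a fill-mask pipeline; this downloads the model weights."""
    # Imported lazily so the helper below can be used without transformers.
    from transformers import pipeline
    return pipeline("fill-mask", model=model_id)


def top_predictions(fill_mask: Callable, text: str, k: int = 5) -> List[Tuple[str, float]]:
    """Return (token, score) pairs for a text containing exactly one [MASK]."""
    if text.count(MASK) != 1:
        raise ValueError("this helper expects exactly one [MASK] token")
    # For a single mask, the pipeline returns a list of dicts that include
    # the keys "token_str" and "score".
    results = fill_mask(text, top_k=k)
    return [(r["token_str"], float(r["score"])) for r in results]


# Usage (requires network access and a valid MODEL_ID):
#   fill_mask = load_fill_mask()
#   for token, score in top_predictions(fill_mask, "your normalized text with one [MASK]"):
#       print(f"{token}\t{score:.3f}")
```

The helper restricts itself to a single [MASK] because the pipeline returns a nested list when the input contains several masks, which would need separate handling.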

Examples

Example 1:

COMING SOON

Uses

The model can be used as a baseline for further fine-tuning on specific downstream tasks (e.g. linguistic annotation).
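
As one illustration of what fine-tuning for linguistic annotation typically involves, the sketch below aligns word-level labels (for example, part-of-speech tag IDs) with the subword tokens produced by a BERT-style tokenizer. It assumes a token-classification setup and a fast tokenizer whose encoding exposes `word_ids()`; the task, the label IDs, and the helper name `align_labels` are hypothetical and not part of this model card.

```python
from typing import List, Optional

# -100 is the default ignore_index of PyTorch's CrossEntropyLoss, so
# positions carrying it do not contribute to the loss.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Map one label per word onto one label per subword token.

    word_ids: for each token, the index of the word it belongs to, or None
        for special tokens such as [CLS] and [SEP].
    word_labels: one integer label per word.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special token: never scored.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First subword of a word carries the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later subwords of the same word are ignored.
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned
```

Labelling only the first subword is a common convention; assigning the same label to every subword is an equally valid choice, depending on the task.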

Bias, Risks, and Limitations

The model should be considered only a baseline and should not be evaluated on its own. Whether it improves the performance of language models fine-tuned for specific tasks still needs to be tested.

Training Details

The texts used as training data are from the following sources:

NB: The training texts were heavily normalized; for best results, normalize your own input in the same way. Use the provided normalization script, customizing it as needed.
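
To give a rough idea of what such preprocessing can look like, here is a minimal sketch. It is not the provided normalization script, and the rules it applies (dropping the titlo and a few accent and breathing marks, lowercasing, collapsing whitespace) are assumptions chosen for illustration, not the rules used to prepare the training data.

```python
import unicodedata

# Combining marks removed by this sketch (an illustrative selection only).
MARKS_TO_DROP = {
    "\u0300",  # combining grave accent
    "\u0301",  # combining acute accent
    "\u0483",  # combining Cyrillic titlo
    "\u0484",  # combining Cyrillic palatalization
    "\u0485",  # combining Cyrillic dasia pneumata
    "\u0486",  # combining Cyrillic psili pneumata
    "\u0487",  # combining Cyrillic pokrytie
}


def normalize(text: str) -> str:
    """Strip selected diacritics, lowercase, and collapse whitespace."""
    # Decompose so that precomposed accented letters expose their marks.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in MARKS_TO_DROP)
    # Recompose so that letters such as й are stored as single code points.
    recomposed = unicodedata.normalize("NFC", kept)
    return " ".join(recomposed.lower().split())
```

Marks outside `MARKS_TO_DROP`, such as the breve of й, are left untouched. For real use, prefer the provided script so that the input matches the training data.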

Model Card Authors

Nilo Pedrazzini

Model Card Contact

[email protected]

How to use the model

COMING SOON

Model size: 178M params (Safetensors, tensor type F32)