Slavic T5 Base

The aim of this model is to achieve the best possible results for Slavic languages written in Latin script.

It is suitable for tasks such as:

  • summarization,
  • extractive question answering,
  • machine translation between Slavic languages in Latin script.
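
The card does not say whether the published checkpoint is already fine-tuned for any of these tasks, so the following is only a minimal sketch of loading it and computing a sequence-to-sequence loss for one source/target pair, as one would during fine-tuning. It assumes the `transformers` library with PyTorch (and possibly `sentencepiece`) is installed. The repository name is taken from this card; the helper names `load_model`, `training_loss`, and `demo` and the example sentences are illustrative and not part of the model.

```python
MODEL_ID = "TUKE-KEMT/slavic-t5-base"  # repository name as given on this card


def load_model(model_id: str = MODEL_ID):
    """Download the tokenizer and the seq2seq model from the Hugging Face Hub."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def training_loss(tokenizer, model, source: str, target: str) -> float:
    """Return the cross-entropy loss of `target` given `source` for one pair."""
    # Encode the input text as PyTorch tensors.
    inputs = tokenizer(source, return_tensors="pt")
    # Encode the expected output text; its token ids serve as labels.
    labels = tokenizer(text_target=target, return_tensors="pt").input_ids
    outputs = model(**inputs, labels=labels)
    return outputs.loss.item()


def demo() -> None:
    """Hypothetical Slovak-to-Czech pair, used only to exercise the helpers."""
    tokenizer, model = load_model()
    loss = training_loss(
        tokenizer,
        model,
        source="Dobrý deň, ako sa máte?",
        target="Dobrý den, jak se máte?",
    )
    print(f"loss: {loss:.3f}")
```

Calling `demo()` downloads the checkpoint and prints the loss for the single example pair. In a real fine-tuning run the same loss would be computed over batches of task data and minimized with an optimizer.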

The model is trained on selected parts of the OSCAR and MaCoCu corpora.

It supports the following languages: Czech, Croatian, Polish, Slovak, and Slovenian.

The vocabulary has 120,000 tokens and is cased (it contains capital letters).
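
The vocabulary properties described here can be inspected on the published tokenizer. The sketch below assumes the `transformers` library is installed; the helper names `load_tokenizer` and `describe_vocab` and the probe word are hypothetical. Note that `len(tokenizer)` counts any added special tokens as well, so the reported size may differ slightly from the nominal figure.

```python
MODEL_ID = "TUKE-KEMT/slavic-t5-base"  # repository name as given on this card


def load_tokenizer(model_id: str = MODEL_ID):
    """Download the tokenizer from the Hugging Face Hub."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(model_id)


def describe_vocab(tokenizer, probe: str = "Praha") -> dict:
    """Report the vocabulary size and whether letter case affects tokenization."""
    # Heuristic: if the capitalized and lowercased probe produce different
    # token sequences, the tokenizer distinguishes upper and lower case.
    cased = tokenizer.tokenize(probe) != tokenizer.tokenize(probe.lower())
    return {"vocab_size": len(tokenizer), "cased": cased}
```

For example, `describe_vocab(load_tokenizer())` returns a dictionary with the two values. The case check is a heuristic based on a single probe word, not a guarantee about every token.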

Model size: 383M parameters (Safetensors, tensor type F32).

Datasets used to train TUKE-KEMT/slavic-t5-base