Slavic T5 Base

This model aims to achieve the best results for Slavic languages written in Latin script.

It is suitable for tasks such as:

summarization,
extractive question answering,
machine translation between Slavic languages in Latin script.

The model was trained on selected parts of the OSCAR and MaCoCu corpora.

It supports the following languages: Czech, Croatian, Polish, Slovak, and Slovenian.

The vocabulary contains 120,000 tokens and includes capital letters (it is cased).
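A minimal loading sketch, assuming the checkpoint works with the standard `transformers` Auto classes and that PyTorch is installed. The model ID `TUKE-KEMT/slavic-t5-base` is taken from this page; the helper names `load_model` and `generate` and the `max_new_tokens` default are illustrative only. As with most base T5 checkpoints, the model probably needs fine-tuning on a downstream task before its output is useful, although this card does not say so explicitly.

```python
MODEL_ID = "TUKE-KEMT/slavic-t5-base"  # repository name as shown on this page


def load_model(model_id=MODEL_ID):
    """Download and return (tokenizer, model). Requires network access."""
    # Imported here so the module can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def generate(text, tokenizer, model, max_new_tokens=64):
    """Run one text-to-text generation step and return the decoded string."""
    # Keep the original casing: the vocabulary includes capital letters.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example usage (downloads the weights on first call):
# tokenizer, model = load_model()
# print(generate("Dobrý deň, ako sa máte?", tokenizer, model))
```

The same two objects can be passed to a fine-tuning loop or to `Seq2SeqTrainer` for summarization, question answering, or translation.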

Model size: 383M parameters (Safetensors, tensor type F32)
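As a rough guide to memory needs, the weight footprint can be estimated from the figures above: 383M parameters stored as F32. This is a back-of-the-envelope sketch only; it uses the rounded parameter count from this page and ignores activations, optimizer state, and framework overhead.

```python
# Rounded parameter count reported on this page (not an exact figure).
PARAMS = 383_000_000
BYTES_PER_PARAM = 4  # F32 is a 32-bit (4-byte) float

weight_bytes = PARAMS * BYTES_PER_PARAM
weight_gb = weight_bytes / 1e9      # decimal gigabytes
weight_gib = weight_bytes / 2**30   # binary gibibytes

print(f"Approximate weight size: {weight_gb:.2f} GB ({weight_gib:.2f} GiB)")
```

Loading the weights in a 16-bit type would roughly halve this estimate.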
