bhavitvyamalik/en-mt-v1.0

en-mt HPLT v1.0

Note: This repository only contains the model weights. For usage instructions, evaluation scripts, and inference scripts, please refer to the HPLT-MT-Models v1.0 GitHub repository.

source language: en
target language: mt
dataset: OPUS + HPLTDatasets v1.2
model: transformer-base
tokenizer: SentencePiece (Unigram)
cleaning: We use OpusCleaner to clean the corpus. Details about the rules used can be found in the filter files on GitHub.

Benchmarks

testset	BLEU	chrF	COMET
flores200.en.mt	47.5	0.64	0.64
ntrex.en.mt	25	0.62	0.62
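
To make the chrF column easier to interpret, here is a minimal, standard-library-only sketch of a sentence-level character n-gram F-score. It is a simplified illustration, not the evaluation script behind the numbers above: it averages precision and recall over n-gram orders 1 to 6 and combines them with beta = 2, so its values may differ from those of tools such as sacreBLEU. The function names and the example strings are hypothetical. The result is on a 0-1 scale, which appears to match the table.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of length n after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str,
         max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-1 scale (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        # Clipped overlap: each n-gram counts at most as often as in both sides.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta = 2 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    # Hypothetical strings, used only to exercise the function.
    print(chrf("Il-kelb jiekol", "Il-kelb qed jiekol"))
```

For reported results, the evaluation scripts in the HPLT-MT-Models v1.0 GitHub repository remain the reference; this sketch only shows what kind of quantity the column measures.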