
A fine-tuned version of hmByT5 on the DE1, DE2, DE3 and DE7 subsets of the ICDAR 2019 post-OCR correction (POCR) dataset, trained to correct OCR mistakes in historical German text. The `max_length` during training was set to 350.

## Performance

- SacreBLEU of the raw OCR text against the ground truth (eval split): 10.83
- SacreBLEU of the model's corrected output against the ground truth: 72.35

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Historical German OCR output, errors included, as input to the model.
example_sentence = "Anvpreiſungq. Haupidepot für Wien: In der Stadt, obere Bräunerſtraße Nr. 1137 in der Varfüͤmerie-Handlung zur"

tokenizer = AutoTokenizer.from_pretrained("Var3n/hmByT5_anno")
model = AutoModelForSeq2SeqLM.from_pretrained("Var3n/hmByT5_anno")

# Avoid shadowing the built-in `input`; cap generation at the input length.
input_ids = tokenizer(example_sentence, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=len(input_ids[0]), num_beams=4, do_sample=True)

text = tokenizer.decode(output[0], skip_special_tokens=True)
print(text)
```