im2latex
This model is a base VisionEncoderDecoderModel fine-tuned on a dataset for generating LaTeX formulas from images.
Model Details
- Encoder: Swin Transformer
- Decoder: GPT-2
- Framework: PyTorch
- DDP (Distributed Data Parallel): Used for training
Training Data
The data is taken from OleehyO/latex-formulas. The data was divided into 80:10:10 for train, val and test. The splits were made as follows:
dataset = load_dataset(OleehyO/latex-formulas, cleaned_formulas)
train_val_split = dataset["train"].train_test_split(test_size=0.2, seed=42)
train_ds = train_val_split["train"]
val_test_split = train_val_split["test"].train_test_split(test_size=0.5, seed=42)
val_ds = val_test_split["train"]
test_ds = val_test_split["test"]
Evaluation Metrics
The model was evaluated on a test set with the following results:
- Test Loss: 0.10
- Test BLEU Score: 0.67
Usage
You can use the model directly with the transformers
library:
from transformers import VisionEncoderDecoderModel, AutoTokenizer, AutoFeatureExtractor
import torch
from PIL import Image
# load model, tokenizer, and feature extractor
model = VisionEncoderDecoderModel.from_pretrained("DGurgurov/im2latex")
tokenizer = AutoTokenizer.from_pretrained("DGurgurov/im2latex")
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/swin-base-patch4-window7-224-in22k") # using the original feature extractor for now
# prepare an image
image = Image.open("path/to/your/image.png")
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
# generate LaTeX formula
generated_ids = model.generate(pixel_values)
generated_texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print("Generated LaTeX formula:", generated_texts[0])
Training Script
The training script for this model can be found in the following repository: GitHub
Citation:
- If you use this work in your research, please cite our paper:
@misc{gurgurov2024imagetolatexconvertermathematicalformulas,
title={Image-to-LaTeX Converter for Mathematical Formulas and Text},
author={Daniil Gurgurov and Aleksey Morshnev},
year={2024},
eprint={2408.04015},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2408.04015},
}
License [MIT]
- Downloads last month
- 393
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.