Chinese version

About TexTeller

  • 📮[2024-03-25] TexTeller 2.0 released! The training data for TexTeller 2.0 has grown to 7.5M (about 15 times more than TexTeller 1.0, with improved data quality). TexTeller 2.0 shows superior performance on the test set, especially when recognizing rare symbols, complex multi-line formulas, and matrices.

    More test images, along with a side-by-side comparison of recognition models from different companies, are available here.

TexTeller is a ViT-based model designed for end-to-end formula recognition. It can recognize formulas in natural images and convert them into LaTeX-style formulas.

TexTeller was trained on a larger dataset of image-formula pairs (a 550K dataset is available here) and exhibits better generalization and higher accuracy than LaTeX-OCR, which uses approximately 100K data points. This larger dataset enables TexTeller to cover most usage scenarios more effectively.

For more details, please refer to the TexTeller GitHub repository.

Safetensors · Model size: 298M params · Tensor type: F32
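
The parameter count and tensor type listed above give a rough idea of how much space the raw weights take up. The snippet below is a minimal back-of-the-envelope sketch, not part of TexTeller itself: the helper name `estimate_weight_bytes` and the lookup table are hypothetical, and the estimate ignores file metadata, activations, and any runtime overhead.

```python
# Bytes per element for a few common tensor types (standard sizes).
# This table and the helper below are illustrative, not part of TexTeller.
BYTES_PER_ELEMENT = {"F32": 4, "F16": 2, "BF16": 2}


def estimate_weight_bytes(num_params: int, tensor_type: str = "F32") -> int:
    """Return the approximate size of the raw weights in bytes."""
    return num_params * BYTES_PER_ELEMENT[tensor_type]


if __name__ == "__main__":
    # 298M parameters stored as F32, as listed on this page.
    size = estimate_weight_bytes(298_000_000, "F32")
    print(f"{size / 1e9:.2f} GB")  # prints "1.19 GB"
```

Under the same assumptions, storing the weights in a 16-bit type would halve this figure, though whether the model keeps its accuracy at lower precision is not stated here.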