Marathi DistilBERT

Model description

This model is an adaptation of DistilBERT (Victor Sanh et al., 2019) for the Marathi language. This version of Marathi-DistilBERT was trained from scratch on approximately 11.2 million sentences.

DISCLAIMER

This model has not been thoroughly tested and may produce biased or inappropriate output. User discretion is advised.

Training data

The training data was extracted from a variety of sources, mainly:

OSCAR corpus
Marathi newspapers
Marathi storybooks and articles

The data was cleaned by removing text in all languages other than Marathi while preserving common punctuation.
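
The original cleaning script is not given here, so the following is only a minimal sketch of how such a filter might look. It assumes that Marathi text can be approximated by the Devanagari Unicode block (U+0900 to U+097F), which cannot tell Marathi apart from other Devanagari-script languages such as Hindi. The function name clean_sentence and the punctuation set ALLOWED_PUNCTUATION are hypothetical and chosen for illustration.

```python
import re

# Assumption: the punctuation worth preserving; the original set is not documented.
ALLOWED_PUNCTUATION = ".,!?;:'\"()-"

# Assumption: Marathi is approximated by the Devanagari block (U+0900-U+097F),
# which also includes the danda and Devanagari digits.
_DISALLOWED = re.compile(
    r"[^\u0900-\u097F\s" + re.escape(ALLOWED_PUNCTUATION) + r"]"
)
_WHITESPACE = re.compile(r"\s+")


def clean_sentence(text: str) -> str:
    """Drop characters outside Devanagari, whitespace, and allowed punctuation."""
    stripped = _DISALLOWED.sub(" ", text)
    # Collapse the gaps left behind by removed characters.
    return _WHITESPACE.sub(" ", stripped).strip()


if __name__ == "__main__":
    print(clean_sentence("नमस्कार world"))
```

A character-level filter like this removes Latin letters and ASCII digits but keeps the sentence punctuation around them, so a real pipeline would likely also drop sentences that end up too short after cleaning.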

Training procedure

The model was trained from scratch with the Adam optimizer, using a learning rate of 1e-4 and the default β1 and β2 values of 0.9 and 0.999, respectively. Training used a total batch size of 256 on a v3-8 TPU and a masking probability of 15%.
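
To make these settings concrete, the sketch below collects the stated hyperparameters in one place and illustrates what a 15% masking probability means for a single tokenized sentence. It is not the original training code: the names TRAINING_CONFIG and mask_tokens are hypothetical, and the masking is simplified so that every selected token is replaced by [MASK], whereas the actual training may have used a different replacement scheme.

```python
import random

# Hyperparameters as stated in the text; the dictionary layout is illustrative only.
TRAINING_CONFIG = {
    "optimizer": "adam",
    "learning_rate": 1e-4,
    "beta_1": 0.9,
    "beta_2": 0.999,
    "total_batch_size": 256,
    "mask_probability": 0.15,
}

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_probability=TRAINING_CONFIG["mask_probability"], seed=None):
    """Independently replace each token with [MASK] with the given probability.

    Returns (masked_tokens, labels). labels holds the original token at masked
    positions and None elsewhere, so a loss would only be computed on masked
    positions.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_probability:
            masked.append(MASK_TOKEN)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


if __name__ == "__main__":
    sentence = "हा खरोखर चांगला दिवस आहे".split()
    print(mask_tokens(sentence, seed=0))
```

Because each position is sampled independently, a short sentence may end up with no masked tokens at all; the 15% figure holds only on average over many tokens.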

Example

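from transformers import pipeline

# Load the fill-mask pipeline with the model and its matching tokenizer.
fill_mask = pipeline(
    "fill-mask",
    model="DarshanDeshpande/marathi-distilbert",
    tokenizer="DarshanDeshpande/marathi-distilbert",
)

# Predict the masked word in "This is a really good [MASK]."
predictions = fill_mask("हा खरोखर चांगला [MASK] आहे.")
print(predictions)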

BibTeX entry and citation info

@misc{sanh2020distilbert,
      title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter}, 
      author={Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
      year={2020},
      eprint={1910.01108},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Authors

1. Darshan Deshpande: GitHub, LinkedIn

2. Harshavardhan Abichandani: GitHub, LinkedIn