---
language:
- en
- hi
- mr
- gu
- ta
- ml
license: llama2
tags:
- multilingual
- instruction-tuning
- llama2
---

# RomanSetu

This model was trained as part of the paper [RomanSetu: Efficiently unlocking multilingual capabilities of Large Language Models via Romanization](https://arxiv.org/abs/2401.14280). The codebase used to train and evaluate this model can be found at [https://github.com/AI4Bharat/romansetu](https://github.com/AI4Bharat/romansetu).

## Usage

```python3
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "ai4bharat/romansetu-cpt-native-200m"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
```
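
A minimal inference sketch follows. The prompt text and decoding parameters below are illustrative assumptions for demonstration, not values prescribed by the paper:

```python3
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "ai4bharat/romansetu-cpt-native-200m"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Hypothetical prompt; replace with text in one of the supported languages.
prompt = "India is a country in South Asia."
inputs = tokenizer(prompt, return_tensors="pt")

# Decoding settings here are illustrative defaults, not taken from the paper.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```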