gyanai/biswabangla-356M

metadata

license: cc-by-nc-sa-4.0
language:
  - bn

Description

Biswabangla is a 356-million-parameter, open-source, generative pretrained language model for Bangla/Bengali.

Biswabangla is a monolingual Bangla/Bengali language model.

It was pretrained from scratch with a context size of 4096 tokens.
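Because the model was pretrained with a context size of 4096, the prompt plus the tokens you ask it to generate should fit inside that window. The sketch below is a minimal illustration under stated assumptions: the prompt is already tokenized into a list of integer IDs, and dropping the oldest tokens (left truncation) is acceptable for your use case. `fit_to_context` is a hypothetical helper, not part of any released Biswabangla code.

```python
from typing import List

CONTEXT_SIZE = 4096  # context size stated in this model card


def fit_to_context(token_ids: List[int], max_new_tokens: int,
                   context_size: int = CONTEXT_SIZE) -> List[int]:
    """Keep the most recent prompt tokens so prompt + generation fits the window.

    Hypothetical helper for illustration only.
    """
    if not 0 <= max_new_tokens < context_size:
        raise ValueError("max_new_tokens must be in [0, context_size)")
    budget = context_size - max_new_tokens  # room left for the prompt
    # Left-truncate: drop the oldest tokens and keep the last `budget` ones.
    if len(token_ids) > budget:
        return token_ids[-budget:]
    return list(token_ids)


prompt = list(range(5000))  # stand-in for a long tokenized prompt
kept = fit_to_context(prompt, max_new_tokens=256)
print(len(kept), kept[0])  # 3840 1160
```

Left truncation keeps the text nearest to the point of generation, which usually matters most for a causal model; other strategies (summarizing or chunking the input) may suit some tasks better.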

This model has been neither chat-tuned nor fine-tuned.

We recommend fine-tuning or chat-tuning this pretrained model on Bangla/Bengali chat datasets or Bangla NLP datasets.
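One common way to prepare chat-tuning data is to concatenate each prompt with its response and compute the loss only on the response tokens. The sketch below assumes a trainer that shifts labels internally (as Hugging Face causal-LM models do) and treats label -100 as "ignore" (the default `ignore_index` of PyTorch's `CrossEntropyLoss`). The model card does not specify a chat template or tokenizer interface, so `build_sft_example`, the `"Q: "` prompt format, and `toy_encode` are all hypothetical.

```python
from typing import Callable, Dict, List

# Labels equal to -100 are skipped by PyTorch's CrossEntropyLoss (default ignore_index).
IGNORE_INDEX = -100


def build_sft_example(prompt: str, response: str,
                      encode: Callable[[str], List[int]], eos_id: int,
                      max_len: int = 4096) -> Dict[str, List[int]]:
    """Join prompt and response; only response tokens contribute to the loss.

    Hypothetical helper; `encode` stands in for the real tokenizer.
    """
    prompt_ids = encode(prompt)
    response_ids = encode(response) + [eos_id]  # end the answer with EOS
    input_ids = (prompt_ids + response_ids)[:max_len]  # respect the 4096 context
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids)[:max_len]
    return {"input_ids": input_ids, "labels": labels}


def toy_encode(text: str) -> List[int]:
    """Hypothetical stand-in for the Biswabangla tokenizer: raw UTF-8 bytes."""
    return list(text.encode("utf-8"))


ex = build_sft_example("Q: ", "A", toy_encode, eos_id=0)
print(ex["input_ids"])  # [81, 58, 32, 65, 0]
print(ex["labels"])     # [-100, -100, -100, 65, 0]
```

Masking the prompt keeps the model from being trained to reproduce user turns; training on the full sequence is an alternative some practitioners prefer.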

We also recommend performing continual pretraining before fine-tuning.
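For continual pretraining, a common data-preparation step is to pack tokenized documents into fixed-length training blocks that match the model's 4096-token context. The sketch below assumes documents are already tokenized and that an end-of-sequence ID separates them; `pack_documents` is a hypothetical helper, and the model card does not say how Biswabangla's own pretraining data was packed.

```python
from typing import Iterable, List


def pack_documents(docs: Iterable[List[int]], eos_id: int,
                   block_size: int = 4096) -> List[List[int]]:
    """Concatenate tokenized documents (EOS-separated) and cut them into full blocks.

    Hypothetical helper for illustration only.
    """
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)  # mark the document boundary
    n_full = len(stream) // block_size  # a trailing partial block is dropped
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_full)]


# Tiny block size so the result is easy to read.
print(pack_documents([[1, 2, 3], [4, 5], [6, 7, 8, 9]], eos_id=0, block_size=4))
# [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

Packing avoids wasting compute on padding, at the cost of letting a block span document boundaries; the EOS separator gives the model a signal where one document ends.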

Use of this model for commercial purposes is strictly prohibited (see the cc-by-nc-sa-4.0 license above).

If you use our model, please cite our paper (Niyogi and Bhattacharya, 2024).

The architecture of Biswabangla differs from that of the language models described in Niyogi and Bhattacharya, 2024.

Model Architecture

Autoregressive Transformer decoder model
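The two properties named above can be illustrated in a few lines. In a decoder, a causal mask lets each position attend only to itself and earlier positions; "autoregressive" means text is generated one token at a time, with each prediction appended to the input for the next step. The sketch below is not Biswabangla's implementation: the model card does not give layer counts, attention details, or a decoding strategy. `toy_model` is a hypothetical stand-in for the network, and greedy decoding is used only for simplicity.

```python
from typing import Callable, List, Sequence


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def greedy_decode(next_token_logits: Callable[[Sequence[int]], Sequence[float]],
                  prompt: List[int], max_new_tokens: int, eos_id: int,
                  context_size: int = 4096) -> List[int]:
    """Generate one token at a time, feeding each prediction back as input."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        window = tokens[-context_size:]  # the model only sees the last 4096 tokens
        logits = next_token_logits(window)
        next_id = max(range(len(logits)), key=lambda t: logits[t])  # greedy pick
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


def toy_model(context: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for the decoder: favours (last token + 1) mod 5."""
    vocab = 5
    return [1.0 if t == (context[-1] + 1) % vocab else 0.0 for t in range(vocab)]


print(causal_mask(3))  # [[True, False, False], [True, True, False], [True, True, True]]
print(greedy_decode(toy_model, [1], max_new_tokens=10, eos_id=0))  # [1, 2, 3, 4, 0]
```

In practice, sampling methods (temperature, top-k, top-p) are often preferred over greedy decoding for open-ended generation.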

Limitations

The model was trained on data crawled from the internet that contains toxic language, unsafe content, and societal biases. It may therefore amplify those biases and return toxic responses, especially when given toxic prompts. It may also generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, and it may produce socially unacceptable or undesirable text even when the prompt itself contains nothing explicitly offensive.

Gyan AI Research owns the output generated by the model.

Citations

@misc{niyogi2024paramanu,
      title={Paramanu: A Family of Novel Efficient Indic Generative Foundation Language Models},
      author={Mitodru Niyogi and Arnab Bhattacharya},
      year={2024},
      eprint={2401.18034},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}