temporary0-0name committed on
Commit
94c1ea9
1 Parent(s): 48c6d23

Upload 6 files

README.md ADDED
@@ -0,0 +1,156 @@
+ ---
+ widget:
+ - text: Once upon a time,
+   example_title: English
+ - text: भारत की राजधानी
+   example_title: Hindi
+ - text: ভারত বৈচিত্র্যের দিকে যাচ্ছিল
+   example_title: Bangla
+ - text: ભારત વિવિધતા તરફ જઈ રહ્યું હતું
+   example_title: Gujarati
+ pipeline_tag: text2text-generation
+ inference:
+   parameters:
+     max_new_tokens: 200
+ license: apache-2.0
+ datasets:
+ - soketlabs/bhasha-wiki
+ - soketlabs/bhasha-wiki-indic
+ - cerebras/SlimPajama-627B
+ - ai4bharat/sangraha
+ language:
+ - hi
+ - bn
+ - gu
+ - en
+ tags:
+ - indic
+ ---
+ # Pragna-1b
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ ![pragna-1b on huggingface](pragna_hf.png)
+
+
+ ## Architecture Overview
+
+ Pragna-1B is a decoder-only transformer model inspired by TinyLlama, with the following specifications:
+
+ - Layers: 22
+ - Attention Heads: 32
+ - Context Length: 2048
+ - Hidden Dimension: 2048
+ - Expansion Dimension: 5632
+ - Vocabulary Size: 69632
+
+ The model uses Rotary Positional Embeddings (RoPE) with a base of 10,000 to encode positional information. It applies RMSNorm with an epsilon of 1e-5 and uses the Sigmoid Linear Unit (SiLU) as its activation function. Pragna-1B also adopts Grouped Query Attention, an alternative to Multi-Head Attention in which several query heads share each key/value head. This reduces the memory bandwidth needed for keys and values, which speeds up training and inference and makes the model easier to run on lower-compute devices.
+
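To make the memory effect of Grouped Query Attention concrete, the sketch below estimates the key/value cache size for one full-length sequence. It is an illustration rather than the authors' code: the count of 4 key/value heads is taken from `num_key_value_heads` in the `config.json` added in this commit, the 2-byte element size assumes a bfloat16 cache, and the name `kv_cache_bytes` is made up for this example.

```python
# Illustrative estimate of KV-cache size; not part of the Pragna-1B codebase.
LAYERS = 22
HIDDEN = 2048
QUERY_HEADS = 32
KV_HEADS = 4          # assumption: num_key_value_heads from config.json in this commit
CONTEXT = 2048
BYTES_PER_VALUE = 2   # assumption: the cache is stored in bfloat16

HEAD_DIM = HIDDEN // QUERY_HEADS  # 64


def kv_cache_bytes(kv_heads: int, seq_len: int = CONTEXT) -> int:
    """Bytes needed to cache keys and values for one sequence across all layers."""
    per_token_per_layer = 2 * kv_heads * HEAD_DIM  # one key and one value vector per head
    return LAYERS * seq_len * per_token_per_layer * BYTES_PER_VALUE


gqa = kv_cache_bytes(KV_HEADS)     # grouped-query attention
mha = kv_cache_bytes(QUERY_HEADS)  # what full multi-head attention would need
print(f"GQA: {gqa / 2**20:.0f} MiB, MHA: {mha / 2**20:.0f} MiB, ratio: {mha // gqa}x")
```

Under these assumptions the cache is 44 MiB with grouped queries versus 352 MiB with one key/value head per query head, an 8x reduction.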
+ Pragna-1B is trained on our proprietary platform, GenAI Studio, a modular AI developer platform designed to support any GenAI model architecture. The platform is fault-tolerant and can scale across thousands of GPUs or accelerators. Development of this model used Triton, an open-source language from OpenAI, to write high-performance custom fused CUDA kernels for various operations. The model also uses Fully Sharded Data Parallel (FSDP) for distributed training and FlashAttention-2 to accelerate training and inference.
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ - **Developed by:** [Soket AI Labs](http://soket.ai)
+ - **Language(s) (NLP):** Hindi, Bangla, Gujarati and English
+ - **License:** Apache 2.0
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Load the tokenizer and the model weights in bfloat16 from the Hugging Face Hub
+ tokenizer = AutoTokenizer.from_pretrained("soketlabs/pragna-1b")
+ model = AutoModelForCausalLM.from_pretrained(
+     "soketlabs/pragna-1b", torch_dtype=torch.bfloat16
+ )
+ ```
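Loading the model is only the first step, so here is a hedged sketch of generating text with it. The sampling values mirror `generation_config.json` from this commit and the 200-token limit mirrors the widget setting in the README front matter. The names `MODEL_ID`, `SAMPLING_KWARGS` and `generate` are made up for this example.

```python
# Hedged usage sketch; not taken from the model authors.
MODEL_ID = "soketlabs/pragna-1b"
SAMPLING_KWARGS = {
    "do_sample": True,
    "top_k": 10,
    "temperature": 0.8,
    "max_new_tokens": 200,  # assumption: reuse the widget limit from the README front matter
}


def generate(prompt: str) -> str:
    """Load the model and return the prompt followed by a sampled continuation."""
    # Imported lazily so the heavy dependencies are only needed when generating.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, **SAMPLING_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `print(generate("Once upon a time,"))` downloads the weights on first use. Because sampling is enabled, the continuation differs from run to run.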
+
+ ## Training Details
+
+ ### Training Data
+
+ 1. [Bhasha-wiki](https://soket.ai/blogs/bhasha_wiki_dataset)
+ 2. [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B)
+ 3. [Sangraha-Verified](https://huggingface.co/datasets/ai4bharat/sangraha)
+
+ ### Training Procedure
+
+ [To be added]
+
+
+ #### Training Hyperparameters
+
+ - **Precision:** BFloat16
+ - **Batch Size:** 2k - 2.5k
+ - **Context Length:** 2,048
+ - **Learning Rate:** 3e-5
+ - **Optimizer:** AdamW
+ - **LR Scheduler:** Cosine
+ - **Mixed Precision Training**
+
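The list above names a cosine learning-rate schedule but gives no further detail, so the following is a minimal sketch of one common form of it. Only the peak rate of 3e-5 comes from the list. The absence of warmup, the floor of zero, and the step count are assumptions chosen for illustration, and `cosine_lr` is a hypothetical helper name.

```python
import math

PEAK_LR = 3e-5  # learning rate listed in the hyperparameters above


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr at step 0 down to min_lr at total_steps (no warmup)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# total_steps=1000 is a made-up value purely for illustration
for step in (0, 250, 500, 750, 1000):
    print(step, f"{cosine_lr(step, 1000):.2e}")
```

The rate starts at the peak, passes through half the peak at the midpoint, and reaches the floor at the final step.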
+ ## Evaluation
+
+ ### Hindi
+ | Model | Arc-Easy | Arc-Challenge | Hellaswag | Average |
+ |--------------|----------|---------------|-----------|---------|
+ | pragna-1b | 0.33 | 0.22 | 0.35 | 0.30 |
+ | sarvamai/OpenHathi-7B-Hi-v0.1-Base | 0.3582 | 0.2645 | 0.4315 | 0.35 |
+ | meta-llama/Llama-2-7b-hf | 0.295 | 0.2406 | 0.3789 | 0.30 |
+ | google/gemma-7b | <b>0.5926</b> | <b>0.4258</b> | <b>0.6341</b> | <b>0.55</b> |
+ | meta-llama/Meta-Llama-3-8B | 0.5354 | 0.3541 | 0.6072 | 0.50 |
+
+ ### Gujarati
+ | Model | Arc-Easy | Arc-Challenge | Hellaswag | Average |
+ |--------------|----------|---------------|-----------|---------|
+ | pragna-1b | 0.32 | 0.22 | 0.37 | 0.30 |
+ | google/gemma-7b | <b>0.4954</b> | <b>0.3208</b> | <b>0.5673</b> | <b>0.46</b> |
+
+
+ ### English
+
+ | Model | Arc | Hellaswag | MMLU | TruthfulQA | Winogrande | GSM8K | Average |
+ |----------------|-----|-----------|------|------------|------------|-------|---------|
+ | pragna-1b | 0.3 | 0.51 | 0.27 | 0.38 | 0.56 | 0 | 0.34 |
+ | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | 0.36 | 0.61 | 0.25 | 0.37 | 0.61 | 0.02 | 0.37 |
+ | meta-llama/Meta-Llama-3-8B-Instruct | <b>0.6</b> | <b>0.82</b> | <b>0.67</b> | <b>0.51</b> | <b>0.77</b> | <b>0.68</b> | <b>0.66</b> |
+ | meta-llama/Llama-2-7b-hf | 0.53 | 0.78 | 0.46 | 0.39 | 0.74 | 0.14 | 0.51 |
+
+
+ ### Results
+
+ | Context | Tokens Predicted |
+ |---------|------------------|
+ | India is one of the most | India is one of the most exciting places to travel to in Asia. The country hosts a wide variety of cultures and offers a plethora of attractions across the country. There's a lot you can do here and the tourists are always on the lookout for the new things to do and see. If you're looking to travel to India for your next adventure, you'll be pleased to know that you can find a ton of things to do in the country. From the temples to the beaches, there's a lot you can see and do here. One of the best things for visitors to do in India is to visit Jaipur. The capital city of Rajputana, Jaipur is a beautiful city that's known as the "City of Lives". |
+ | नरेंद्र मोदी का जन्म | नरेंद्र मोदी का जन्मदिन आज मनाया जा रहा है। इस मौके पर प्रधानमंत्री नरेंद्र मोदी ने ट्वीट कर लिखा कि 'आज देश को एक नया संसद भवन मिला है, जिसे आप लोगों ने बधाई दी है। भारत और भारतवासियों को यह संसद बहुत ही महत्त्वपूर्ण है और इसका उपयोग देश के विकास में बहुत अधिक योगदान देगा। आज हमारे देश में एक संसद बनी हुई है जो हमारे देश की लोकतांत्रिक प्रक्रियाओं को मजबूत बनाती है। |
+ | Prime minister Modi | Prime minister Modi's visit to China: PM Modi is scheduled to arrive in Beijing on Thursday. PM Narendra Modi will be the first Indian prime minister to visit China since former President Mao Zedong, who visited India in 1959 and was assassinated by Chinese soldiers a year later. The two leaders are expected to discuss regional security issues such as terrorism, trade and energy cooperation during their talks. |
+ | महात्मा गांधी | महात्मा गांधी एक स्वतंत्र व्यक्ति, एक आत्मनिर्भर और एक अत्यंत लोकप्रिय नेता थे। उन्होंने अपने जीवनकाल में देश को एक नई दिशा देने के लिए बहुत कुछ किया। समाज के विभिन्न क्षेत्रों में उनके योगदान के लिए उन्हें सम्मानित किया गया था। |
+ | ब्रिटिश राजशाही | ब्रिटिश राजशाही के विघटन के बाद, इंग्लैंड और वेल्स में ब्रिटेन की संसद का गठन हुआ। संसद ने 1807 में अपने पहले अधिनियम, "द क्राउन की विधेयक" (एक्ट ऑफ द डेमोक्रैटिक) को मंजूरी दी, जिसमें प्रत्येक राज्य से एक निर्वाचित सदस्य चुना गया। |
+ | মহাত্মা গান্ধী | মহাত্মা গান্ধী ছিলেন আন্তর্জাতিক সমাজতান্ত্রিক আন্দোলনের অন্যতম প্রধান নেতা। তিনি ভারতের স্বাধীনতা সংগ্রামে তাঁর ভূমিকা অপরিসীম। ১৯৪৭ সালের ১৫ আগস্ট ভারতে বিপ্লব ঘটে। ১৯৪৮ সালে মহাত্মা গান্ধীর মৃত্যু হয়। এরপর থেকে মহাত্মা আর দেখা যায় না। |
+ | নরেন্দ্র মোদীর জন্ম | নরেন্দ্র মোদীর জন্মদিনের শুভেচ্ছা জানিয়েছেন অমিতাভ বচ্চন। প্রসঙ্গত, গোটা দেশ জুড়ে চলছে নববর্ষ উৎসব। আর এই উৎসবে নিজেকে সামিল করতে চান না কেউই। এমনটাই জানালেন বলিউড অভিনেত্রী সারা আলি খান। |
+ | એક વખત | એક વખતે, તેમણે તેમના મિત્ર અને સહાયક વિલિયમ હેનરી મેકડોનાલ્ડ સાથે મળીને "ધ ઓરિજિનલ મેન ઓફ ધ ડેડ" નામના પુસ્તકનું નિર્માણ કર્યું હતું; તે 1961માં પ્રકાશિત થયું હતું. 1958માં, તેમણે ધ ન્યૂ યોર્કર માટે લેખક અને સંપાદક પિતા તરીકે કામ કર્યું હતું. |
+ | એક સમયે એક રાજા જંગલમાં રહેતો હતો. તેમણે | એક સમયે એક રાજા જંગલમાં રહેતો હતો. તેમણે, એક દિવસ રાજાને કહ્યું, "હું તમને મારી જાતને એક સ્ત્રી તરીકે આપું છું." રાજા ખુશ થયો. તેમણે તેને તેના ઘરમાં લઈ ગયા. તે દિવસે રાણી પોતાના પતિ સાથે ઘરે આવી અને કહ્યું કે તે તેની પાસે છે. રાજાએ તેને પૂછ્યું કે શું તેણે તેની સાથે લગ્ન કર્યા છે ? રાજા અને રાણીએ પૂછ્યું - હા. ત્યારે તેમણે કહ્યું કે, "તમે મને તમારી સાથે લગ્ન કરવાનું વચન આપ્યું છે. હવે મારે તમારા પર વિશ્વાસ રાખવો જોઈએ નહીં". |
+
+
+ ## Citation
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ ```
+ @ONLINE{bhasha-wiki,
+     author = "Soket Labs Technology and Research Private Limited",
+     title = "pragna-1b",
+     url = "https://soket.ai"
+ }
+ ```
+
+ ## Model Card Contact
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "_name_or_path": "soketlabs/pragna-1b",
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 2048,
+   "initializer_range": 0.02,
+   "intermediate_size": 5632,
+   "max_position_embeddings": 2048,
+   "model_type": "llama",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 22,
+   "num_key_value_heads": 4,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 10000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.36.2",
+   "use_cache": true,
+   "vocab_size": 67991
+ }
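As a sanity check on the "1b" in the model name, the sketch below derives a parameter count from the values in this `config.json`. It assumes the standard Llama layer layout (bias-free `q/k/v/o` projections, a three-matrix gated MLP, two RMSNorm weight vectors per layer, and a final norm). The function name `count_parameters` is made up for this example.

```python
# Rough parameter count for a Llama-style model, using values from config.json above.
config = {
    "hidden_size": 2048,
    "intermediate_size": 5632,
    "num_attention_heads": 32,
    "num_hidden_layers": 22,
    "num_key_value_heads": 4,
    "vocab_size": 67991,
    "tie_word_embeddings": False,
}


def count_parameters(cfg: dict) -> int:
    """Sum the weight counts, assuming the standard Llama layer layout."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim

    attention = 2 * h * h + 2 * h * kv_dim  # q_proj and o_proj, plus smaller k_proj and v_proj
    mlp = 3 * h * cfg["intermediate_size"]  # gate_proj, up_proj, down_proj
    norms = 2 * h                           # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * h      # input embedding matrix
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


total = count_parameters(config)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

With these values the estimate comes to 1,247,467,520 parameters, of which roughly 278 million sit in the untied input and output embedding matrices.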
generation_config.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "max_length": 2048,
+   "pad_token_id": 0,
+   "transformers_version": "4.36.2",
+   "do_sample": "True",
+   "top_k": 10,
+   "temperature": 0.8,
+   "max_new_tokens": 512
+ }
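The `top_k` and `temperature` settings above control how the next token is drawn. The sketch below shows the idea on a toy logit vector in plain Python. It is a simplified stand-in for what a generation library does internally, and the function name `sample_top_k` and the toy logits are made up for this example.

```python
import math
import random
from typing import Optional, Sequence


def sample_top_k(logits: Sequence[float], top_k: int = 10, temperature: float = 0.8,
                 rng: Optional[random.Random] = None) -> int:
    """Return a token index sampled from the top_k logits after temperature scaling."""
    rng = rng or random.Random()
    # Keep only the indices of the top_k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Dividing by a temperature below 1 sharpens the distribution.
    scaled = [logits[i] / temperature for i in top]
    # Softmax weights over the kept logits, shifted by the max for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


# Toy vocabulary of 20 tokens whose logit equals the token index.
toy_logits = [float(i) for i in range(20)]
rng = random.Random(0)
draws = [sample_top_k(toy_logits, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))
```

Every draw falls within the ten highest-scoring indices (10 to 19 here), and the highest logits are drawn most often.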
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,42 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "chat_template": "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": false,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "</s>",
+   "padding_side": "right",
+   "sp_model_kwargs": {},
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }
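The `chat_template` above is easier to follow when written out as ordinary code. The sketch below is a plain-Python approximation of what the template renders, not the tokenizer's own implementation. It assumes the Jinja environment strips the newline after each block tag (the `trim_blocks` behaviour), so that only the newlines following the `{{ ... }}` expressions survive. The names `render_chat` and `ROLE_TAGS` are made up for this example.

```python
EOS_TOKEN = "</s>"  # eos_token from tokenizer_config.json above
ROLE_TAGS = {"user": "<|user|>", "system": "<|system|>", "assistant": "<|assistant|>"}


def render_chat(messages, add_generation_prompt=False):
    """Approximate the chat_template: tag line, content, EOS, newline for each message."""
    parts = []
    for message in messages:
        tag = ROLE_TAGS.get(message["role"])
        if tag is not None:  # the template emits nothing for any other role
            parts.append(f"{tag}\n{message['content']}{EOS_TOKEN}\n")
    if messages and add_generation_prompt:
        # Emitted once, after the last message, to cue the assistant's reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
]
print(render_chat(conversation, add_generation_prompt=True))
```

With the real tokenizer loaded, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is the supported way to render the template. The function above is only meant to show the layout.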