---
license: mit
datasets:
- sinarashidi/alpaca-persian
language:
- en
- fa
library_name: transformers
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/Maral-7B-alpha-1-GGUF
This is a quantized version of [MaralGPT/Maral-7B-alpha-1](https://huggingface.co/MaralGPT/Maral-7B-alpha-1), created using llama.cpp.

# Original Model Card

# Maral 7B Alpha 1

<p align="center">
  <img src="maral-7b-announce.png" width=256 height=256 />
</p>

## What is Maral?

_Maral_ is a new large language model specializing in the Persian language. The model is based on [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) and trained on the _Alpaca Persian_ dataset. It is one of the few efforts in the Persian-speaking scene to bring our language to a new life in the era of AI.

Also, since Maral is based on Mistral, it is capable of producing English answers as well.

### What does "Maral" mean?

Maral is the Persian name of the [Red Deer](https://en.wikipedia.org/wiki/Red_deer), a deer species native to Iran. The name was chosen for a few reasons: first, because of our environmental concerns, and second, because a Persian LLM made by Iranian people deserves an Iranian name.

## Inference

### Prompt Format

This model requires the _Guanaco_ prompt format, which looks like this:

```
### Human: <prompt>
### Assistant: <answer>
```

So in your code, you may write prompts like this:

```python
prompt = "در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"
prompt = f"### Human:{prompt}\n### Assistant:"
```

More information about this is given in the inference sections below.
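
Because the `eos_token` and `bos_token` were not changed (see Known Issues below), the raw output may contain extra turns beyond the answer. The sketch below shows one way to wrap a question in this format and trim the decoded text back to just the assistant's reply; the helper names (`build_prompt`, `extract_answer`) are illustrative and not part of the original model card.

```python
# Hypothetical helpers (not from the original card) for the Guanaco prompt format.

def build_prompt(question: str) -> str:
    """Wrap a user question in the ### Human / ### Assistant template."""
    return f"### Human:{question}\n### Assistant:"

def extract_answer(generated_text: str) -> str:
    """Keep only the text after the last '### Assistant:' marker and cut it off
    at the next '### Human:' marker, in case the model keeps generating turns."""
    answer = generated_text.split("### Assistant:")[-1]
    return answer.split("### Human:")[0].strip()

print(build_prompt("در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"))
```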

### 4 bit Quantization

If you want to use 4 bit quantization, we have a PEFT for you [here](https://huggingface.co/MaralGPT/MaralGPT-Mistral-7B-v-0-1). Also, you can find _Google Colab_ notebooks [here](https://github.com/prp-e/maralgpt).
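
If you prefer to stay in `transformers`, a 4-bit load typically goes through `bitsandbytes` via `BitsAndBytesConfig`, plus the `peft` library for the adapter repository linked above. The sketch below is a minimal example under the assumption that the linked repo is a LoRA-style adapter on top of Mistral-7B-v0.1; it is not taken from the original card, so check the Colab notebooks for the authors' exact setup.

```python
# Minimal 4-bit loading sketch (assumption: the linked MaralGPT repo is a
# PEFT/LoRA adapter for Mistral-7B-v0.1; verify against the official notebooks).
# Requires: pip install transformers accelerate bitsandbytes peft
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model, then attach the adapter weights on top.
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "MaralGPT/MaralGPT-Mistral-7B-v-0-1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
```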

### Installing Libraries

```pip install transformers accelerate bitsandbytes```

_NOTE_: The `bitsandbytes` library is only needed for the 8 bit version; otherwise, it is not necessary.

### Inference on a big GPU

If you have a big enough GPU, such as an A100, in your possession, this code is for you.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "MaralGPT/Maral-7B-alpha-1"

# Load the weights in bfloat16 and let accelerate place them on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(model_name_or_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)

# Wrap the question in the Guanaco prompt format
prompt = "در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"
prompt = f"### Human:{prompt}\n### Assistant:"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,
    temperature=0.5,
    max_new_tokens=300,
    pad_token_id=tokenizer.eos_token_id
)

outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Inference on a small GPU (Consumer Hardware/Free Colab)

The code is pretty much the same as above, but with a slight difference:

* Make sure `bitsandbytes` is installed correctly.
* Your model loading must be `model = AutoModelForCausalLM.from_pretrained(model_name_or_id, load_in_8bit=True, torch_dtype=torch.bfloat16, device_map="auto")`

On the _free version_ of Google Colab, you may face RAM problems. Using `low_cpu_mem_usage=True` when loading the model should help; the loading call with these changes is sketched below.
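
Putting those flags together, only the `from_pretrained` call changes from the big-GPU example above:

```python
# 8-bit loading for consumer GPUs / free Colab; the rest of the inference code
# (tokenizer, prompt, GenerationConfig, generate) stays the same as above.
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_id,
    load_in_8bit=True,          # requires bitsandbytes
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,     # reduces host-RAM pressure while loading
    device_map="auto",
)
```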

## Known Issues

* The model produces GPT-3.5 level answers in terms of grammar (especially in Persian), but it is also capable of extreme hallucinations. This problem can be addressed with a better dataset and better training procedures (such as DPO).
* Related to the previous issue, the model can also generate misinforming answers, especially when dealing with _reasoning_ problems in Persian.
* The model is huge, so it requires a lot of resources to work correctly. However, we may provide _GPTQ_ or _GGUF_ versions as well.
* The prompt format works and proves our concept of an _instruction following_ LLM, but since we haven't changed `eos_token` and `bos_token` to our own, you may see unnecessary information being generated by the model.
* Related to the previous issue, the model is prone to repeating itself. To work around this _temporarily_, keep the temperature below 1; according to our tests, somewhere between 0.5 and 0.7 is a sweet spot (see the sketch below).
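
For reference, a generation configuration in that recommended range might look like the sketch below; the `repetition_penalty` knob is a common `transformers` generation option we suggest here as an extra mitigation, not something prescribed by the original card.

```python
from transformers import GenerationConfig

# Sampling settings in the recommended 0.5-0.7 temperature range.
# repetition_penalty is an optional, additional guard against self-repetition
# (our assumption, not part of the original recommendation).
generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.6,
    repetition_penalty=1.1,
    max_new_tokens=300,
)
```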

## Our Team

* Muhammadreza Haghiri ([Website](https://haghiri75.com/en) - [Github](https://github.com/prp-e) - [LinkedIn](https://www.linkedin.com/in/muhammadreza-haghiri-1761325b))
* Mahi Mohrechi ([Website](https://mohrechi-portfolio.vercel.app/) - [Github](https://github.com/f-mohrechi) - [LinkedIn](https://www.linkedin.com/in/faeze-mohrechi/))

## Special Thanks

* The Mistral team, for providing the best open source base model ever.
* _Sina Rashidi_, who translated the Alpaca dataset to Persian.
* The [Jupyto](https://jupyto.com) team, for providing our infrastructure.