duyntnet
/

Sailor2-20B-Chat-imatrix-GGUF

+---
+license: other
+language:
+- en
+pipeline_tag: text-generation
+inference: false
+tags:
+- transformers
+- gguf
+- imatrix
+- Sailor2-20B-Chat
+---
+Quantizations of https://huggingface.co/sail/Sailor2-20B-Chat
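When choosing a quantization, a rough estimate of the weight size helps to judge whether a file will fit in RAM or VRAM. The sketch below is plain arithmetic (parameters × bits per weight ÷ 8). The function name, the nominal parameter count of 20 billion, and the bit widths shown are illustrative assumptions, not figures taken from this repository. Real GGUF quantization types also store per-block scales, so their effective bits per weight are typically somewhat higher than the nominal value.

```python
def estimate_weight_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB.

    Excludes the KV cache, activations, and any runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 1024**3


# Nominal "20B" parameter count; the exact figure is an assumption.
N_PARAMS = 20e9

# Illustrative bit widths only; they are not the quantization types shipped here.
for label, bits in [("16-bit", 16.0), ("8-bit", 8.0), ("4-bit", 4.0)]:
    print(f"{label}: ~{estimate_weight_size_gib(N_PARAMS, bits):.1f} GiB")
```

Treat the result as a lower bound and leave headroom for the context window.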
+### Inference Clients/UIs
+* [llama.cpp](https://github.com/ggerganov/llama.cpp)
+* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
+* [ollama](https://github.com/ollama/ollama)
+* [jan](https://github.com/janhq/jan)
+* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
+* [GPT4All](https://github.com/nomic-ai/gpt4all)
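For scripted use from Python, one option not listed above is the separate `llama-cpp-python` package, which provides bindings to llama.cpp. The following is a minimal sketch, assuming that package is installed and that a GGUF file from this repository has already been downloaded. The file name `Sailor2-20B-Chat-Q4_K_M.gguf`, the helper names, and the context size are hypothetical placeholders; substitute the file you actually have.

```python
from pathlib import Path

# Hypothetical file name: replace it with the GGUF file you downloaded.
MODEL_PATH = Path("Sailor2-20B-Chat-Q4_K_M.gguf")


def build_messages(user_prompt: str, system_prompt: str = "") -> list[dict]:
    """Build a chat message list, with an optional system message first."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


def chat(user_prompt: str, model_path: Path = MODEL_PATH, max_tokens: int = 512) -> str:
    """Run one chat turn against a local GGUF file and return the reply text."""
    # Imported lazily so the rest of the module works without llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama(
        model_path=str(model_path),
        n_ctx=4096,        # context window; an assumed value, adjust to your memory
        n_gpu_layers=-1,   # offload all layers to the GPU if one is available
    )
    result = llm.create_chat_completion(
        messages=build_messages(user_prompt),
        max_tokens=max_tokens,
    )
    return result["choices"][0]["message"]["content"]


if __name__ == "__main__":
    if MODEL_PATH.exists():
        print(chat("Beri saya pengenalan singkat tentang model bahasa besar."))
    else:
        print(f"{MODEL_PATH} not found; download a GGUF file first.")
```

Setting `n_gpu_layers=0` keeps inference on the CPU, which is slower but avoids VRAM limits.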
+---
+# From original readme
+Sailor2 is a community-driven initiative that brings cutting-edge multilingual language models to South-East Asia (SEA).
+Our research highlights a strong demand for models in the **8B and 20B parameter** range for production use, alongside **1B models** for specialized applications,
+such as speculative decoding and research purposes.
+These models, released under the **Apache 2.0 license**, provide enhanced accessibility to advanced language technologies across the region.
+Sailor2 builds upon the foundation of the awesome multilingual model [Qwen 2.5](https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e) and
+is continuously pre-trained on **500B tokens** to support **15 languages** better with a unified model.
+These languages include English, Chinese, Burmese, Cebuano, Ilocano, Indonesian, Javanese, Khmer, Lao, Malay, Sundanese, Tagalog, Thai, Vietnamese, and Waray.
+By addressing the growing demand for diverse, robust, and accessible language models, Sailor2 seeks to serve underserved communities in SEA with open, inclusive, and accessible multilingual LLMs.
+The Sailor2 model comes in three sizes, 1B, 8B, and 20B, which are **expanded from the Qwen2.5 base models** of 0.5B, 7B, and 14B, respectively.
+## Requirements
+Sailor2 is supported in recent releases of Hugging Face `transformers`; we advise you to install `transformers==4.46.3`.
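To confirm which version is installed before running the Quickstart, a small check like the following can help. It is a sketch using only the standard library; the helper names `parse_release` and `check_transformers` are made up for illustration, and the comparison looks only at the leading numeric parts of the version string.

```python
from importlib.metadata import PackageNotFoundError, version

RECOMMENDED = "4.46.3"  # the version advised in this README


def parse_release(v: str) -> tuple[int, ...]:
    """Return the leading integer parts of a version, e.g. '4.47.0.dev0' -> (4, 47, 0)."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric segment such as 'dev0'
        parts.append(int(piece))
    return tuple(parts)


def check_transformers(recommended: str = RECOMMENDED) -> str:
    """Describe how the installed transformers version relates to the recommended one."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return f"transformers is not installed; try: pip install transformers=={recommended}"
    if parse_release(installed) == parse_release(recommended):
        return f"transformers {installed} matches the recommended version"
    return f"transformers {installed} is installed; the recommended version is {recommended}"


print(check_transformers())
```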
+## Quickstart
+The following code snippet shows how to load the tokenizer and model and how to generate a response.
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# Load the model in bfloat16; device_map="auto" places the weights on the available device(s).
+model = AutoModelForCausalLM.from_pretrained(
+    'sail/Sailor2-20B-Chat',
+    torch_dtype=torch.bfloat16,
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained('sail/Sailor2-20B-Chat')
+system_prompt = \
+'You are an AI assistant named Sailor2, created by Sea AI Lab. \
+As an AI assistant, you can answer questions in English, Chinese, and Southeast Asian languages \
+such as Burmese, Cebuano, Ilocano, Indonesian, Javanese, Khmer, Lao, Malay, Sundanese, Tagalog, Thai, Vietnamese, and Waray. \
+Your responses should be friendly, unbiased, informative, detailed, and faithful.'
+prompt = "Beri saya pengenalan singkat tentang model bahasa besar."
+# prompt = "Hãy cho tôi một giới thiệu ngắn gọn về mô hình ngôn ngữ lớn."
+# prompt = "ให้ฉันแนะนำสั้น ๆ เกี่ยวกับโมเดลภาษาขนาดใหญ่"
+messages = [
+    {"role": "system", "content": system_prompt},
+    {"role": "user", "content": prompt}
+]
+# Render the conversation with the model's chat template and append the generation prompt.
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+# Tokenize and move the inputs to the device that holds the model's first parameters.
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+# Passing the full encoding supplies the attention mask along with the input IDs.
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=512,
+)
+# Drop the prompt tokens so that only the newly generated tokens are decoded.
+generated_ids = [
+    output_ids[len(prompt_ids):] for prompt_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(response)
+```