PAWA: A Swahili SLM for Various Tasks


Overview

PAWA is a Swahili-specialized language model designed to excel at tasks that require nuanced understanding and interaction in Swahili and English. It leverages supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) for improved performance and consistency. Below are the model specifications, installation steps, usage examples, and intended applications.


Model Details

  • Model Name: Pawa-mini-V0.1
  • Model Type: PAWA
  • Architecture:
    • 2B Parameter Gemma-2 Base Model
    • Enhanced with Swahili SFT and DPO datasets.
  • Languages Supported:
    • Swahili
    • English
    • Custom tokenizer for multi-language flexibility.
  • Primary Use Cases:
    • Contextually rich Swahili-focused tasks.
    • General assistance and chat-based interactions.
  • License: Custom; contact the author for terms of use.

Installation and Setup

Ensure the necessary libraries are installed and up-to-date:

!pip uninstall transformers -y && pip install --upgrade --no-cache-dir "git+https://github.com/huggingface/transformers.git"
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install datasets
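
After installation, a quick sanity check (assuming a CUDA-capable runtime such as Colab) confirms the environment is ready:

import torch
import transformers

print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())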

Model Loading

You can load the model using the following code snippet:

from unsloth import FastLanguageModel
import torch

model_name = "sartifyllc/Pawa-mini-V0.1"
max_seq_length = 2048  # context length used for inference
dtype = None           # None lets unsloth auto-detect (bfloat16 on newer GPUs, float16 otherwise)
load_in_4bit = False   # set True to load weights in 4-bit and reduce GPU memory usage

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

Chat Template Configuration

For a seamless conversational experience, configure the tokenizer with the appropriate chat template:

from unsloth.chat_templates import get_chat_template
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

tokenizer = get_chat_template(
    tokenizer,
    chat_template="chatml",  # Supports templates like zephyr, chatml, mistral, etc.
    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},  # ShareGPT style
    map_eos_token=True,  # Maps <|im_end|> to </s>
)
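
To confirm the template is applied as intended, you can render a conversation to plain text before tokenizing. This is a small illustrative check (the example message is arbitrary); the exact special tokens in the output depend on unsloth's chatml template:

messages = [{"from": "human", "value": "Habari yako?"}]  # "How are you?"
prompt_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt_text)  # shows the ChatML-formatted prompt the model will actually see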

Usage Example

Generate a short story in Swahili:

messages = [{"from": "human", "value": "Tengeneza hadithi fupi"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(input_ids=inputs, streamer=text_streamer, max_new_tokens=128, use_cache=True)
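
If you want the completion as a string rather than streamed output, you can decode the generated ids instead. This sketch reuses the inputs tensor built above:

outputs = model.generate(input_ids=inputs, max_new_tokens=256, use_cache=True)
# Drop the prompt tokens and decode only the newly generated continuation
generated_ids = outputs[0][inputs.shape[-1]:]
print(tokenizer.decode(generated_ids, skip_special_tokens=True))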

Training and Fine-Tuning Details

  • Base Model: Gemma-2-2B
  • Continued Pre-training: 3B Swahili tokens
  • Fine-tuning: Enhanced with Swahili SFT datasets for improved contextual understanding.
  • Optimization: DPO for more consistent, preference-aligned responses.
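
The exact training recipe is not published here. For readers who want to continue fine-tuning, the sketch below shows roughly how a further Swahili SFT pass over this checkpoint could look with LoRA adapters and trl's SFTTrainer, following unsloth's example notebooks; the dataset file, LoRA settings, and hyperparameters are illustrative assumptions, not the authors' configuration, and argument names may differ slightly across trl versions:

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    "sartifyllc/Pawa-mini-V0.1", max_seq_length=2048, load_in_4bit=True
)
# Attach LoRA adapters so only a small set of weights is trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)

# Hypothetical local dataset with one pre-formatted chat string per row in a "text" column
dataset = load_dataset("json", data_files="swahili_sft.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-5,
        output_dir="pawa-swahili-sft",
    ),
)
trainer.train()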

Intended Use Cases

  • General Assistance:
    Provides structured answers for general-purpose use.

  • Interactive Q&A:
    Designed for general-purpose chat environments.

  • RAG (Retrieval-Augmented Generation):
    Can serve as the generator in retrieval-augmented pipelines, answering questions from Swahili or English context supplied in the prompt (see the sketch below).
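
In a RAG setup, retrieved passages are simply placed in the prompt ahead of the question before applying the chat template. Below is a minimal sketch that reuses the model and tokenizer loaded above; the retrieval step is replaced by a hard-coded passage, and the Swahili prompt wording is illustrative:

# In practice the context would come from a retriever / vector store; hard-coded here for illustration.
context = "Mlima Kilimanjaro ni mlima mrefu zaidi barani Afrika."  # "Mount Kilimanjaro is the tallest mountain in Africa."
question = "Mlima gani ni mrefu zaidi Afrika?"  # "Which mountain is the tallest in Africa?"

messages = [{
    "from": "human",
    "value": f"Tumia muktadha ufuatao kujibu swali.\n\nMuktadha: {context}\n\nSwali: {question}",
}]  # "Use the following context to answer the question. Context: ... Question: ..."

inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(input_ids=inputs, max_new_tokens=128, use_cache=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))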


Limitations

  • Biases:
    The model may exhibit biases inherent in its fine-tuning datasets.

  • Generalization:
    May struggle with tasks outside the trained domain.

  • Hardware Requirements:

    • Optimal performance requires GPUs with high memory (e.g., Tesla V100 or T4).
    • Supports 4-bit quantization for reduced memory usage.
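
To use 4-bit quantization on smaller GPUs, set load_in_4bit=True when loading (bitsandbytes must be available in the environment):

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="sartifyllc/Pawa-mini-V0.1",
    max_seq_length=2048,
    load_in_4bit=True,  # quantizes weights to 4-bit to reduce GPU memory usage
)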

Feel free to reach out for further guidance or collaboration opportunities regarding PAWA!
