Model Card for AlberBshara/ar_llama3.1

Model Details

This model is a fine-tune of Llama 3.1 on a dataset of Arabic conversations, intended to improve the model's support for the Arabic language.

Fine-tuning a large language model (LLM) such as Llama 3.1 on text in an additional language, here Arabic, improves its ability to understand and generate that language. During fine-tuning the model picks up Arabic grammar, vocabulary, and cultural context, so it produces more coherent and contextually relevant Arabic text and its multilingual coverage widens.

Model Description

Base model: Llama 3.1 8B
Context window: 128k tokens
Developed by: Alber Bshara
Language(s) (NLP): Arabic (ar), English (en)
License: NeptoneAI
Finetuned from model: Llama 3.1 8B

Model Sources [optional]

Core Model: https://ai.meta.com/blog/meta-llama-3-1/

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

How to Get Started with the Model

To use this model, see the usage example in the "How to Use it" section at the end of this card.

Training Details

Training Data

Mixed Arabic Datasets (MAD): https://huggingface.co/M-A-D

Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 8
- seed: 3407
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 5
- mixed_precision_training: Native AMP
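The hyperparameters listed above fit together in two ways that a few lines of arithmetic make concrete: the total train batch size is the per-device batch size times the gradient accumulation steps, and a linear scheduler ramps the learning rate up over the warmup steps and then decays it to zero. The sketch below is illustrative only. The single-device setting and the value of `TOTAL_STEPS` are assumptions, since the card does not state the number of devices or the run length, and `linear_lr` is a hand-written approximation of a linear warmup/decay schedule, not the trainer's own code.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-4
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_STEPS = 5

# Assumptions (not stated in the card): one device, hypothetical run length.
NUM_DEVICES = 1
TOTAL_STEPS = 100


def effective_batch_size(per_device: int, accumulation: int, devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * accumulation * devices


def linear_lr(step: int, base_lr: float, warmup: int, total: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES))  # 8
for step in (0, WARMUP_STEPS, TOTAL_STEPS):
    print(step, linear_lr(step, LEARNING_RATE, WARMUP_STEPS, TOTAL_STEPS))
```

With these settings the product 2 x 4 x 1 matches the reported total_train_batch_size of 8.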

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

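'''
question = "كيف يمكنك إيجاد القاسم المشترك الأكبر للعددين 81 و 27؟"

ans =

يمكننا استخدام "خوارزمية إقليدس" لإيجاد القاسم المشترك الأكبر للعددين 81 و 27. هذه الخوارزمية تعتمد على القسمة المتكررة، حيث نقسم العدد الأكبر على العدد الأصغر، ثم نقسم العدد الأصغر على الباقي من القسمة السابقة، ونجري هذه العملية حتى يكون الباقي صفرًا.

الخطوة 1: نقسم 81 على 27: 81 ÷ 27 = 3 والباقي R = 0.
النتيجة النهائية: عندما يكون الباقي صفرًا، فإن العدد الذي تم القسمة عليه في تلك الخطوة هو القاسم المشترك الأكبر (GCD). في هذه الحالة، عندما أصبح الباقي صفرًا، كان العدد المقسوم عليه هو 27، وبالتالي GCD لـ 81 و27 هو 27.
'''

English translation of the example:

Question: "How can you find the greatest common divisor of 81 and 27?"

Answer: We can use the Euclidean algorithm to find the greatest common divisor of 81 and 27. The algorithm relies on repeated division: divide the larger number by the smaller one, then divide the smaller number by the remainder of the previous division, and repeat until the remainder is zero.

Step 1: Divide 81 by 27: 81 ÷ 27 = 3 with remainder R = 0.
Final result: When the remainder is zero, the divisor used in that step is the greatest common divisor (GCD). Here the remainder became zero when the divisor was 27, so the GCD of 81 and 27 is 27.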
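The model's answer in the example above can be checked independently. The following is a minimal sketch of the Euclidean algorithm by repeated division, written here for verification only; it is not part of the model or of the evaluation pipeline, and the function name `gcd` is chosen for illustration.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor via the Euclidean algorithm (repeated division)."""
    a, b = abs(a), abs(b)
    while b != 0:
        # Replace (a, b) with (b, remainder of a divided by b) until the remainder is 0.
        a, b = b, a % b
    return a


print(gcd(81, 27))  # 27
```

For 81 and 27 the loop runs once: 81 % 27 is 0, so the divisor 27 is returned, which agrees with the answer quoted above.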

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: [More Information Needed]
Hours used: [More Information Needed]
Cloud Provider: [More Information Needed]
Compute Region: [More Information Needed]
Carbon Emitted: [More Information Needed]

Technical Specifications [optional]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

The model can run on an NVIDIA T4 or L4 GPU, or on more powerful GPUs.
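A rough back-of-the-envelope estimate suggests why these GPUs are sufficient when the model is loaded with 4-bit quantization, as the usage code in this card does by default. The sketch below counts weight memory only and assumes a parameter count of about 8 billion; it ignores quantization overhead, the KV cache, and activations, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: roughly 8 billion parameters (not stated exactly in this card).
PARAMS = 8.0e9

for bits in (16, 4):
    print(f"{bits}-bit weights: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```

Under these assumptions the 4-bit weights need a little under 4 GiB, which leaves headroom on a 16 GB T4 or a 24 GB L4, while 16-bit weights would nearly fill a T4 on their own.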

Software

Framework versions

PEFT 0.12.0
Transformers 4.44.2
PyTorch 2.4.0+cu121
Datasets 2.21.0
Tokenizers 0.19.1
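To compare a local environment against the versions listed above, a small check with the standard library's `importlib.metadata` can help. This is a sketch: the helper name `find_mismatches` is hypothetical, the keys are the usual pip distribution names for these libraries, and a mismatch does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions copied from the list above, keyed by pip distribution name.
EXPECTED = {
    "peft": "0.12.0",
    "transformers": "4.44.2",
    "torch": "2.4.0+cu121",
    "datasets": "2.21.0",
    "tokenizers": "0.19.1",
}


def find_mismatches(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in find_mismatches(EXPECTED).items():
    print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```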

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]

How to Use it:

import os
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from unsloth.chat_templates import get_chat_template
from typing import Tuple, Dict, Any, List
import torch

class LLM:
    def __init__(self, load_in_4bit: bool = True,
                 load_cpu_mem_usage: bool = True,
                 hf_model_path: str = "AlberBshara/ar_llama3.1",
                 max_new_tokens: int= 4096):
        """
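        Load the fine-tuned model and its tokenizer from the Hugging Face Hub.

        Args:
            load_in_4bit (bool): Use 4-bit quantization. Defaults to True.
            load_cpu_mem_usage (bool): Reduce CPU memory usage while loading. Defaults to True.
            hf_model_path (str): Model repository on the Hugging Face Hub, e.g. "your-user-name/model-name".
            max_new_tokens (int): Maximum number of tokens to generate per call. Defaults to 4096.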
        """
        assert torch.cuda.is_available(), "CUDA is not available. An NVIDIA GPU is required."
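        # Read the Hub access token from the environment. The variable name is an
        # assumption; export it before running, or leave it unset for public models.
        hf_auth_token = os.environ.get("HUGGING_FACE_API_TOKEN")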
        # Specify the quantization config
        self._bnb_config = BitsAndBytesConfig(load_in_4bit=load_in_4bit)

        # Load model directly with quantization config
        self.model = AutoModelForCausalLM.from_pretrained(
            hf_model_path,
            low_cpu_mem_usage=load_cpu_mem_usage,
            quantization_config=self._bnb_config,
            token=hf_auth_token  # `use_auth_token` is deprecated in recent Transformers releases
        )

        # Load the tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(
            hf_model_path,
            token=hf_auth_token  # `use_auth_token` is deprecated in recent Transformers releases
        )
        self.__tokenizer = get_chat_template(
            self.tokenizer,
            chat_template="llama-3",
            mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
        )

        self._hf_model_path = hf_model_path
        self._EOS_TOKEN_ID = self.__tokenizer.eos_token_id
        self.max_new_tokens = max_new_tokens

        self._prompt = lambda context, question: f"""
        Please provide a detailed answer to the question using only the information provided in the context. Do not include any information that is not explicitly mentioned in the context.

        Context: [{context}]

        - If the context is in Arabic, answer in Arabic; otherwise, answer in English.

        Question: [{question}]

        Your answer should be comprehensive, thoroughly explaining the topic while staying within the boundaries of the provided context.
        """

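    def invoke(self, context: str, question: str) -> Tuple[str, List[Dict[str, str]]]:
        # Check for None first so that a missing argument raises ValueError, not AttributeError.
        if not question or not question.strip():
            raise ValueError("question cannot be empty or None")

        if not context or not context.strip():
            raise ValueError("context cannot be empty or None")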

        inputs = self._prompt(context, question)

        messages = [{"from": "human", "value": inputs}]
        inputs = self.__tokenizer.apply_chat_template(
              messages,
              tokenize=True,
              add_generation_prompt=True, # Must add for generation
              return_tensors="pt",
        ).to("cuda")
        
        # Increase the max_new_tokens to allow more detailed responses
        output_ids = self.model.generate(inputs, max_new_tokens=self.max_new_tokens, pad_token_id=self.__tokenizer.pad_token_id)
        output_ids = output_ids.tolist()[0] if output_ids.size(0) == 1 else output_ids.tolist()

        output_text = self.__tokenizer.decode(output_ids, skip_special_tokens=True)

        # Free GPU memory held by the input and output tensors.
        del inputs
        del output_ids
        torch.cuda.empty_cache()

        return output_text, messages

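    def extract_answer(self, response: str) -> str:
        # The decoded text still contains the prompt. The prompt ends with a period and is
        # expected to be followed by the "assistant" role header, so ".assistant" is used
        # to locate where the answer starts.
        start_with: str = ".assistant"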
        start_index = response.find(start_with)

        # If the word is found, extract the substring from that point onward
        if start_index != -1:
            # Move start_index to the end of the word
            start_index += len(start_with)
            return response[start_index:]
        else:
            return response

    def get_metadata(self) -> Dict[str, Any]:
        return {
            "class_name": self.__class__.__name__,
            "init_params": {
                "load_in_4bit": True,
                "load_cpu_mem_usage": True,
                # Report the path actually used rather than the default.
                "hf_model_path": self._hf_model_path,
                # The token is masked on purpose.
                "hf_auth_token": "--%$%--",
                "max_new_tokens": self.max_new_tokens
            },
            "methods": ["invoke", "extract_answer"]
        }


llm = LLM()
