Terjman-Nano (77M params)

Terjman-Nano is a Transformer-based model for translating English into Moroccan Darija. It is a fine-tuned version of Helsinki-NLP/opus-mt-en-ar, trained on the darija_english dataset enhanced with curated corpora to improve translation quality.

It achieves the following results on the evaluation set:

  • Loss: 3.2038
  • Bleu: 10.6239
  • Gen Len: 35.2727

Try it out on our dedicated Terjman-Nano Space 🤗

Usage

You can integrate the model into your projects or workflows via the Hugging Face Transformers library. Here is a basic Python example:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("atlasia/Terjman-Nano")
model = AutoModelForSeq2SeqLM.from_pretrained("atlasia/Terjman-Nano")

# Define your English input text
input_text = "Your English text goes here."

# Tokenize the input text
input_tokens = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True)

# Perform translation
output_tokens = model.generate(**input_tokens)

# Decode the output tokens
output_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)

print("Translation:", output_text)
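
To translate several sentences, the same tokenize, generate, decode steps can be applied in small batches. The following is a minimal sketch, not part of the original card: the helper name `translate_batch` and the defaults `batch_size=8` and `max_new_tokens=128` are illustrative assumptions, and the tokenizer and model are expected to be loaded as shown above.

```python
def translate_batch(texts, tokenizer, model, batch_size=8, max_new_tokens=128):
    """Translate a list of English sentences in fixed-size batches.

    `translate_batch`, `batch_size` and `max_new_tokens` are illustrative
    choices (assumptions), not settings published with the model.
    """
    translations = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Pad to the longest sentence in the batch, truncate overly long inputs
        encoded = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**encoded, max_new_tokens=max_new_tokens)
        # Decode every sequence in the batch, dropping special tokens
        translations.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return translations

# Usage, with `tokenizer` and `model` loaded as in the example above:
# results = translate_batch(["Good morning.", "How are you?"], tokenizer, model)
```

Batching keeps padding overhead small when sentence lengths vary, because each batch is padded only to its own longest sentence.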

Example

Here is an example of translating English into Moroccan Darija:

Input: "Hi my friend, can you tell me a joke in moroccan darija? I'd be happy to hear that from you!"

Output: "مرحبا يا صديقي، يمكن تقال لي نكتة فالداريا المغاربية؟ أنا سَأكُونُ سعيد بسمْاع هادشي منك!"

Limitations

This version has some limitations, mainly due to the tokenizer. We are currently collecting more data with the aim of continuous improvement.

Feedback

We are working to improve the model's performance and usability incrementally. If you have feedback or suggestions, or if you encounter any issues, please don't hesitate to reach out to us.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 256
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 40
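
The relationship between these values can be checked with a short calculation. The sketch below assumes a single training device (which is what 64 × 4 = 256 implies) and the usual linear-warmup-then-linear-decay shape for a "linear" scheduler. The total of 5600 optimizer steps is taken from the last row of the training results, and the function name `linear_lr` is illustrative.

```python
import math

train_batch_size = 64
gradient_accumulation_steps = 4
learning_rate = 3e-05
warmup_ratio = 0.03
total_steps = 5600  # final step reported in the training results

# Effective batch size per optimizer step (assumes one device)
effective_batch_size = train_batch_size * gradient_accumulation_steps

# Number of warmup steps implied by the warmup ratio
warmup_steps = math.ceil(total_steps * warmup_ratio)

def linear_lr(step):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        return learning_rate * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return learning_rate * (remaining / max(1, total_steps - warmup_steps))

print("effective batch size:", effective_batch_size)
print("warmup steps:", warmup_steps)
print("lr at end of warmup:", linear_lr(warmup_steps))
```

Under these assumptions the learning rate rises from 0 to 3e-05 over the first 168 steps and then decays linearly to 0 at step 5600.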

Training results

Training Loss | Epoch   | Step | Validation Loss | Bleu    | Gen Len
--------------+---------+------+-----------------+---------+--------
No log        | 0.9982  | 140  | 4.8431          | 6.4393  | 31.6253
No log        | 1.9964  | 280  | 3.9077          | 7.7671  | 36.1047
No log        | 2.9947  | 420  | 3.6453          | 8.5008  | 35.303
4.7676        | 4.0     | 561  | 3.5034          | 9.293   | 34.416
4.7676        | 4.9982  | 701  | 3.4161          | 9.3322  | 34.5702
4.7676        | 5.9964  | 841  | 3.3582          | 9.6792  | 34.438
4.7676        | 6.9947  | 981  | 3.3182          | 9.8804  | 35.27
3.7555        | 8.0     | 1122 | 3.2904          | 10.0802 | 34.7576
3.7555        | 8.9982  | 1262 | 3.2684          | 10.2161 | 34.1873
3.7555        | 9.9964  | 1402 | 3.2534          | 10.0777 | 34.6612
3.6059        | 10.9947 | 1542 | 3.2420          | 10.637  | 34.6281
3.6059        | 12.0    | 1683 | 3.2325          | 10.6797 | 35.1185
3.6059        | 12.9982 | 1823 | 3.2267          | 10.5413 | 34.8898
3.6059        | 13.9964 | 1963 | 3.2210          | 10.6098 | 35.0
3.5561        | 14.9947 | 2103 | 3.2169          | 10.4863 | 34.8567
3.5561        | 16.0    | 2244 | 3.2141          | 10.6152 | 34.7328
3.5561        | 16.9982 | 2384 | 3.2119          | 10.6701 | 34.8815
3.5363        | 17.9964 | 2524 | 3.2100          | 10.5632 | 34.7576
3.5363        | 18.9947 | 2664 | 3.2089          | 10.5707 | 34.8623
3.5363        | 20.0    | 2805 | 3.2077          | 10.6275 | 34.8678
3.5363        | 20.9982 | 2945 | 3.2066          | 10.6857 | 35.0413
3.5299        | 21.9964 | 3085 | 3.2062          | 10.8112 | 35.3251
3.5299        | 22.9947 | 3225 | 3.2056          | 10.6908 | 34.0413
3.5299        | 24.0    | 3366 | 3.2051          | 10.5719 | 35.4298
3.5241        | 24.9982 | 3506 | 3.2046          | 10.5667 | 34.9036
3.5241        | 25.9964 | 3646 | 3.2042          | 10.9389 | 35.3361
3.5241        | 26.9947 | 3786 | 3.2043          | 10.5972 | 34.9532
3.5241        | 28.0    | 3927 | 3.2043          | 10.6626 | 35.3113
3.5247        | 28.9982 | 4067 | 3.2042          | 10.5286 | 35.0689
3.5247        | 29.9964 | 4207 | 3.2038          | 10.6298 | 34.4959
3.5247        | 30.9947 | 4347 | 3.2039          | 10.5897 | 34.9449
3.5247        | 32.0    | 4488 | 3.2037          | 10.7971 | 35.4711
3.5208        | 32.9982 | 4628 | 3.2039          | 10.6665 | 34.8402
3.5208        | 33.9964 | 4768 | 3.2039          | 10.5543 | 35.27
3.5208        | 34.9947 | 4908 | 3.2034          | 10.785  | 35.022
3.5159        | 36.0    | 5049 | 3.2037          | 10.6311 | 34.3388
3.5159        | 36.9982 | 5189 | 3.2037          | 10.4617 | 34.3085
3.5159        | 37.9964 | 5329 | 3.2037          | 10.7629 | 34.4518
3.5159        | 38.9947 | 5469 | 3.2036          | 10.6729 | 35.2066
3.524         | 39.9287 | 5600 | 3.2038          | 10.6239 | 35.2727
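
The reported evaluation results match the final row (step 5600), but the final step is not the best one on either metric. The sketch below compares a hand-picked subset of rows copied from the results above. The selection logic is only an illustration of how checkpoints could be ranked, not the procedure the authors used, and BLEU differences this small may be within evaluation noise.

```python
# (step, validation_loss, bleu) for a subset of rows from the training results
rows = [
    (3085, 3.2062, 10.8112),
    (3646, 3.2042, 10.9389),
    (4908, 3.2034, 10.7850),
    (5600, 3.2038, 10.6239),  # final step, matches the reported evaluation results
]

# Rank by highest BLEU and by lowest validation loss
best_by_bleu = max(rows, key=lambda row: row[2])
best_by_loss = min(rows, key=lambda row: row[1])

print("best BLEU:", best_by_bleu)
print("lowest validation loss:", best_by_loss)
```

In this subset, step 3646 has the highest BLEU (10.9389) and step 4908 the lowest validation loss (3.2034); both are also the best values in the full set of results.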

Framework versions

  • Transformers 4.40.2
  • Pytorch 2.2.1+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1

Model size: 76.4M params (Safetensors, tensor type BF16)