datasets:
  - ImruQays/Quran-Classical-Arabic-English-Parallel-texts
language:
  - ar
library_name: transformers
license: mit
metrics:
  - bertscore
pipeline_tag: translation

فصيح

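A language model designed to translate into eloquent (faseeh) Arabic, because the Arabic that currently dominates translation is a modernized variety known as العرنجية (Aranjiyya).

What is Aranjiyya?

A language that is Arabic on the surface and European (Ifranji) underneath. Examples abound: نمط حياة ("lifestyle") in place of معيشة; أرضية مشتركة ("common ground") in place of كلمة سواء; سلام داخلي ("inner peace") in place of طمأنينة or سكينة; and سلبيات وإيجابيات ("negatives and positives") in place of محاسن الشيء ومساويه or فضائله ورذائله (a thing's merits and faults, or its virtues and vices).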


Faseeh

A machine translation model (MTM) designed to translate English into true Classical Arabic.

How to Get Started with the Model

Use the code below to get started with the model.

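from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig

model_name = "Abdulmohsena/Faseeh"

# Load the tokenizer (English source, Arabic target), the model, and its generation settings.
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn", tgt_lang="arb_Arab")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
generation_config = GenerationConfig.from_pretrained(model_name)

# Example English sentence to translate.
dummy = "And the Saudi Arabian Foreign Minister assured the visitors of the importance to seek the security."

# Tokenize the English input and generate the Arabic translation.
encoded_en = tokenizer(dummy, return_tensors="pt")
generated_tokens = model.generate(**encoded_en, generation_config=generation_config)

# Decode the generated token IDs back into text.
print(tokenizer.decode(generated_tokens[0], skip_special_tokens=True))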

Model Details

  • A fine-tuned version of Facebook's NLLB-200 Distilled model (600M parameters)

Model Sources

Bias, Risks, and Limitations

  • Language pairs from outside the Quran were mostly translated by Google Translate. The quality of this model's translations therefore depends on the quality of Google's translation from Classical Arabic to English.
  • The evaluation metric used for this model is bertscore/e5score. It is far from perfect in terms of alignment, but it is the best metric available for semantic translation, so it remains the main evaluation metric until a better substitute appears.

Training Data

  • Arabic text from outside the Hugging Face datasets was scraped from the Shamela Library

Metrics

  • bertscore: chosen because it rewards preserving the same meaning rather than matching individual words (semantic translation, not syntactic translation)
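
To make the semantic-versus-syntactic distinction concrete, here is a minimal sketch of the greedy-matching idea behind BERTScore. Each token embedding in the candidate is matched to its most similar token embedding in the reference by cosine similarity, and the averages give precision, recall, and F1. The toy 2-D vectors and the helper names `cosine` and `greedy_match_f1` are illustrative assumptions, not this model's evaluation code. The real metric uses contextual embeddings from a pretrained model and can apply IDF weighting.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def greedy_match_f1(candidate, reference):
    """Greedy-matching precision, recall, and F1 over token embeddings.

    candidate, reference: non-empty lists of non-zero embedding vectors.
    """
    # Precision: each candidate token is scored by its best match in the reference.
    precision = sum(max(cosine(c, r) for r in reference) for c in candidate) / len(candidate)
    # Recall: each reference token is scored by its best match in the candidate.
    recall = sum(max(cosine(r, c) for c in candidate) for r in reference) / len(reference)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy embeddings: two "tokens" per sentence.
reference = [[1.0, 0.0], [0.0, 1.0]]
reordered = [[0.0, 1.0], [1.0, 0.0]]  # same tokens in a different order
truncated = [[1.0, 0.0]]              # one token's meaning is missing

print(greedy_match_f1(reordered, reference))
print(greedy_match_f1(truncated, reference))
```

In this sketch the reordered candidate scores the same as an exact copy because matching ignores position, while the truncated candidate loses recall because one reference token has no close match.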