philschmid/habana-xlm-r-large-amazon-massive

This model is a fine-tuned version of xlm-roberta-large on the AmazonScience/massive dataset. It achieves the following results on the evaluation set:

Training on 8x HPU took approx. 41 min.

train results

{'loss': 0.2651, 'learning_rate': 2.4e-05, 'epoch': 1.0}
{'loss': 0.1079, 'learning_rate': 1.8e-05, 'epoch': 2.0}
{'loss': 0.0563, 'learning_rate': 1.2e-05, 'epoch': 3.0}
{'loss': 0.0308, 'learning_rate': 6e-06, 'epoch': 4.0}
{'loss': 0.0165, 'learning_rate': 0.0, 'epoch': 5.0}

total

{'train_runtime': 3172.4502, 'train_samples_per_second': 127.028, 'train_steps_per_second': 1.986, 'train_loss': 0.09531746031746031, 'epoch': 5.0}
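The totals above imply a few quantities that the log does not state directly. As a rough sketch, assuming the reported throughput figures are averages over the whole run (the variable names are illustrative and not part of the training script), the approximate step count and effective global batch size can be derived like this:

```python
# Values copied from the "total" training log entry above.
train_runtime = 3172.4502        # seconds
samples_per_second = 127.028
steps_per_second = 1.986
epochs = 5

# Throughput multiplied by runtime gives approximate totals.
total_steps = train_runtime * steps_per_second
total_samples = train_runtime * samples_per_second

# Samples processed per training step, summed over all devices.
global_batch_size = samples_per_second / steps_per_second

runtime_minutes = train_runtime / 60

print(f"total steps: {total_steps:.0f}")
print(f"steps per epoch: {total_steps / epochs:.0f}")
print(f"samples per epoch: {total_samples / epochs:.0f}")
print(f"global batch size: {global_batch_size:.1f}")
print(f"runtime: {runtime_minutes:.1f} min")
```

Because the logged rates are rounded, the derived figures are estimates rather than exact counts.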

eval results

{'eval_loss': 0.3128528892993927, 'eval_accuracy': 0.9125852013210597, 'eval_f1': 0.9125852013210597, 'eval_runtime': 45.1795, 'eval_samples_per_second': 314.988, 'eval_steps_per_second': 4.936, 'epoch': 1.0}
{'eval_loss': 0.36222779750823975, 'eval_accuracy': 0.9134987000210807, 'eval_f1': 0.9134987000210807, 'eval_runtime': 29.8241, 'eval_samples_per_second': 477.165, 'eval_steps_per_second': 7.477, 'epoch': 2.0}
{'eval_loss': 0.3943144679069519, 'eval_accuracy': 0.9140608530672476, 'eval_f1': 0.9140608530672476, 'eval_runtime': 30.1085, 'eval_samples_per_second': 472.657, 'eval_steps_per_second': 7.407, 'epoch': 3.0}
{'eval_loss': 0.40938863158226013, 'eval_accuracy': 0.9158878504672897, 'eval_f1': 0.9158878504672897, 'eval_runtime': 30.4546, 'eval_samples_per_second': 467.286, 'eval_steps_per_second': 7.322, 'epoch': 4.0}
{'eval_loss': 0.4137658476829529, 'eval_accuracy': 0.9172932330827067, 'eval_f1': 0.9172932330827067, 'eval_runtime': 30.3464, 'eval_samples_per_second': 468.952, 'eval_steps_per_second': 7.348, 'epoch': 5.0}
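The eval log shows accuracy rising each epoch while eval loss also rises, so the "best" checkpoint depends on the selection metric. A minimal sketch of comparing the two criteria, using only the values logged above (the `eval_log` list and variable names are illustrative, and this is not necessarily how the checkpoint was selected in the original run):

```python
# Per-epoch evaluation metrics copied from the log above.
eval_log = [
    {"epoch": 1.0, "eval_loss": 0.3128528892993927, "eval_accuracy": 0.9125852013210597, "eval_f1": 0.9125852013210597},
    {"epoch": 2.0, "eval_loss": 0.36222779750823975, "eval_accuracy": 0.9134987000210807, "eval_f1": 0.9134987000210807},
    {"epoch": 3.0, "eval_loss": 0.3943144679069519, "eval_accuracy": 0.9140608530672476, "eval_f1": 0.9140608530672476},
    {"epoch": 4.0, "eval_loss": 0.40938863158226013, "eval_accuracy": 0.9158878504672897, "eval_f1": 0.9158878504672897},
    {"epoch": 5.0, "eval_loss": 0.4137658476829529, "eval_accuracy": 0.9172932330827067, "eval_f1": 0.9172932330827067},
]

# Select the best epoch under each criterion.
best_by_accuracy = max(eval_log, key=lambda row: row["eval_accuracy"])
best_by_loss = min(eval_log, key=lambda row: row["eval_loss"])

# Accuracy gained between the first and the last epoch, in percentage points.
accuracy_gain_pp = 100 * (eval_log[-1]["eval_accuracy"] - eval_log[0]["eval_accuracy"])

# In this log, accuracy and F1 are reported with identical values.
f1_matches_accuracy = all(row["eval_accuracy"] == row["eval_f1"] for row in eval_log)

print("best epoch by accuracy:", best_by_accuracy["epoch"])
print("best epoch by eval loss:", best_by_loss["epoch"])
print(f"accuracy gain: {accuracy_gain_pp:.2f} percentage points")
print("F1 equals accuracy in every epoch:", f1_matches_accuracy)
```

The two criteria disagree here: selecting by accuracy favors the final epoch, while selecting by eval loss favors the first.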

Environment

The training was run on a DL1 instance on AWS using Habana Gaudi1 and optimum.

For more information, see: https://github.com/philschmid/deep-learning-habana-huggingface
