---
license: apache-2.0
base_model: mosaicml/mpt-7b-instruct
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: mpt_1000_STEPS_1e5_rate_03_beta_DPO
    results: []
---

mpt_1000_STEPS_1e5_rate_03_beta_DPO

This model is a DPO fine-tuned version of mosaicml/mpt-7b-instruct on an unspecified preference dataset. It achieves the following results on the evaluation set (a sketch of how these metrics are derived follows the list):

  • Loss: 0.6933
  • Rewards/chosen: -0.0008
  • Rewards/rejected: -0.0019
  • Rewards/accuracies: 0.5187
  • Rewards/margins: 0.0011
  • Logps/rejected: -21.5638
  • Logps/chosen: -20.7947
  • Logits/rejected: 14.2524
  • Logits/chosen: 14.2550
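
The reward metrics above are TRL's implicit DPO rewards: beta-scaled log-probability ratios between the trained policy and a frozen reference model, averaged over the evaluation set. Below is a minimal sketch of these relationships, assuming beta = 0.3 from the "03_beta" suffix in the model name; the reference log-probs in the example are illustrative stand-ins, not values logged on this card.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of how TRL's DPO metrics relate to policy/reference log-probs.
# BETA = 0.3 is assumed from the "03_beta" suffix in the model name.
BETA = 0.3

def dpo_metrics(pi_chosen_logps, pi_rejected_logps, ref_chosen_logps, ref_rejected_logps):
    # Implicit DPO rewards: beta-scaled log-ratio of policy to reference.
    rewards_chosen = BETA * (pi_chosen_logps - ref_chosen_logps)
    rewards_rejected = BETA * (pi_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected       # Rewards/margins
    loss = -F.logsigmoid(margins).mean()              # DPO objective
    accuracy = (margins > 0).float().mean()           # Rewards/accuracies
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), accuracy

# Toy example: policy log-probs taken from the eval Logps above; the reference
# log-probs are hypothetical, picked only to illustrate the reward arithmetic.
pi_c = torch.tensor([-20.7947]); pi_r = torch.tensor([-21.5638])
ref_c = torch.tensor([-20.7921]); ref_r = torch.tensor([-21.5575])
print(dpo_metrics(pi_c, pi_r, ref_c, ref_r))
```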

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of a matching TRL setup follows the list):

  • learning_rate: 1e-08
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
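
These settings line up with a standard TRL DPOTrainer run. The following is a hedged sketch of how they might be wired together, assuming TRL's pre-0.9 API (where beta is passed directly to the trainer) and a placeholder preference dataset, since the actual training data is not documented on this card:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# Placeholder preference pairs; the real dataset is not documented on this card.
train_dataset = Dataset.from_dict({
    "prompt": ["Summarize DPO in one sentence."],
    "chosen": ["DPO fine-tunes a model directly on preference pairs without a separate reward model."],
    "rejected": ["I cannot help with that."],
})

model = AutoModelForCausalLM.from_pretrained("mosaicml/mpt-7b-instruct", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("mosaicml/mpt-7b-instruct")

args = TrainingArguments(
    output_dir="mpt_1000_STEPS_1e5_rate_03_beta_DPO",
    learning_rate=1e-08,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 2 x 2 = total train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,        # with None, TRL snapshots the policy as the frozen reference
    args=args,
    beta=0.3,              # assumed from the "03_beta" suffix in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```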

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6965 | 0.1 | 100 | 0.6951 | -0.0017 | 0.0013 | 0.4681 | -0.0029 | -21.5532 | -20.7977 | 14.2557 | 14.2583 |
| 0.6918 | 0.2 | 200 | 0.6942 | -0.0054 | -0.0044 | 0.5011 | -0.0010 | -21.5722 | -20.8104 | 14.2575 | 14.2601 |
| 0.6965 | 0.29 | 300 | 0.6941 | -0.0016 | -0.0010 | 0.4945 | -0.0006 | -21.5608 | -20.7975 | 14.2549 | 14.2575 |
| 0.6906 | 0.39 | 400 | 0.6946 | 0.0001 | 0.0020 | 0.4747 | -0.0019 | -21.5507 | -20.7919 | 14.2494 | 14.2520 |
| 0.6883 | 0.49 | 500 | 0.6972 | -0.0019 | 0.0050 | 0.4484 | -0.0069 | -21.5408 | -20.7986 | 14.2521 | 14.2547 |
| 0.6867 | 0.59 | 600 | 0.6969 | -0.0054 | 0.0010 | 0.4418 | -0.0064 | -21.5541 | -20.8103 | 14.2502 | 14.2528 |
| 0.6937 | 0.68 | 700 | 0.6939 | 0.0015 | 0.0020 | 0.5275 | -0.0005 | -21.5508 | -20.7871 | 14.2547 | 14.2573 |
| 0.6855 | 0.78 | 800 | 0.6933 | -0.0008 | -0.0017 | 0.5099 | 0.0009 | -21.5631 | -20.7947 | 14.2522 | 14.2548 |
| 0.6918 | 0.88 | 900 | 0.6933 | -0.0008 | -0.0019 | 0.5187 | 0.0011 | -21.5638 | -20.7947 | 14.2524 | 14.2550 |
| 0.6957 | 0.98 | 1000 | 0.6933 | -0.0008 | -0.0019 | 0.5187 | 0.0011 | -21.5638 | -20.7947 | 14.2524 | 14.2550 |

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2
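
To approximate this environment, the pinned versions above can be installed directly. The TRL version used for training is not recorded on this card, so trl is left unpinned below:

```bash
pip install "transformers==4.39.1" "datasets==2.18.0" "tokenizers==0.15.2" trl
# The card lists Pytorch 2.0.0+cu117, i.e. the CUDA 11.7 build:
pip install "torch==2.0.0" --index-url https://download.pytorch.org/whl/cu117
```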