Meta-Llama-3-8B-QLoRA-Assessment-Rationale-dpo
This model was trained without private data and accompanies the EMNLP 2024 Findings paper: Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring.
- Paper: Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring (EMNLP 2024 Findings)
- GitHub Repository: Thought Tree Assessment Repository
Intended uses & limitations
This model is intended as a resource for research on explainable AI in educational technology. It was trained on noisy response-level rationales, so its outputs are unsuitable for direct use in high-stakes assessments without additional verification.
Training and evaluation data
We trained and evaluated the model on the Synthetic Rationale data, which was generated from the Rationale MCTS data.
To extract scores from generated rationales, please use jiazhengli/deberta-v3-large-Rationale-to-Score.
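A minimal usage sketch, assuming this repository hosts a PEFT (QLoRA) adapter for meta-llama/Meta-Llama-3-8B and that the scoring model loads as a standard sequence-classification checkpoint; this card confirms neither. The function names are hypothetical, the prompt format expected by the generator is not given here (see the paper and repository), and the mapping from predicted label to score should be checked against the scorer's own model card.

```python
GENERATOR_ADAPTER = "jiazhengli/Meta-Llama-3-8B-QLoRA-Assessment-Rationale-dpo"
BASE_MODEL = "meta-llama/Meta-Llama-3-8B"
SCORER = "jiazhengli/deberta-v3-large-Rationale-to-Score"


def load_generator():
    """Load the base model and attach the adapter (assumes a PEFT adapter repo)."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, GENERATOR_ADAPTER)
    model.eval()
    return tokenizer, model


def generate_rationale(tokenizer, model, prompt, max_new_tokens=256):
    """Generate a rationale for an already formatted prompt (format not specified here)."""
    import torch

    device = next(model.parameters()).device
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


def score_rationale(rationale):
    """Classify a rationale; returns a dict with 'label' and 'score' keys."""
    from transformers import pipeline

    classifier = pipeline("text-classification", model=SCORER)
    return classifier(rationale)[0]
```

The heavy imports live inside the functions so the module can be imported without loading any weights. Given the limitation noted above, the predicted score is best treated as a signal to verify rather than a final grade.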
Citation
Please cite the following work if you utilize this model:
BibTeX:
@misc{li2024calibratingllmspreferenceoptimization,
title={Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring},
author={Jiazheng Li and Hainiu Xu and Zhaoyue Sun and Yuxiang Zhou and David West and Cesare Aloisi and Yulan He},
year={2024},
eprint={2406.19949},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2406.19949},
}
Training procedure
Please refer to our paper.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 0.1
- num_epochs: 3.0
- mixed_precision_training: Native AMP
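As a worked check of how these values relate, the sketch below recomputes the effective batch size and evaluates the standard cosine decay formula. The single-device setting, the figure of 600 optimizer steps per epoch (read from the results table, where epoch 1.0 falls at step 600), and the function name `cosine_lr` are assumptions for illustration; the warmup length is left as a parameter rather than inferred from the list.

```python
import math

train_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 1  # assumption: one device, consistent with total_train_batch_size = 8

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: 600 optimizer steps per epoch, read from the results table.
steps_per_epoch = 600
num_epochs = 3
total_steps = steps_per_epoch * num_epochs
approx_pairs_per_epoch = steps_per_epoch * total_train_batch_size


def cosine_lr(step, total_steps, base_lr=1e-5, warmup_steps=0):
    """Linear warmup followed by cosine decay to zero (the common formulation)."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_steps, approx_pairs_per_epoch)
print(cosine_lr(0, total_steps), cosine_lr(total_steps // 2, total_steps))
```

Under these assumptions, each epoch covers roughly 4800 preference pairs and the learning rate falls to half of its peak value at the midpoint of training.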
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
1.3287 | 0.33 | 200 | 1.8699 | 9.9343 | 9.1282 | 0.6312 | 0.8061 | -167.4677 | -142.3102 | -1.1863 | -1.1862 |
1.1821 | 0.67 | 400 | 1.9729 | 9.9379 | 9.2024 | 0.6113 | 0.7354 | -166.7256 | -142.2745 | -1.2732 | -1.2718 |
0.9116 | 1.0 | 600 | 1.9455 | 9.7997 | 8.9466 | 0.6482 | 0.8531 | -169.2835 | -143.6562 | -1.3527 | -1.3510 |
0.8412 | 1.33 | 800 | 2.0041 | 9.5449 | 8.5167 | 0.6397 | 1.0282 | -173.5831 | -146.2043 | -1.4206 | -1.4179 |
0.7345 | 1.67 | 1000 | 2.0659 | 9.1494 | 8.1514 | 0.6426 | 0.9980 | -177.2357 | -150.1593 | -1.4325 | -1.4290 |
0.6609 | 2.0 | 1200 | 2.0321 | 9.0327 | 7.8126 | 0.6681 | 1.2200 | -180.6237 | -151.3265 | -1.4359 | -1.4321 |
0.6768 | 2.33 | 1400 | 2.0313 | 9.1007 | 7.8929 | 0.6709 | 1.2079 | -179.8211 | -150.6457 | -1.4472 | -1.4432 |
0.615 | 2.67 | 1600 | 2.0515 | 9.0972 | 7.9582 | 0.6624 | 1.1390 | -179.1680 | -150.6812 | -1.4413 | -1.4370 |
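To help read the reward columns: in DPO training logs, Rewards/margins is the mean difference between the chosen and rejected rewards, and Rewards/accuracies is the fraction of evaluation pairs in which the chosen response receives the higher reward. The sketch below copies the relevant columns from the table and checks the margin relation up to rounding; the variable names and the tolerance are illustrative choices.

```python
# (step, rewards_chosen, rewards_rejected, rewards_accuracy, rewards_margin)
rows = [
    (200, 9.9343, 9.1282, 0.6312, 0.8061),
    (400, 9.9379, 9.2024, 0.6113, 0.7354),
    (600, 9.7997, 8.9466, 0.6482, 0.8531),
    (800, 9.5449, 8.5167, 0.6397, 1.0282),
    (1000, 9.1494, 8.1514, 0.6426, 0.9980),
    (1200, 9.0327, 7.8126, 0.6681, 1.2200),
    (1400, 9.1007, 7.8929, 0.6709, 1.2079),
    (1600, 9.0972, 7.9582, 0.6624, 1.1390),
]

# The logged values are rounded to four decimals, so allow a small tolerance.
TOLERANCE = 1.5e-4
margins_consistent = all(
    abs((chosen - rejected) - margin) <= TOLERANCE
    for _, chosen, rejected, _, margin in rows
)

best_accuracy_step = max(rows, key=lambda r: r[3])[0]
best_margin_step = max(rows, key=lambda r: r[4])[0]

print(margins_consistent, best_accuracy_step, best_margin_step)
```

In this table the reward margin and accuracy peak around steps 1200 to 1400, while the validation loss is lowest at the first evaluation (step 200), which may matter when selecting a checkpoint.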
Framework versions
- PEFT 0.10.0
- Transformers 4.38.2
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
Model tree for jiazhengli/Meta-Llama-3-8B-QLoRA-Assessment-Rationale-dpo
Base model
meta-llama/Meta-Llama-3-8B