# llama-3.2-3B-lora-r256
This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the gommt-oneshot-train dataset. It achieves the following results on the evaluation set:
- Loss: 0.0062
## Model description
More information needed
## Intended uses & limitations
More information needed
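Although usage details are not documented here, the adapter can in principle be loaded on top of the base model with PEFT. The snippet below is a minimal, untested sketch: the adapter path is a placeholder (this card does not confirm the exact Hub repo id), and the dtype/device settings are illustrative rather than taken from the training setup.

```python
# Minimal loading sketch (assumptions: adapter path is a placeholder;
# dtype/device settings are illustrative, not recorded in this card).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_path = "<adapter-repo-or-local-path>"  # placeholder: replace with the real repo id or local path

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_path)

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```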
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a rough `TrainingArguments` equivalent is sketched after the list):
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0
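For reference, a rough `transformers.TrainingArguments` equivalent of the settings above; only the documented values are reproduced, and anything else (such as the output directory) is illustrative.

```python
# Rough TrainingArguments equivalent of the hyperparameters listed above.
# Only the documented values are reproduced; other fields are illustrative.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.2-3B-lora-r256",  # illustrative
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,  # 4 per device x 8 steps = total train batch size 32
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```

The model name suggests a LoRA rank of 256, but the exact `peft.LoraConfig` used for training is not recorded in this card.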
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.0567 | 0.0372 | 5 | 0.0548 |
0.0421 | 0.0745 | 10 | 0.0434 |
0.0347 | 0.1117 | 15 | 0.0384 |
0.0306 | 0.1490 | 20 | 0.0344 |
0.0325 | 0.1862 | 25 | 0.0302 |
0.022 | 0.2235 | 30 | 0.0266 |
0.0251 | 0.2607 | 35 | 0.0241 |
0.0223 | 0.2980 | 40 | 0.0221 |
0.0174 | 0.3352 | 45 | 0.0208 |
0.0218 | 0.3724 | 50 | 0.0193 |
0.0208 | 0.4097 | 55 | 0.0189 |
0.0193 | 0.4469 | 60 | 0.0175 |
0.0178 | 0.4842 | 65 | 0.0167 |
0.017 | 0.5214 | 70 | 0.0159 |
0.0199 | 0.5587 | 75 | 0.0150 |
0.0185 | 0.5959 | 80 | 0.0150 |
0.0167 | 0.6331 | 85 | 0.0148 |
0.0159 | 0.6704 | 90 | 0.0143 |
0.0153 | 0.7076 | 95 | 0.0138 |
0.0144 | 0.7449 | 100 | 0.0136 |
0.0141 | 0.7821 | 105 | 0.0131 |
0.0156 | 0.8194 | 110 | 0.0129 |
0.0116 | 0.8566 | 115 | 0.0126 |
0.0154 | 0.8939 | 120 | 0.0123 |
0.0116 | 0.9311 | 125 | 0.0121 |
0.0167 | 0.9683 | 130 | 0.0118 |
0.0202 | 1.0056 | 135 | 0.0115 |
0.0126 | 1.0428 | 140 | 0.0114 |
0.0122 | 1.0801 | 145 | 0.0114 |
0.0126 | 1.1173 | 150 | 0.0114 |
0.0097 | 1.1546 | 155 | 0.0117 |
0.01 | 1.1918 | 160 | 0.0117 |
0.0112 | 1.2291 | 165 | 0.0111 |
0.0102 | 1.2663 | 170 | 0.0102 |
0.0114 | 1.3035 | 175 | 0.0096 |
0.0109 | 1.3408 | 180 | 0.0094 |
0.0119 | 1.3780 | 185 | 0.0096 |
0.0099 | 1.4153 | 190 | 0.0095 |
0.01 | 1.4525 | 195 | 0.0094 |
0.0117 | 1.4898 | 200 | 0.0093 |
0.0121 | 1.5270 | 205 | 0.0090 |
0.0104 | 1.5642 | 210 | 0.0088 |
0.0123 | 1.6015 | 215 | 0.0086 |
0.0092 | 1.6387 | 220 | 0.0084 |
0.012 | 1.6760 | 225 | 0.0086 |
0.0088 | 1.7132 | 230 | 0.0086 |
0.0098 | 1.7505 | 235 | 0.0080 |
0.01 | 1.7877 | 240 | 0.0083 |
0.0089 | 1.8250 | 245 | 0.0080 |
0.0094 | 1.8622 | 250 | 0.0082 |
0.0086 | 1.8994 | 255 | 0.0081 |
0.0092 | 1.9367 | 260 | 0.0080 |
0.0097 | 1.9739 | 265 | 0.0081 |
0.0074 | 2.0112 | 270 | 0.0079 |
0.0071 | 2.0484 | 275 | 0.0080 |
0.0087 | 2.0857 | 280 | 0.0079 |
0.0078 | 2.1229 | 285 | 0.0078 |
0.0071 | 2.1601 | 290 | 0.0078 |
0.0062 | 2.1974 | 295 | 0.0077 |
0.0072 | 2.2346 | 300 | 0.0078 |
0.0078 | 2.2719 | 305 | 0.0079 |
0.0071 | 2.3091 | 310 | 0.0079 |
0.0064 | 2.3464 | 315 | 0.0078 |
0.0075 | 2.3836 | 320 | 0.0077 |
0.0075 | 2.4209 | 325 | 0.0074 |
0.007 | 2.4581 | 330 | 0.0075 |
0.0067 | 2.4953 | 335 | 0.0074 |
0.0054 | 2.5326 | 340 | 0.0076 |
0.006 | 2.5698 | 345 | 0.0069 |
0.007 | 2.6071 | 350 | 0.0069 |
0.0058 | 2.6443 | 355 | 0.0069 |
0.0062 | 2.6816 | 360 | 0.0070 |
0.0075 | 2.7188 | 365 | 0.0070 |
0.0062 | 2.7561 | 370 | 0.0067 |
0.0064 | 2.7933 | 375 | 0.0067 |
0.0076 | 2.8305 | 380 | 0.0067 |
0.0062 | 2.8678 | 385 | 0.0067 |
0.0076 | 2.9050 | 390 | 0.0065 |
0.0064 | 2.9423 | 395 | 0.0064 |
0.006 | 2.9795 | 400 | 0.0065 |
0.0045 | 3.0168 | 405 | 0.0066 |
0.0043 | 3.0540 | 410 | 0.0067 |
0.0045 | 3.0912 | 415 | 0.0066 |
0.0038 | 3.1285 | 420 | 0.0067 |
0.0041 | 3.1657 | 425 | 0.0068 |
0.0042 | 3.2030 | 430 | 0.0067 |
0.0046 | 3.2402 | 435 | 0.0066 |
0.0047 | 3.2775 | 440 | 0.0066 |
0.0045 | 3.3147 | 445 | 0.0065 |
0.005 | 3.3520 | 450 | 0.0065 |
0.0049 | 3.3892 | 455 | 0.0067 |
0.0044 | 3.4264 | 460 | 0.0065 |
0.0054 | 3.4637 | 465 | 0.0064 |
0.0045 | 3.5009 | 470 | 0.0064 |
0.0037 | 3.5382 | 475 | 0.0064 |
0.0039 | 3.5754 | 480 | 0.0063 |
0.0044 | 3.6127 | 485 | 0.0063 |
0.0039 | 3.6499 | 490 | 0.0063 |
0.0045 | 3.6872 | 495 | 0.0064 |
0.0042 | 3.7244 | 500 | 0.0064 |
0.0044 | 3.7616 | 505 | 0.0063 |
0.0045 | 3.7989 | 510 | 0.0063 |
0.0041 | 3.8361 | 515 | 0.0063 |
0.0042 | 3.8734 | 520 | 0.0063 |
0.004 | 3.9106 | 525 | 0.0064 |
0.0042 | 3.9479 | 530 | 0.0064 |
0.0043 | 3.9851 | 535 | 0.0062 |
0.003 | 4.0223 | 540 | 0.0062 |
0.003 | 4.0596 | 545 | 0.0064 |
0.0038 | 4.0968 | 550 | 0.0064 |
0.0032 | 4.1341 | 555 | 0.0063 |
0.003 | 4.1713 | 560 | 0.0062 |
0.0025 | 4.2086 | 565 | 0.0063 |
0.0025 | 4.2458 | 570 | 0.0062 |
0.0029 | 4.2831 | 575 | 0.0063 |
0.0027 | 4.3203 | 580 | 0.0062 |
0.0029 | 4.3575 | 585 | 0.0063 |
0.0029 | 4.3948 | 590 | 0.0063 |
0.0029 | 4.4320 | 595 | 0.0063 |
0.0028 | 4.4693 | 600 | 0.0062 |
0.0035 | 4.5065 | 605 | 0.0062 |
0.0024 | 4.5438 | 610 | 0.0062 |
0.0026 | 4.5810 | 615 | 0.0062 |
0.0028 | 4.6182 | 620 | 0.0062 |
0.0024 | 4.6555 | 625 | 0.0062 |
0.0031 | 4.6927 | 630 | 0.0061 |
0.0028 | 4.7300 | 635 | 0.0062 |
0.0025 | 4.7672 | 640 | 0.0062 |
0.003 | 4.8045 | 645 | 0.0062 |
0.0027 | 4.8417 | 650 | 0.0062 |
0.0027 | 4.8790 | 655 | 0.0062 |
0.0028 | 4.9162 | 660 | 0.0062 |
0.0029 | 4.9534 | 665 | 0.0062 |
0.0029 | 4.9907 | 670 | 0.0062 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.4.0+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3