|
--- |
|
license: gemma |
|
base_model: google/gemma-2-2b |
|
tags: |
|
- trl |
|
- sft |
|
- generated_from_trainer |
|
model-index: |
|
- name: collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0 |
|
results: [] |
|
--- |
|
|
|
|
|
|
# collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0 |
|
|
|
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co./google/gemma-2-2b) on an unknown dataset. |
|
It achieves the following results on the evaluation set: |
|
- Loss: 1.1114 |
|
- Num Input Tokens Seen: 30129080 |
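
The fine-tuned weights can be loaded with the standard `transformers` API. A minimal generation sketch, assuming the checkpoint is published on the Hub under the repo id below (replace it with the actual path if it differs):

```python
# Minimal usage sketch; the repo id below is assumed from the model name.
# Note: the base google/gemma-2-2b weights are gated behind the Gemma license.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0"  # assumption: actual Hub path may differ

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 weights are bfloat16
    device_map="auto",
)

inputs = tokenizer("The key to a good model card is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```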
|
|
|
## Model description |
|
|
|
This checkpoint is a supervised fine-tune (SFT) of [google/gemma-2-2b](https://huggingface.co./google/gemma-2-2b), trained with the TRL library (per the `trl` and `sft` tags in the metadata). The underlying task and data have not been documented.
|
|
|
## Intended uses & limitations |
|
|
|
No intended uses or evaluations beyond the validation loss above have been documented. As a derivative of Gemma 2, the model is distributed under the `gemma` license declared in the metadata, and downstream use is subject to its terms.
|
|
|
## Training and evaluation data |
|
|
|
The training and evaluation data have not been documented; the training set is listed above only as an unknown dataset.
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (a reproduction sketch follows the list):
|
- learning_rate: 8e-06 |
|
- train_batch_size: 8 |
|
- eval_batch_size: 16 |
|
- seed: 0 |
|
- gradient_accumulation_steps: 16 |
|
- total_train_batch_size: 128 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: constant_with_warmup |
|
- lr_scheduler_warmup_ratio: 0.05 |
|
- num_epochs: 1 |
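
A hedged reproduction sketch of this configuration with TRL's `SFTTrainer`. Since the dataset is unknown, the data loading, text field, and sequence length below are placeholders, not the author's actual setup:

```python
# Reproduction sketch only: the dataset, text field, and max_seq_length are
# assumptions, since the actual training data is not documented in this card.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder

config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,   # 8 x 16 = 128 effective batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    seed=0,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    dataset_text_field="text",        # assumption: plain-text SFT field
    max_seq_length=1024,              # assumption: not recorded in this card
    logging_steps=5,
    eval_strategy="steps",
    eval_steps=5,                     # matches the 5-step eval cadence below
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```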
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen | |
|
|:-------------:|:------:|:----:|:---------------:|:-----------------:| |
|
| No log | 0 | 0 | 1.3956 | 0 | |
|
| 1.6587 | 0.0092 | 5 | 1.3863 | 276696 | |
|
| 1.5356 | 0.0185 | 10 | 1.3193 | 550488 | |
|
| 1.4519 | 0.0277 | 15 | 1.2505 | 831264 | |
|
| 1.3766 | 0.0369 | 20 | 1.1950 | 1107760 | |
|
| 1.3159 | 0.0461 | 25 | 1.1635 | 1389896 | |
|
| 1.1803 | 0.0554 | 30 | 1.1441 | 1668904 | |
|
| 1.1339 | 0.0646 | 35 | 1.1371 | 1949864 | |
|
| 0.9903 | 0.0738 | 40 | 1.1480 | 2233824 | |
|
| 1.0179 | 0.0830 | 45 | 1.1589 | 2509936 | |
|
| 0.9916 | 0.0923 | 50 | 1.1672 | 2793040 | |
|
| 0.9015 | 0.1015 | 55 | 1.1839 | 3068680 | |
|
| 0.8562 | 0.1107 | 60 | 1.1852 | 3347056 | |
|
| 0.8485 | 0.1200 | 65 | 1.1998 | 3627512 | |
|
| 0.7508 | 0.1292 | 70 | 1.2026 | 3905976 | |
|
| 0.7357 | 0.1384 | 75 | 1.2045 | 4179328 | |
|
| 0.6496 | 0.1476 | 80 | 1.1934 | 4453160 | |
|
| 0.7891 | 0.1569 | 85 | 1.1950 | 4735096 | |
|
| 0.5708 | 0.1661 | 90 | 1.1959 | 5015384 | |
|
| 0.607 | 0.1753 | 95 | 1.2026 | 5284160 | |
|
| 0.5427 | 0.1845 | 100 | 1.1955 | 5555648 | |
|
| 0.4434 | 0.1938 | 105 | 1.1935 | 5839504 | |
|
| 0.4716 | 0.2030 | 110 | 1.1997 | 6113904 | |
|
| 0.5612 | 0.2122 | 115 | 1.1869 | 6394080 | |
|
| 0.5522 | 0.2215 | 120 | 1.1934 | 6668968 | |
|
| 0.4752 | 0.2307 | 125 | 1.1917 | 6943216 | |
|
| 0.3948 | 0.2399 | 130 | 1.1873 | 7224944 | |
|
| 0.4525 | 0.2491 | 135 | 1.1890 | 7499080 | |
|
| 0.5147 | 0.2584 | 140 | 1.1814 | 7773104 | |
|
| 0.4881 | 0.2676 | 145 | 1.1917 | 8050400 | |
|
| 0.3915 | 0.2768 | 150 | 1.1842 | 8332168 | |
|
| 0.4032 | 0.2860 | 155 | 1.1897 | 8608296 | |
|
| 0.4227 | 0.2953 | 160 | 1.1804 | 8887936 | |
|
| 0.4128 | 0.3045 | 165 | 1.1838 | 9164856 | |
|
| 0.4097 | 0.3137 | 170 | 1.1759 | 9448376 | |
|
| 0.3663 | 0.3230 | 175 | 1.1841 | 9721256 | |
|
| 0.4311 | 0.3322 | 180 | 1.1780 | 9999808 | |
|
| 0.3765 | 0.3414 | 185 | 1.1763 | 10273504 | |
|
| 0.4953 | 0.3506 | 190 | 1.1663 | 10554248 | |
|
| 0.3491 | 0.3599 | 195 | 1.1760 | 10835664 | |
|
| 0.5705 | 0.3691 | 200 | 1.1670 | 11117936 | |
|
| 0.3433 | 0.3783 | 205 | 1.1677 | 11394272 | |
|
| 0.366 | 0.3875 | 210 | 1.1675 | 11667112 | |
|
| 0.3678 | 0.3968 | 215 | 1.1643 | 11940344 | |
|
| 0.3999 | 0.4060 | 220 | 1.1664 | 12226416 | |
|
| 0.2779 | 0.4152 | 225 | 1.1623 | 12509896 | |
|
| 0.2937 | 0.4245 | 230 | 1.1625 | 12789696 | |
|
| 0.3232 | 0.4337 | 235 | 1.1577 | 13067376 | |
|
| 0.2727 | 0.4429 | 240 | 1.1603 | 13347168 | |
|
| 0.4066 | 0.4521 | 245 | 1.1549 | 13623832 | |
|
| 0.3169 | 0.4614 | 250 | 1.1554 | 13902696 | |
|
| 0.3345 | 0.4706 | 255 | 1.1557 | 14188000 | |
|
| 0.3015 | 0.4798 | 260 | 1.1543 | 14470712 | |
|
| 0.3465 | 0.4890 | 265 | 1.1519 | 14746408 | |
|
| 0.3225 | 0.4983 | 270 | 1.1479 | 15018640 | |
|
| 0.2737 | 0.5075 | 275 | 1.1483 | 15296928 | |
|
| 0.3426 | 0.5167 | 280 | 1.1429 | 15574408 | |
|
| 0.3332 | 0.5260 | 285 | 1.1446 | 15847000 | |
|
| 0.2775 | 0.5352 | 290 | 1.1413 | 16126256 | |
|
| 0.3818 | 0.5444 | 295 | 1.1398 | 16403872 | |
|
| 0.402 | 0.5536 | 300 | 1.1409 | 16683200 | |
|
| 0.3527 | 0.5629 | 305 | 1.1387 | 16957856 | |
|
| 0.3747 | 0.5721 | 310 | 1.1381 | 17237088 | |
|
| 0.2767 | 0.5813 | 315 | 1.1398 | 17514672 | |
|
| 0.397 | 0.5905 | 320 | 1.1353 | 17790912 | |
|
| 0.2713 | 0.5998 | 325 | 1.1355 | 18067224 | |
|
| 0.3836 | 0.6090 | 330 | 1.1335 | 18345448 | |
|
| 0.2953 | 0.6182 | 335 | 1.1340 | 18625288 | |
|
| 0.3032 | 0.6275 | 340 | 1.1339 | 18895360 | |
|
| 0.3337 | 0.6367 | 345 | 1.1315 | 19176592 | |
|
| 0.2324 | 0.6459 | 350 | 1.1368 | 19456384 | |
|
| 0.3954 | 0.6551 | 355 | 1.1290 | 19736048 | |
|
| 0.3867 | 0.6644 | 360 | 1.1316 | 20017992 | |
|
| 0.2376 | 0.6736 | 365 | 1.1317 | 20299128 | |
|
| 0.2497 | 0.6828 | 370 | 1.1302 | 20572064 | |
|
| 0.2433 | 0.6920 | 375 | 1.1295 | 20847344 | |
|
| 0.3257 | 0.7013 | 380 | 1.1262 | 21131912 | |
|
| 0.3596 | 0.7105 | 385 | 1.1299 | 21410128 | |
|
| 0.3307 | 0.7197 | 390 | 1.1261 | 21691144 | |
|
| 0.3911 | 0.7290 | 395 | 1.1277 | 21972080 | |
|
| 0.3247 | 0.7382 | 400 | 1.1245 | 22254672 | |
|
| 0.3654 | 0.7474 | 405 | 1.1262 | 22539544 | |
|
| 0.2657 | 0.7566 | 410 | 1.1235 | 22820048 | |
|
| 0.3721 | 0.7659 | 415 | 1.1242 | 23096928 | |
|
| 0.2776 | 0.7751 | 420 | 1.1227 | 23369624 | |
|
| 0.2669 | 0.7843 | 425 | 1.1249 | 23652232 | |
|
| 0.3584 | 0.7935 | 430 | 1.1227 | 23931024 | |
|
| 0.4058 | 0.8028 | 435 | 1.1194 | 24211728 | |
|
| 0.271 | 0.8120 | 440 | 1.1246 | 24490376 | |
|
| 0.2958 | 0.8212 | 445 | 1.1206 | 24772424 | |
|
| 0.2507 | 0.8304 | 450 | 1.1214 | 25054744 | |
|
| 0.3209 | 0.8397 | 455 | 1.1193 | 25331320 | |
|
| 0.2983 | 0.8489 | 460 | 1.1173 | 25606720 | |
|
| 0.302 | 0.8581 | 465 | 1.1181 | 25890600 | |
|
| 0.4136 | 0.8674 | 470 | 1.1165 | 26167160 | |
|
| 0.3069 | 0.8766 | 475 | 1.1179 | 26448160 | |
|
| 0.2351 | 0.8858 | 480 | 1.1173 | 26723544 | |
|
| 0.2373 | 0.8950 | 485 | 1.1175 | 27006408 | |
|
| 0.3894 | 0.9043 | 490 | 1.1146 | 27281088 | |
|
| 0.277 | 0.9135 | 495 | 1.1174 | 27562296 | |
|
| 0.3009 | 0.9227 | 500 | 1.1151 | 27833952 | |
|
| 0.3229 | 0.9319 | 505 | 1.1139 | 28106704 | |
|
| 0.2891 | 0.9412 | 510 | 1.1161 | 28385768 | |
|
| 0.2745 | 0.9504 | 515 | 1.1128 | 28670136 | |
|
| 0.3377 | 0.9596 | 520 | 1.1158 | 28953688 | |
|
| 0.3045 | 0.9689 | 525 | 1.1126 | 29230304 | |
|
| 0.2475 | 0.9781 | 530 | 1.1150 | 29509224 | |
|
| 0.2633 | 0.9873 | 535 | 1.1121 | 29791512 | |
|
| 0.2622 | 0.9965 | 540 | 1.1105 | 30074936 | |
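
The validation loss falls from 1.3956 at initialization to 1.1105 by the end of the epoch, with a transient rise to about 1.20 between steps 40 and 75 before a slow, steady decline. A small sketch to visualize this trajectory, assuming the run's `trainer_state.json` (a standard `Trainer` artifact) is available locally:

```python
# Plot the validation-loss curve from the Trainer's saved state file.
import json

import matplotlib.pyplot as plt

with open("trainer_state.json") as f:
    state = json.load(f)

# log_history mixes training and evaluation records; keep the eval ones.
evals = [e for e in state["log_history"] if "eval_loss" in e]
steps = [e["step"] for e in evals]
losses = [e["eval_loss"] for e in evals]

plt.plot(steps, losses, marker="o", markersize=3)
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0")
plt.savefig("eval_loss.png", dpi=150)
```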
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.44.0 |
|
- Pytorch 2.4.0+cu121 |
|
- Datasets 2.20.0 |
|
- Tokenizers 0.19.1 |
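
To match the training environment, a quick sanity check that the pinned versions above are installed:

```python
# Sanity-check sketch: compare installed versions against the pins above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if installed[name] == want else f"got {installed[name]}"
    print(f"{name}: expected {want} -> {status}")
```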
|
|