# collapse_gemma-2-2b_hs2_accumulate_iter9_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0953
- Num Input Tokens Seen: 46061512
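
For reference, assuming the reported loss is mean per-token cross-entropy in nats, a loss of 1.0953 corresponds to an evaluation perplexity of exp(1.0953) ≈ 2.99.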

## Model description

More information needed

## Intended uses & limitations

More information needed
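
No usage guidance is provided, so the following is a minimal inference sketch, assuming the checkpoint is hosted at `RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter9_sftsd1` and loads like any other Gemma-2 causal language model in Transformers:

```python
# Minimal inference sketch (assumed repo id; not an official usage example).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter9_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Generate a short continuation from a prompt.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```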

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
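
The training script itself is not included in this card, but the hyperparameters above map directly onto Hugging Face `TrainingArguments`; a minimal sketch (the `output_dir` is hypothetical) might look like:

```python
# Sketch of the reported hyperparameters as TrainingArguments.
# output_dir is hypothetical; the actual training script is not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter9_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 x 16 = total train batch size of 128
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```

With `constant_with_warmup`, the learning rate ramps up over the first 5% of steps and then stays fixed at 8e-06 for the remainder of training.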

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.5968 | 0.0058 | 5 | 1.3874 | 274016 |
1.5888 | 0.0117 | 10 | 1.3599 | 549248 |
1.5382 | 0.0175 | 15 | 1.2963 | 816856 |
1.3581 | 0.0233 | 20 | 1.2469 | 1090440 |
1.3432 | 0.0292 | 25 | 1.1991 | 1356328 |
1.2178 | 0.0350 | 30 | 1.1829 | 1625504 |
1.0423 | 0.0408 | 35 | 1.1963 | 1892072 |
0.955 | 0.0466 | 40 | 1.2117 | 2160704 |
0.7296 | 0.0525 | 45 | 1.2296 | 2423616 |
0.693 | 0.0583 | 50 | 1.2351 | 2691480 |
0.6242 | 0.0641 | 55 | 1.2659 | 2959936 |
0.477 | 0.0700 | 60 | 1.2266 | 3232312 |
0.4849 | 0.0758 | 65 | 1.2371 | 3506080 |
0.3674 | 0.0816 | 70 | 1.2116 | 3771008 |
0.4181 | 0.0875 | 75 | 1.2076 | 4046488 |
0.3025 | 0.0933 | 80 | 1.1966 | 4313744 |
0.4752 | 0.0991 | 85 | 1.1925 | 4584688 |
0.2821 | 0.1049 | 90 | 1.1857 | 4857624 |
0.3378 | 0.1108 | 95 | 1.1925 | 5123664 |
0.3097 | 0.1166 | 100 | 1.1774 | 5393576 |
0.2827 | 0.1224 | 105 | 1.1792 | 5659728 |
0.2987 | 0.1283 | 110 | 1.1717 | 5924392 |
0.3106 | 0.1341 | 115 | 1.1737 | 6191112 |
0.2508 | 0.1399 | 120 | 1.1701 | 6455928 |
0.3027 | 0.1458 | 125 | 1.1641 | 6729648 |
0.2612 | 0.1516 | 130 | 1.1665 | 6999848 |
0.3809 | 0.1574 | 135 | 1.1604 | 7271064 |
0.2996 | 0.1633 | 140 | 1.1666 | 7533968 |
0.2109 | 0.1691 | 145 | 1.1592 | 7805864 |
0.2971 | 0.1749 | 150 | 1.1587 | 8068512 |
0.2293 | 0.1807 | 155 | 1.1612 | 8332096 |
0.2631 | 0.1866 | 160 | 1.1526 | 8601832 |
0.3133 | 0.1924 | 165 | 1.1551 | 8874072 |
0.2839 | 0.1982 | 170 | 1.1495 | 9144448 |
0.1905 | 0.2041 | 175 | 1.1487 | 9417304 |
0.2209 | 0.2099 | 180 | 1.1497 | 9681824 |
0.2914 | 0.2157 | 185 | 1.1467 | 9950760 |
0.1705 | 0.2216 | 190 | 1.1465 | 10211560 |
0.2146 | 0.2274 | 195 | 1.1462 | 10488464 |
0.242 | 0.2332 | 200 | 1.1464 | 10754104 |
0.2219 | 0.2390 | 205 | 1.1460 | 11021408 |
0.2642 | 0.2449 | 210 | 1.1404 | 11294984 |
0.1826 | 0.2507 | 215 | 1.1424 | 11561648 |
0.2356 | 0.2565 | 220 | 1.1435 | 11826208 |
0.328 | 0.2624 | 225 | 1.1395 | 12100544 |
0.1349 | 0.2682 | 230 | 1.1407 | 12362208 |
0.2807 | 0.2740 | 235 | 1.1399 | 12622408 |
0.1764 | 0.2799 | 240 | 1.1347 | 12896808 |
0.2064 | 0.2857 | 245 | 1.1373 | 13167648 |
0.1623 | 0.2915 | 250 | 1.1351 | 13433128 |
0.2485 | 0.2974 | 255 | 1.1370 | 13690936 |
0.2428 | 0.3032 | 260 | 1.1320 | 13959408 |
0.2053 | 0.3090 | 265 | 1.1335 | 14230960 |
0.2313 | 0.3148 | 270 | 1.1343 | 14499120 |
0.2406 | 0.3207 | 275 | 1.1321 | 14764680 |
0.2183 | 0.3265 | 280 | 1.1353 | 15033104 |
0.1797 | 0.3323 | 285 | 1.1312 | 15297040 |
0.3162 | 0.3382 | 290 | 1.1292 | 15559944 |
0.2857 | 0.3440 | 295 | 1.1332 | 15826864 |
0.2406 | 0.3498 | 300 | 1.1292 | 16096288 |
0.2673 | 0.3557 | 305 | 1.1258 | 16374784 |
0.1881 | 0.3615 | 310 | 1.1297 | 16637024 |
0.2682 | 0.3673 | 315 | 1.1273 | 16903080 |
0.2029 | 0.3732 | 320 | 1.1251 | 17178504 |
0.2491 | 0.3790 | 325 | 1.1224 | 17447112 |
0.2047 | 0.3848 | 330 | 1.1250 | 17718640 |
0.2366 | 0.3906 | 335 | 1.1236 | 17986504 |
0.2436 | 0.3965 | 340 | 1.1227 | 18251352 |
0.2128 | 0.4023 | 345 | 1.1248 | 18523552 |
0.2032 | 0.4081 | 350 | 1.1212 | 18796232 |
0.2553 | 0.4140 | 355 | 1.1200 | 19067224 |
0.2161 | 0.4198 | 360 | 1.1208 | 19340744 |
0.198 | 0.4256 | 365 | 1.1221 | 19606976 |
0.2211 | 0.4315 | 370 | 1.1176 | 19879912 |
0.1821 | 0.4373 | 375 | 1.1215 | 20148984 |
0.2017 | 0.4431 | 380 | 1.1212 | 20421512 |
0.1747 | 0.4489 | 385 | 1.1176 | 20687432 |
0.1726 | 0.4548 | 390 | 1.1202 | 20956608 |
0.2402 | 0.4606 | 395 | 1.1188 | 21225824 |
0.2234 | 0.4664 | 400 | 1.1174 | 21493456 |
0.2382 | 0.4723 | 405 | 1.1170 | 21763616 |
0.247 | 0.4781 | 410 | 1.1172 | 22038728 |
0.2031 | 0.4839 | 415 | 1.1176 | 22309024 |
0.2817 | 0.4898 | 420 | 1.1148 | 22565768 |
0.3093 | 0.4956 | 425 | 1.1152 | 22827816 |
0.1926 | 0.5014 | 430 | 1.1143 | 23101176 |
0.2022 | 0.5073 | 435 | 1.1126 | 23379960 |
0.1572 | 0.5131 | 440 | 1.1154 | 23640264 |
0.24 | 0.5189 | 445 | 1.1151 | 23911848 |
0.1476 | 0.5247 | 450 | 1.1119 | 24178904 |
0.1606 | 0.5306 | 455 | 1.1150 | 24444888 |
0.2244 | 0.5364 | 460 | 1.1128 | 24717512 |
0.265 | 0.5422 | 465 | 1.1105 | 24985952 |
0.2012 | 0.5481 | 470 | 1.1138 | 25252720 |
0.1523 | 0.5539 | 475 | 1.1124 | 25519304 |
0.2278 | 0.5597 | 480 | 1.1097 | 25787512 |
0.2459 | 0.5656 | 485 | 1.1108 | 26053400 |
0.273 | 0.5714 | 490 | 1.1123 | 26328544 |
0.1601 | 0.5772 | 495 | 1.1107 | 26596672 |
0.1721 | 0.5830 | 500 | 1.1081 | 26859880 |
0.2109 | 0.5889 | 505 | 1.1115 | 27124824 |
0.1354 | 0.5947 | 510 | 1.1108 | 27399936 |
0.1512 | 0.6005 | 515 | 1.1090 | 27670992 |
0.199 | 0.6064 | 520 | 1.1090 | 27943632 |
0.1764 | 0.6122 | 525 | 1.1086 | 28218056 |
0.2514 | 0.6180 | 530 | 1.1060 | 28491008 |
0.1199 | 0.6239 | 535 | 1.1065 | 28756392 |
0.182 | 0.6297 | 540 | 1.1094 | 29024760 |
0.2276 | 0.6355 | 545 | 1.1065 | 29300168 |
0.1998 | 0.6414 | 550 | 1.1045 | 29575136 |
0.1562 | 0.6472 | 555 | 1.1063 | 29851208 |
0.2335 | 0.6530 | 560 | 1.1064 | 30121928 |
0.2346 | 0.6588 | 565 | 1.1046 | 30395808 |
0.1692 | 0.6647 | 570 | 1.1051 | 30658568 |
0.1578 | 0.6705 | 575 | 1.1044 | 30925224 |
0.1954 | 0.6763 | 580 | 1.1028 | 31193344 |
0.2554 | 0.6822 | 585 | 1.1025 | 31461440 |
0.2701 | 0.6880 | 590 | 1.1043 | 31725320 |
0.1931 | 0.6938 | 595 | 1.1030 | 31994944 |
0.2218 | 0.6997 | 600 | 1.1007 | 32257112 |
0.1963 | 0.7055 | 605 | 1.1074 | 32524952 |
0.1699 | 0.7113 | 610 | 1.1054 | 32796760 |
0.2052 | 0.7171 | 615 | 1.1024 | 33071208 |
0.1921 | 0.7230 | 620 | 1.1028 | 33345336 |
0.2832 | 0.7288 | 625 | 1.1024 | 33605392 |
0.212 | 0.7346 | 630 | 1.1022 | 33871240 |
0.221 | 0.7405 | 635 | 1.1019 | 34137144 |
0.1636 | 0.7463 | 640 | 1.1017 | 34405064 |
0.2403 | 0.7521 | 645 | 1.1008 | 34674784 |
0.2656 | 0.7580 | 650 | 1.1028 | 34941448 |
0.1955 | 0.7638 | 655 | 1.1024 | 35206152 |
0.1686 | 0.7696 | 660 | 1.1002 | 35474864 |
0.0948 | 0.7755 | 665 | 1.1012 | 35740808 |
0.2263 | 0.7813 | 670 | 1.1020 | 36005752 |
0.1881 | 0.7871 | 675 | 1.1016 | 36270624 |
0.1629 | 0.7929 | 680 | 1.1000 | 36543976 |
0.1878 | 0.7988 | 685 | 1.1021 | 36815856 |
0.1922 | 0.8046 | 690 | 1.1028 | 37087744 |
0.2065 | 0.8104 | 695 | 1.0994 | 37355984 |
0.2857 | 0.8163 | 700 | 1.0983 | 37616472 |
0.1905 | 0.8221 | 705 | 1.1007 | 37888352 |
0.1892 | 0.8279 | 710 | 1.1008 | 38150608 |
0.0965 | 0.8338 | 715 | 1.1007 | 38421072 |
0.2687 | 0.8396 | 720 | 1.0986 | 38689680 |
0.2128 | 0.8454 | 725 | 1.0977 | 38955000 |
0.1166 | 0.8512 | 730 | 1.1000 | 39225856 |
0.198 | 0.8571 | 735 | 1.1002 | 39493256 |
0.1589 | 0.8629 | 740 | 1.0990 | 39760584 |
0.207 | 0.8687 | 745 | 1.0985 | 40024456 |
0.2328 | 0.8746 | 750 | 1.0997 | 40298392 |
0.1799 | 0.8804 | 755 | 1.0979 | 40558912 |
0.1831 | 0.8862 | 760 | 1.0959 | 40829280 |
0.1427 | 0.8921 | 765 | 1.0975 | 41100704 |
0.1834 | 0.8979 | 770 | 1.0980 | 41368552 |
0.1369 | 0.9037 | 775 | 1.0973 | 41639256 |
0.1502 | 0.9096 | 780 | 1.0962 | 41912120 |
0.1753 | 0.9154 | 785 | 1.0970 | 42188992 |
0.1985 | 0.9212 | 790 | 1.0969 | 42457560 |
0.155 | 0.9270 | 795 | 1.0950 | 42731488 |
0.1584 | 0.9329 | 800 | 1.0972 | 43002704 |
0.2282 | 0.9387 | 805 | 1.0982 | 43274512 |
0.2228 | 0.9445 | 810 | 1.0948 | 43539648 |
0.2049 | 0.9504 | 815 | 1.0934 | 43812632 |
0.2067 | 0.9562 | 820 | 1.0957 | 44081896 |
0.174 | 0.9620 | 825 | 1.0984 | 44346184 |
0.1571 | 0.9679 | 830 | 1.0970 | 44612784 |
0.1719 | 0.9737 | 835 | 1.0954 | 44880368 |
0.1625 | 0.9795 | 840 | 1.0926 | 45155416 |
0.212 | 0.9854 | 845 | 1.0940 | 45424688 |
0.1882 | 0.9912 | 850 | 1.0967 | 45687232 |
0.2031 | 0.9970 | 855 | 1.0960 | 45956808 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1