# collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd0
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0973
- Num Input Tokens Seen: 52420976
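For reference, a minimal inference sketch with the Transformers library, assuming the checkpoint is published on the Hub under `RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd0`; the prompt, dtype, and generation settings are illustrative, not part of this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; pick a dtype suited to your hardware
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```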
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
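The effective batch size follows from train_batch_size × gradient_accumulation_steps = 8 × 16 = 128, matching the total_train_batch_size above. Below is a minimal sketch of how these values might map onto `transformers.TrainingArguments`; the actual training script is not published here, so treat the mapping as an assumption:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the configuration listed above;
# the original training script may have differed in other settings.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```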
### Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.6464 | 0.0053 | 5 | 1.3881 | 275744 |
1.616 | 0.0105 | 10 | 1.3632 | 556272 |
1.521 | 0.0158 | 15 | 1.3080 | 835128 |
1.4553 | 0.0210 | 20 | 1.2604 | 1112712 |
1.3818 | 0.0263 | 25 | 1.2241 | 1386904 |
1.1816 | 0.0316 | 30 | 1.1940 | 1663448 |
1.0951 | 0.0368 | 35 | 1.2108 | 1935000 |
1.0163 | 0.0421 | 40 | 1.2173 | 2207544 |
0.8286 | 0.0473 | 45 | 1.2475 | 2479888 |
0.7088 | 0.0526 | 50 | 1.2926 | 2760008 |
0.5394 | 0.0579 | 55 | 1.2764 | 3040144 |
0.4906 | 0.0631 | 60 | 1.2574 | 3311456 |
0.4247 | 0.0684 | 65 | 1.2465 | 3594720 |
0.4154 | 0.0736 | 70 | 1.2376 | 3869712 |
0.3512 | 0.0789 | 75 | 1.2316 | 4145256 |
0.263 | 0.0842 | 80 | 1.2230 | 4411856 |
0.2766 | 0.0894 | 85 | 1.2119 | 4689752 |
0.3177 | 0.0947 | 90 | 1.2052 | 4969824 |
0.3498 | 0.0999 | 95 | 1.1948 | 5238616 |
0.3406 | 0.1052 | 100 | 1.1939 | 5512992 |
0.2487 | 0.1105 | 105 | 1.1879 | 5791080 |
0.2642 | 0.1157 | 110 | 1.1828 | 6063168 |
0.2988 | 0.1210 | 115 | 1.1856 | 6339160 |
0.3161 | 0.1262 | 120 | 1.1766 | 6620184 |
0.2735 | 0.1315 | 125 | 1.1806 | 6894400 |
0.3167 | 0.1368 | 130 | 1.1690 | 7171888 |
0.2169 | 0.1420 | 135 | 1.1710 | 7447800 |
0.2587 | 0.1473 | 140 | 1.1682 | 7726144 |
0.2769 | 0.1526 | 145 | 1.1664 | 8001232 |
0.242 | 0.1578 | 150 | 1.1645 | 8279752 |
0.2176 | 0.1631 | 155 | 1.1625 | 8551552 |
0.2537 | 0.1683 | 160 | 1.1589 | 8828352 |
0.326 | 0.1736 | 165 | 1.1569 | 9108264 |
0.2361 | 0.1789 | 170 | 1.1541 | 9378776 |
0.1821 | 0.1841 | 175 | 1.1557 | 9654304 |
0.203 | 0.1894 | 180 | 1.1519 | 9928880 |
0.3098 | 0.1946 | 185 | 1.1510 | 10206984 |
0.2073 | 0.1999 | 190 | 1.1523 | 10475312 |
0.2015 | 0.2052 | 195 | 1.1501 | 10755920 |
0.1842 | 0.2104 | 200 | 1.1474 | 11027776 |
0.2372 | 0.2157 | 205 | 1.1485 | 11310016 |
0.2136 | 0.2209 | 210 | 1.1461 | 11593432 |
0.146 | 0.2262 | 215 | 1.1478 | 11866256 |
0.2169 | 0.2315 | 220 | 1.1452 | 12128712 |
0.2328 | 0.2367 | 225 | 1.1420 | 12405512 |
0.3221 | 0.2420 | 230 | 1.1430 | 12679104 |
0.2402 | 0.2472 | 235 | 1.1399 | 12959440 |
0.2483 | 0.2525 | 240 | 1.1427 | 13237632 |
0.2262 | 0.2578 | 245 | 1.1412 | 13514584 |
0.1712 | 0.2630 | 250 | 1.1389 | 13793808 |
0.2165 | 0.2683 | 255 | 1.1387 | 14075224 |
0.2268 | 0.2735 | 260 | 1.1393 | 14352376 |
0.2488 | 0.2788 | 265 | 1.1392 | 14632872 |
0.1661 | 0.2841 | 270 | 1.1393 | 14911864 |
0.2001 | 0.2893 | 275 | 1.1385 | 15182928 |
0.231 | 0.2946 | 280 | 1.1328 | 15454816 |
0.1368 | 0.2998 | 285 | 1.1386 | 15727680 |
0.2376 | 0.3051 | 290 | 1.1361 | 16002808 |
0.2255 | 0.3104 | 295 | 1.1314 | 16286656 |
0.2405 | 0.3156 | 300 | 1.1347 | 16559520 |
0.1658 | 0.3209 | 305 | 1.1322 | 16825824 |
0.2144 | 0.3261 | 310 | 1.1315 | 17103184 |
0.2324 | 0.3314 | 315 | 1.1334 | 17376440 |
0.2019 | 0.3367 | 320 | 1.1283 | 17651576 |
0.1271 | 0.3419 | 325 | 1.1296 | 17925112 |
0.2883 | 0.3472 | 330 | 1.1313 | 18200168 |
0.1831 | 0.3524 | 335 | 1.1278 | 18474400 |
0.2386 | 0.3577 | 340 | 1.1293 | 18754744 |
0.1069 | 0.3630 | 345 | 1.1272 | 19039216 |
0.2248 | 0.3682 | 350 | 1.1246 | 19317424 |
0.168 | 0.3735 | 355 | 1.1268 | 19594288 |
0.2574 | 0.3787 | 360 | 1.1264 | 19870104 |
0.1993 | 0.3840 | 365 | 1.1253 | 20149168 |
0.1424 | 0.3893 | 370 | 1.1277 | 20423232 |
0.2319 | 0.3945 | 375 | 1.1252 | 20689504 |
0.1374 | 0.3998 | 380 | 1.1254 | 20962504 |
0.1957 | 0.4050 | 385 | 1.1250 | 21238320 |
0.2567 | 0.4103 | 390 | 1.1224 | 21513752 |
0.2025 | 0.4156 | 395 | 1.1221 | 21792712 |
0.1351 | 0.4208 | 400 | 1.1215 | 22072768 |
0.1946 | 0.4261 | 405 | 1.1219 | 22346032 |
0.1623 | 0.4314 | 410 | 1.1212 | 22623464 |
0.1344 | 0.4366 | 415 | 1.1220 | 22898464 |
0.1786 | 0.4419 | 420 | 1.1221 | 23167888 |
0.2108 | 0.4471 | 425 | 1.1187 | 23443248 |
0.2651 | 0.4524 | 430 | 1.1195 | 23723440 |
0.1972 | 0.4577 | 435 | 1.1206 | 24003304 |
0.1881 | 0.4629 | 440 | 1.1174 | 24276072 |
0.2527 | 0.4682 | 445 | 1.1158 | 24548600 |
0.1596 | 0.4734 | 450 | 1.1180 | 24822680 |
0.1927 | 0.4787 | 455 | 1.1165 | 25103280 |
0.1879 | 0.4840 | 460 | 1.1169 | 25384848 |
0.2702 | 0.4892 | 465 | 1.1195 | 25664496 |
0.253 | 0.4945 | 470 | 1.1131 | 25944728 |
0.2189 | 0.4997 | 475 | 1.1145 | 26221376 |
0.2071 | 0.5050 | 480 | 1.1175 | 26497632 |
0.2222 | 0.5103 | 485 | 1.1153 | 26774640 |
0.1671 | 0.5155 | 490 | 1.1140 | 27055808 |
0.2184 | 0.5208 | 495 | 1.1138 | 27332968 |
0.1734 | 0.5260 | 500 | 1.1151 | 27608480 |
0.2276 | 0.5313 | 505 | 1.1160 | 27882664 |
0.2325 | 0.5366 | 510 | 1.1125 | 28162240 |
0.1572 | 0.5418 | 515 | 1.1120 | 28438984 |
0.219 | 0.5471 | 520 | 1.1125 | 28717080 |
0.1661 | 0.5523 | 525 | 1.1106 | 28993680 |
0.2204 | 0.5576 | 530 | 1.1139 | 29267288 |
0.221 | 0.5629 | 535 | 1.1106 | 29543888 |
0.0899 | 0.5681 | 540 | 1.1093 | 29815392 |
0.1472 | 0.5734 | 545 | 1.1121 | 30086232 |
0.2434 | 0.5786 | 550 | 1.1104 | 30359376 |
0.2186 | 0.5839 | 555 | 1.1087 | 30640120 |
0.1473 | 0.5892 | 560 | 1.1095 | 30923504 |
0.1932 | 0.5944 | 565 | 1.1099 | 31205552 |
0.1296 | 0.5997 | 570 | 1.1081 | 31477344 |
0.2337 | 0.6049 | 575 | 1.1076 | 31754144 |
0.1498 | 0.6102 | 580 | 1.1085 | 32030168 |
0.1419 | 0.6155 | 585 | 1.1074 | 32306544 |
0.1691 | 0.6207 | 590 | 1.1090 | 32580472 |
0.1481 | 0.6260 | 595 | 1.1075 | 32858792 |
0.153 | 0.6312 | 600 | 1.1071 | 33137456 |
0.1361 | 0.6365 | 605 | 1.1080 | 33408416 |
0.2361 | 0.6418 | 610 | 1.1062 | 33679848 |
0.2217 | 0.6470 | 615 | 1.1077 | 33940144 |
0.1492 | 0.6523 | 620 | 1.1067 | 34215192 |
0.1511 | 0.6575 | 625 | 1.1053 | 34498680 |
0.2054 | 0.6628 | 630 | 1.1056 | 34774928 |
0.1792 | 0.6681 | 635 | 1.1057 | 35049520 |
0.2711 | 0.6733 | 640 | 1.1049 | 35330888 |
0.1757 | 0.6786 | 645 | 1.1041 | 35607280 |
0.1714 | 0.6839 | 650 | 1.1072 | 35883344 |
0.1467 | 0.6891 | 655 | 1.1056 | 36164448 |
0.2115 | 0.6944 | 660 | 1.1036 | 36446976 |
0.239 | 0.6996 | 665 | 1.1050 | 36711920 |
0.1931 | 0.7049 | 670 | 1.1043 | 36986944 |
0.2626 | 0.7102 | 675 | 1.1043 | 37262920 |
0.2028 | 0.7154 | 680 | 1.1043 | 37544576 |
0.1767 | 0.7207 | 685 | 1.1050 | 37824128 |
0.1982 | 0.7259 | 690 | 1.1030 | 38095632 |
0.1737 | 0.7312 | 695 | 1.1014 | 38365112 |
0.2518 | 0.7365 | 700 | 1.1034 | 38649920 |
0.2115 | 0.7417 | 705 | 1.1029 | 38926632 |
0.192 | 0.7470 | 710 | 1.1012 | 39204312 |
0.1431 | 0.7522 | 715 | 1.1034 | 39488336 |
0.1386 | 0.7575 | 720 | 1.1029 | 39764600 |
0.213 | 0.7628 | 725 | 1.1009 | 40038648 |
0.1164 | 0.7680 | 730 | 1.1031 | 40315144 |
0.2358 | 0.7733 | 735 | 1.1053 | 40595272 |
0.2121 | 0.7785 | 740 | 1.1026 | 40862880 |
0.1342 | 0.7838 | 745 | 1.1005 | 41134144 |
0.2085 | 0.7891 | 750 | 1.1031 | 41419848 |
0.225 | 0.7943 | 755 | 1.1021 | 41697200 |
0.1795 | 0.7996 | 760 | 1.1003 | 41967792 |
0.1678 | 0.8048 | 765 | 1.1021 | 42248480 |
0.2077 | 0.8101 | 770 | 1.1038 | 42526440 |
0.1804 | 0.8154 | 775 | 1.1020 | 42800368 |
0.2173 | 0.8206 | 780 | 1.1013 | 43074768 |
0.1318 | 0.8259 | 785 | 1.1018 | 43354056 |
0.1594 | 0.8311 | 790 | 1.1026 | 43625216 |
0.14 | 0.8364 | 795 | 1.1018 | 43903216 |
0.233 | 0.8417 | 800 | 1.1012 | 44176600 |
0.1285 | 0.8469 | 805 | 1.1021 | 44445584 |
0.1964 | 0.8522 | 810 | 1.1018 | 44716864 |
0.2689 | 0.8574 | 815 | 1.1002 | 44996272 |
0.178 | 0.8627 | 820 | 1.1011 | 45269384 |
0.1888 | 0.8680 | 825 | 1.1017 | 45547368 |
0.1737 | 0.8732 | 830 | 1.1005 | 45825120 |
0.1233 | 0.8785 | 835 | 1.1013 | 46097072 |
0.0937 | 0.8837 | 840 | 1.1011 | 46373416 |
0.184 | 0.8890 | 845 | 1.0997 | 46647664 |
0.1383 | 0.8943 | 850 | 1.0989 | 46917064 |
0.1472 | 0.8995 | 855 | 1.0993 | 47193720 |
0.2097 | 0.9048 | 860 | 1.0991 | 47464080 |
0.1544 | 0.9100 | 865 | 1.0979 | 47739856 |
0.192 | 0.9153 | 870 | 1.0972 | 48012640 |
0.2038 | 0.9206 | 875 | 1.0992 | 48292896 |
0.195 | 0.9258 | 880 | 1.0985 | 48570744 |
0.1714 | 0.9311 | 885 | 1.1002 | 48839696 |
0.2157 | 0.9363 | 890 | 1.0991 | 49113832 |
0.1883 | 0.9416 | 895 | 1.0996 | 49386096 |
0.1905 | 0.9469 | 900 | 1.0990 | 49660576 |
0.1996 | 0.9521 | 905 | 1.0976 | 49929712 |
0.2138 | 0.9574 | 910 | 1.0970 | 50209424 |
0.2167 | 0.9627 | 915 | 1.0980 | 50487592 |
0.235 | 0.9679 | 920 | 1.0982 | 50763456 |
0.1239 | 0.9732 | 925 | 1.0944 | 51038992 |
0.1356 | 0.9784 | 930 | 1.0955 | 51321392 |
0.235 | 0.9837 | 935 | 1.0977 | 51594960 |
0.2103 | 0.9890 | 940 | 1.0973 | 51865504 |
0.1803 | 0.9942 | 945 | 1.0953 | 52139024 |
0.1847 | 0.9995 | 950 | 1.0973 | 52420976 |
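Since the validation loss is a mean token-level cross-entropy, it converts to perplexity as exp(loss). A quick check against the final evaluation loss reported above:

```python
import math

final_eval_loss = 1.0973  # final validation loss from the table above
perplexity = math.exp(final_eval_loss)
print(f"{perplexity:.3f}")  # ~2.996
```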
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
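To approximate this environment, the versions above can be pinned directly. One assumption: the `+cu121` suffix on the PyTorch version denotes a CUDA 12.1 build, so install the wheel matching your platform:

```
pip install transformers==4.44.0 torch==2.4.0 datasets==2.20.0 tokenizers==0.19.1
```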