# collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd1
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0945
- Num Input Tokens Seen: 51175712
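The card does not include a usage section; as a minimal loading sketch, assuming the checkpoint is published under the repository id `RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd1` (taken from the model name), the standard `transformers` causal-LM API applies:

```python
# Minimal loading sketch; the repo id is assumed from the model name on this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# Quick generation smoke test.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```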
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
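These settings imply an effective batch size of 8 × 16 = 128 (per-device batch size times gradient accumulation steps, on a single device), which matches `total_train_batch_size`. As a hedged reconstruction, not the authors' actual training script, they map onto `transformers.TrainingArguments` roughly as follows; the `output_dir` value is assumed:

```python
# Hypothetical mapping of the hyperparameters above onto TrainingArguments;
# the actual training script for this model is not published with the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd1",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # effective batch size: 8 * 16 = 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```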
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.6385 | 0.0052 | 5 | 1.3879 | 271624 |
1.6433 | 0.0105 | 10 | 1.3632 | 531728 |
1.5049 | 0.0157 | 15 | 1.3069 | 799224 |
1.4197 | 0.0210 | 20 | 1.2564 | 1066744 |
1.3092 | 0.0262 | 25 | 1.2152 | 1340184 |
1.2077 | 0.0315 | 30 | 1.1824 | 1611416 |
1.0873 | 0.0367 | 35 | 1.1982 | 1861336 |
1.0458 | 0.0420 | 40 | 1.2086 | 2129696 |
0.8771 | 0.0472 | 45 | 1.2089 | 2398856 |
0.6921 | 0.0525 | 50 | 1.2765 | 2665824 |
0.5044 | 0.0577 | 55 | 1.2621 | 2935368 |
0.5965 | 0.0630 | 60 | 1.2718 | 3202496 |
0.4228 | 0.0682 | 65 | 1.2536 | 3470784 |
0.3887 | 0.0734 | 70 | 1.2335 | 3734424 |
0.3822 | 0.0787 | 75 | 1.2310 | 3998152 |
0.4025 | 0.0839 | 80 | 1.2048 | 4272168 |
0.3132 | 0.0892 | 85 | 1.2041 | 4536200 |
0.3385 | 0.0944 | 90 | 1.2099 | 4799144 |
0.2833 | 0.0997 | 95 | 1.1906 | 5072656 |
0.2796 | 0.1049 | 100 | 1.1919 | 5344504 |
0.1858 | 0.1102 | 105 | 1.1813 | 5610600 |
0.249 | 0.1154 | 110 | 1.1853 | 5878120 |
0.2275 | 0.1207 | 115 | 1.1839 | 6143552 |
0.2511 | 0.1259 | 120 | 1.1824 | 6413392 |
0.3556 | 0.1312 | 125 | 1.1811 | 6680192 |
0.176 | 0.1364 | 130 | 1.1737 | 6941568 |
0.2581 | 0.1416 | 135 | 1.1701 | 7205544 |
0.222 | 0.1469 | 140 | 1.1711 | 7480072 |
0.2517 | 0.1521 | 145 | 1.1659 | 7750744 |
0.2425 | 0.1574 | 150 | 1.1641 | 8022208 |
0.2457 | 0.1626 | 155 | 1.1649 | 8296160 |
0.2867 | 0.1679 | 160 | 1.1597 | 8569848 |
0.1405 | 0.1731 | 165 | 1.1626 | 8833768 |
0.2254 | 0.1784 | 170 | 1.1618 | 9101328 |
0.2241 | 0.1836 | 175 | 1.1544 | 9368632 |
0.2379 | 0.1889 | 180 | 1.1580 | 9636496 |
0.2245 | 0.1941 | 185 | 1.1540 | 9900576 |
0.2203 | 0.1994 | 190 | 1.1510 | 10169840 |
0.2859 | 0.2046 | 195 | 1.1524 | 10443184 |
0.208 | 0.2098 | 200 | 1.1504 | 10715800 |
0.2657 | 0.2151 | 205 | 1.1489 | 10982672 |
0.1606 | 0.2203 | 210 | 1.1471 | 11257472 |
0.1658 | 0.2256 | 215 | 1.1481 | 11522464 |
0.2363 | 0.2308 | 220 | 1.1469 | 11787120 |
0.1589 | 0.2361 | 225 | 1.1472 | 12053088 |
0.1843 | 0.2413 | 230 | 1.1456 | 12329248 |
0.2811 | 0.2466 | 235 | 1.1443 | 12596816 |
0.2504 | 0.2518 | 240 | 1.1441 | 12865736 |
0.2208 | 0.2571 | 245 | 1.1416 | 13136632 |
0.219 | 0.2623 | 250 | 1.1414 | 13398592 |
0.2519 | 0.2676 | 255 | 1.1409 | 13673896 |
0.1821 | 0.2728 | 260 | 1.1376 | 13942448 |
0.1376 | 0.2781 | 265 | 1.1420 | 14210040 |
0.2355 | 0.2833 | 270 | 1.1373 | 14479896 |
0.2076 | 0.2885 | 275 | 1.1361 | 14751016 |
0.1938 | 0.2938 | 280 | 1.1406 | 15021448 |
0.2384 | 0.2990 | 285 | 1.1335 | 15280872 |
0.2672 | 0.3043 | 290 | 1.1346 | 15543056 |
0.211 | 0.3095 | 295 | 1.1354 | 15810904 |
0.2775 | 0.3148 | 300 | 1.1331 | 16080016 |
0.126 | 0.3200 | 305 | 1.1321 | 16353688 |
0.2124 | 0.3253 | 310 | 1.1323 | 16626304 |
0.2067 | 0.3305 | 315 | 1.1290 | 16891864 |
0.223 | 0.3358 | 320 | 1.1309 | 17161824 |
0.219 | 0.3410 | 325 | 1.1325 | 17432392 |
0.1981 | 0.3463 | 330 | 1.1281 | 17702632 |
0.1413 | 0.3515 | 335 | 1.1288 | 17975384 |
0.1306 | 0.3567 | 340 | 1.1287 | 18249784 |
0.2086 | 0.3620 | 345 | 1.1287 | 18513992 |
0.2131 | 0.3672 | 350 | 1.1257 | 18785208 |
0.2322 | 0.3725 | 355 | 1.1279 | 19057760 |
0.193 | 0.3777 | 360 | 1.1274 | 19326416 |
0.2152 | 0.3830 | 365 | 1.1256 | 19589776 |
0.1853 | 0.3882 | 370 | 1.1229 | 19859024 |
0.152 | 0.3935 | 375 | 1.1260 | 20127728 |
0.2626 | 0.3987 | 380 | 1.1228 | 20399736 |
0.2866 | 0.4040 | 385 | 1.1207 | 20671496 |
0.2188 | 0.4092 | 390 | 1.1238 | 20944784 |
0.2403 | 0.4145 | 395 | 1.1215 | 21213824 |
0.2303 | 0.4197 | 400 | 1.1219 | 21485816 |
0.2451 | 0.4249 | 405 | 1.1208 | 21759368 |
0.1682 | 0.4302 | 410 | 1.1191 | 22030608 |
0.1945 | 0.4354 | 415 | 1.1202 | 22302928 |
0.2122 | 0.4407 | 420 | 1.1206 | 22567912 |
0.2038 | 0.4459 | 425 | 1.1179 | 22839344 |
0.1775 | 0.4512 | 430 | 1.1189 | 23110192 |
0.248 | 0.4564 | 435 | 1.1186 | 23385984 |
0.1564 | 0.4617 | 440 | 1.1176 | 23656368 |
0.2442 | 0.4669 | 445 | 1.1205 | 23925760 |
0.1851 | 0.4722 | 450 | 1.1180 | 24192416 |
0.2148 | 0.4774 | 455 | 1.1164 | 24455504 |
0.1515 | 0.4827 | 460 | 1.1170 | 24721184 |
0.1828 | 0.4879 | 465 | 1.1174 | 24990064 |
0.2011 | 0.4931 | 470 | 1.1166 | 25255856 |
0.2027 | 0.4984 | 475 | 1.1164 | 25523776 |
0.1516 | 0.5036 | 480 | 1.1150 | 25790296 |
0.2105 | 0.5089 | 485 | 1.1148 | 26052616 |
0.1914 | 0.5141 | 490 | 1.1129 | 26319264 |
0.2359 | 0.5194 | 495 | 1.1137 | 26593128 |
0.1381 | 0.5246 | 500 | 1.1161 | 26862440 |
0.1915 | 0.5299 | 505 | 1.1142 | 27123760 |
0.1205 | 0.5351 | 510 | 1.1135 | 27392640 |
0.2322 | 0.5404 | 515 | 1.1137 | 27664784 |
0.151 | 0.5456 | 520 | 1.1116 | 27935984 |
0.2365 | 0.5509 | 525 | 1.1115 | 28211288 |
0.2168 | 0.5561 | 530 | 1.1144 | 28477568 |
0.1178 | 0.5613 | 535 | 1.1119 | 28742552 |
0.2171 | 0.5666 | 540 | 1.1114 | 29017040 |
0.104 | 0.5718 | 545 | 1.1124 | 29287360 |
0.2219 | 0.5771 | 550 | 1.1115 | 29554808 |
0.2235 | 0.5823 | 555 | 1.1098 | 29820936 |
0.2177 | 0.5876 | 560 | 1.1099 | 30088000 |
0.176 | 0.5928 | 565 | 1.1100 | 30349872 |
0.2121 | 0.5981 | 570 | 1.1088 | 30615816 |
0.2045 | 0.6033 | 575 | 1.1084 | 30880216 |
0.267 | 0.6086 | 580 | 1.1119 | 31144872 |
0.1728 | 0.6138 | 585 | 1.1094 | 31411192 |
0.1475 | 0.6191 | 590 | 1.1059 | 31675568 |
0.2079 | 0.6243 | 595 | 1.1088 | 31946312 |
0.2596 | 0.6295 | 600 | 1.1085 | 32220528 |
0.1331 | 0.6348 | 605 | 1.1074 | 32485712 |
0.2242 | 0.6400 | 610 | 1.1078 | 32752832 |
0.1945 | 0.6453 | 615 | 1.1072 | 33018800 |
0.1944 | 0.6505 | 620 | 1.1043 | 33286032 |
0.1981 | 0.6558 | 625 | 1.1058 | 33559320 |
0.2431 | 0.6610 | 630 | 1.1069 | 33827288 |
0.2074 | 0.6663 | 635 | 1.1044 | 34093824 |
0.1961 | 0.6715 | 640 | 1.1054 | 34358032 |
0.1657 | 0.6768 | 645 | 1.1067 | 34625840 |
0.1148 | 0.6820 | 650 | 1.1059 | 34887960 |
0.2367 | 0.6873 | 655 | 1.1055 | 35159816 |
0.2539 | 0.6925 | 660 | 1.1056 | 35427320 |
0.1738 | 0.6978 | 665 | 1.1064 | 35700320 |
0.158 | 0.7030 | 670 | 1.1057 | 35964016 |
0.1366 | 0.7082 | 675 | 1.1048 | 36235568 |
0.2311 | 0.7135 | 680 | 1.1053 | 36507520 |
0.1222 | 0.7187 | 685 | 1.1042 | 36772320 |
0.1399 | 0.7240 | 690 | 1.1031 | 37040632 |
0.172 | 0.7292 | 695 | 1.1030 | 37303152 |
0.2098 | 0.7345 | 700 | 1.1059 | 37574576 |
0.1788 | 0.7397 | 705 | 1.1047 | 37848808 |
0.1323 | 0.7450 | 710 | 1.1021 | 38114488 |
0.2065 | 0.7502 | 715 | 1.1008 | 38388584 |
0.1683 | 0.7555 | 720 | 1.1033 | 38657616 |
0.2276 | 0.7607 | 725 | 1.1036 | 38926072 |
0.2007 | 0.7660 | 730 | 1.1019 | 39197256 |
0.196 | 0.7712 | 735 | 1.1004 | 39466864 |
0.1794 | 0.7764 | 740 | 1.1041 | 39737096 |
0.1614 | 0.7817 | 745 | 1.1046 | 40005096 |
0.2611 | 0.7869 | 750 | 1.1013 | 40271312 |
0.1707 | 0.7922 | 755 | 1.1014 | 40537096 |
0.1234 | 0.7974 | 760 | 1.1021 | 40798272 |
0.1902 | 0.8027 | 765 | 1.1026 | 41068576 |
0.2074 | 0.8079 | 770 | 1.1006 | 41333440 |
0.1535 | 0.8132 | 775 | 1.1004 | 41596272 |
0.2085 | 0.8184 | 780 | 1.1006 | 41867760 |
0.1914 | 0.8237 | 785 | 1.1007 | 42135872 |
0.1402 | 0.8289 | 790 | 1.1004 | 42405584 |
0.1844 | 0.8342 | 795 | 1.1001 | 42668992 |
0.2101 | 0.8394 | 800 | 1.0976 | 42936872 |
0.1892 | 0.8446 | 805 | 1.0993 | 43203248 |
0.2207 | 0.8499 | 810 | 1.1008 | 43470648 |
0.1441 | 0.8551 | 815 | 1.0994 | 43739272 |
0.146 | 0.8604 | 820 | 1.0985 | 44009920 |
0.1725 | 0.8656 | 825 | 1.0992 | 44274912 |
0.1492 | 0.8709 | 830 | 1.1002 | 44546640 |
0.2031 | 0.8761 | 835 | 1.0984 | 44810120 |
0.2081 | 0.8814 | 840 | 1.0982 | 45079088 |
0.1331 | 0.8866 | 845 | 1.0996 | 45351432 |
0.1989 | 0.8919 | 850 | 1.0978 | 45611400 |
0.1079 | 0.8971 | 855 | 1.0967 | 45874904 |
0.2258 | 0.9024 | 860 | 1.0979 | 46145128 |
0.1287 | 0.9076 | 865 | 1.0974 | 46410800 |
0.1404 | 0.9128 | 870 | 1.0974 | 46678552 |
0.1972 | 0.9181 | 875 | 1.0967 | 46939000 |
0.2395 | 0.9233 | 880 | 1.0958 | 47221520 |
0.1464 | 0.9286 | 885 | 1.0970 | 47499040 |
0.1881 | 0.9338 | 890 | 1.0965 | 47765808 |
0.1543 | 0.9391 | 895 | 1.0971 | 48035152 |
0.1311 | 0.9443 | 900 | 1.0966 | 48303032 |
0.1793 | 0.9496 | 905 | 1.0966 | 48574536 |
0.1552 | 0.9548 | 910 | 1.0959 | 48856360 |
0.1798 | 0.9601 | 915 | 1.0976 | 49126944 |
0.1749 | 0.9653 | 920 | 1.0967 | 49397832 |
0.157 | 0.9706 | 925 | 1.0939 | 49671648 |
0.1835 | 0.9758 | 930 | 1.0943 | 49936592 |
0.2019 | 0.9810 | 935 | 1.0973 | 50203752 |
0.1426 | 0.9863 | 940 | 1.0959 | 50476704 |
0.132 | 0.9915 | 945 | 1.0961 | 50742304 |
0.2386 | 0.9968 | 950 | 1.0962 | 51013336 |
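The trend in the table (validation loss dropping from 1.3909 at step 0 to about 1.096 by the end of the epoch, with a transient rise around steps 35–55) is easier to read as a curve. A small plotting sketch, assuming the results table above has been saved verbatim to a local file named `results.md` (an assumption for illustration):

```python
# Sketch: plot validation loss against input tokens seen, parsed from the
# results table above. Assumes the table rows were saved to "results.md".
import matplotlib.pyplot as plt

tokens, val_loss = [], []
with open("results.md") as f:
    for line in f:
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 5 or not cells[2].isdigit():
            continue  # skip header, separator, and non-table lines
        val_loss.append(float(cells[3]))  # Validation Loss column
        tokens.append(int(cells[4]))      # Input Tokens Seen column

plt.plot(tokens, val_loss)
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd1")
plt.show()
```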
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1