collapse_gemma-2-2b_hs2_accumulate_iter18_sftsd0
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0991
- Num Input Tokens Seen: 94199936
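This sketch converts the reported loss to perplexity by exponentiation. It assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language-model loss in Transformers. The `perplexity` helper is hypothetical and is not part of the training code:

```python
import math


def perplexity(mean_token_loss):
    """Perplexity from a mean per-token cross-entropy loss measured in nats."""
    return math.exp(mean_token_loss)


# Final evaluation loss from this card, and the step-0 validation loss from the results table.
print(f"final eval perplexity: {perplexity(1.0991):.4f}")
print(f"initial eval perplexity: {perplexity(1.3909):.4f}")
```

Under that assumption, the final loss of 1.0991 corresponds to a perplexity of about 3.0. The step-0 validation loss of 1.3909 corresponds to about 4.0.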
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
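In the Hugging Face Trainer, the total train batch size is the per-device batch size times the gradient accumulation steps times the number of devices. Because 8 × 16 = 128, training most likely ran on a single device. The sketch below reproduces that arithmetic. It also implements a constant-with-warmup learning rate: a linear ramp from 0 over the warmup steps, then flat. This is intended to mirror Transformers' `get_constant_schedule_with_warmup`, and the warmup step count is assumed to be rounded up from `warmup_ratio`. The helper names are hypothetical. The total of 1732 optimizer steps is an assumption inferred from the results table (step 1730 is logged at epoch 0.9988); the card does not state it.

```python
import math


def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Number of samples that contribute to one optimizer step."""
    return per_device * grad_accum * num_devices


def warmup_steps(num_training_steps, warmup_ratio):
    """Convert a warmup fraction into a whole number of steps, rounding up."""
    return math.ceil(num_training_steps * warmup_ratio)


def constant_with_warmup_lr(step, base_lr, num_warmup_steps):
    """Linear ramp from 0 to base_lr during warmup, then a constant base_lr."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    return base_lr


# Hyperparameters from the list above; num_devices=1 is an assumption.
assert effective_batch_size(8, 16) == 128

# Assumed total step count, inferred from the results table (not stated in the card).
total_steps = 1732
n_warmup = warmup_steps(total_steps, 0.05)
for step in (0, n_warmup // 2, n_warmup, total_steps - 1):
    print(step, constant_with_warmup_lr(step, 8e-6, n_warmup))
```

Under these assumptions, warmup covers roughly the first 87 steps. After that, the learning rate stays at 8e-06 for the rest of the single epoch.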
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.6355 | 0.0029 | 5 | 1.3902 | 273080 |
1.755 | 0.0058 | 10 | 1.3834 | 542536 |
1.6008 | 0.0087 | 15 | 1.3608 | 818160 |
1.5095 | 0.0115 | 20 | 1.3302 | 1087712 |
1.4931 | 0.0144 | 25 | 1.2868 | 1358120 |
1.5246 | 0.0173 | 30 | 1.2495 | 1634552 |
1.3609 | 0.0202 | 35 | 1.2260 | 1907576 |
1.2894 | 0.0231 | 40 | 1.2007 | 2185408 |
1.1611 | 0.0260 | 45 | 1.2006 | 2469512 |
1.0638 | 0.0289 | 50 | 1.2316 | 2741152 |
0.9381 | 0.0318 | 55 | 1.2512 | 3017712 |
0.7196 | 0.0346 | 60 | 1.3089 | 3290624 |
0.6478 | 0.0375 | 65 | 1.3093 | 3569584 |
0.5522 | 0.0404 | 70 | 1.3568 | 3839864 |
0.4234 | 0.0433 | 75 | 1.3601 | 4118176 |
0.3744 | 0.0462 | 80 | 1.3157 | 4391840 |
0.3335 | 0.0491 | 85 | 1.2875 | 4657960 |
0.3203 | 0.0520 | 90 | 1.2567 | 4931880 |
0.2661 | 0.0548 | 95 | 1.2383 | 5203336 |
0.2661 | 0.0577 | 100 | 1.2377 | 5479928 |
0.1948 | 0.0606 | 105 | 1.2305 | 5745288 |
0.1807 | 0.0635 | 110 | 1.2309 | 6013448 |
0.136 | 0.0664 | 115 | 1.2181 | 6286392 |
0.2557 | 0.0693 | 120 | 1.2153 | 6558496 |
0.2068 | 0.0722 | 125 | 1.2173 | 6829496 |
0.1826 | 0.0751 | 130 | 1.2073 | 7098952 |
0.1489 | 0.0779 | 135 | 1.2175 | 7372160 |
0.1658 | 0.0808 | 140 | 1.2065 | 7636184 |
0.1707 | 0.0837 | 145 | 1.2045 | 7905856 |
0.2067 | 0.0866 | 150 | 1.2085 | 8181320 |
0.1845 | 0.0895 | 155 | 1.1919 | 8449704 |
0.2256 | 0.0924 | 160 | 1.2007 | 8722656 |
0.2229 | 0.0953 | 165 | 1.1901 | 9000920 |
0.1335 | 0.0981 | 170 | 1.1879 | 9265336 |
0.1938 | 0.1010 | 175 | 1.1903 | 9534592 |
0.167 | 0.1039 | 180 | 1.1901 | 9814080 |
0.2432 | 0.1068 | 185 | 1.1879 | 10087384 |
0.2269 | 0.1097 | 190 | 1.1778 | 10360840 |
0.1921 | 0.1126 | 195 | 1.1757 | 10632360 |
0.1659 | 0.1155 | 200 | 1.1771 | 10908448 |
0.1077 | 0.1184 | 205 | 1.1758 | 11177752 |
0.1651 | 0.1212 | 210 | 1.1740 | 11459776 |
0.1285 | 0.1241 | 215 | 1.1748 | 11727416 |
0.0836 | 0.1270 | 220 | 1.1723 | 12002160 |
0.1266 | 0.1299 | 225 | 1.1744 | 12278920 |
0.207 | 0.1328 | 230 | 1.1708 | 12551224 |
0.0917 | 0.1357 | 235 | 1.1705 | 12818864 |
0.1507 | 0.1386 | 240 | 1.1688 | 13091040 |
0.0746 | 0.1414 | 245 | 1.1664 | 13368416 |
0.1802 | 0.1443 | 250 | 1.1715 | 13642424 |
0.1542 | 0.1472 | 255 | 1.1657 | 13913584 |
0.1366 | 0.1501 | 260 | 1.1702 | 14173048 |
0.0992 | 0.1530 | 265 | 1.1653 | 14439904 |
0.1504 | 0.1559 | 270 | 1.1612 | 14710064 |
0.1084 | 0.1588 | 275 | 1.1631 | 14982448 |
0.1535 | 0.1617 | 280 | 1.1613 | 15261368 |
0.1586 | 0.1645 | 285 | 1.1568 | 15539080 |
0.1232 | 0.1674 | 290 | 1.1599 | 15815408 |
0.1489 | 0.1703 | 295 | 1.1597 | 16081000 |
0.1838 | 0.1732 | 300 | 1.1572 | 16353832 |
0.1338 | 0.1761 | 305 | 1.1574 | 16627816 |
0.1256 | 0.1790 | 310 | 1.1583 | 16898624 |
0.1534 | 0.1819 | 315 | 1.1554 | 17170712 |
0.1303 | 0.1848 | 320 | 1.1527 | 17442200 |
0.1247 | 0.1876 | 325 | 1.1535 | 17714848 |
0.1979 | 0.1905 | 330 | 1.1568 | 17988648 |
0.1534 | 0.1934 | 335 | 1.1530 | 18256560 |
0.1689 | 0.1963 | 340 | 1.1505 | 18529264 |
0.0983 | 0.1992 | 345 | 1.1536 | 18795120 |
0.1733 | 0.2021 | 350 | 1.1531 | 19063704 |
0.1345 | 0.2050 | 355 | 1.1524 | 19336152 |
0.148 | 0.2078 | 360 | 1.1502 | 19602176 |
0.1819 | 0.2107 | 365 | 1.1519 | 19870064 |
0.1622 | 0.2136 | 370 | 1.1520 | 20137704 |
0.1836 | 0.2165 | 375 | 1.1478 | 20409336 |
0.1113 | 0.2194 | 380 | 1.1481 | 20680232 |
0.1638 | 0.2223 | 385 | 1.1472 | 20947952 |
0.0796 | 0.2252 | 390 | 1.1452 | 21224104 |
0.132 | 0.2281 | 395 | 1.1450 | 21498344 |
0.1493 | 0.2309 | 400 | 1.1423 | 21766440 |
0.1671 | 0.2338 | 405 | 1.1466 | 22038592 |
0.1362 | 0.2367 | 410 | 1.1458 | 22310032 |
0.1554 | 0.2396 | 415 | 1.1408 | 22574200 |
0.1661 | 0.2425 | 420 | 1.1458 | 22852416 |
0.1257 | 0.2454 | 425 | 1.1440 | 23118680 |
0.1821 | 0.2483 | 430 | 1.1413 | 23392104 |
0.1686 | 0.2511 | 435 | 1.1443 | 23663896 |
0.1713 | 0.2540 | 440 | 1.1423 | 23935672 |
0.0865 | 0.2569 | 445 | 1.1393 | 24205112 |
0.1383 | 0.2598 | 450 | 1.1400 | 24476968 |
0.1853 | 0.2627 | 455 | 1.1365 | 24752648 |
0.2005 | 0.2656 | 460 | 1.1380 | 25025680 |
0.1473 | 0.2685 | 465 | 1.1385 | 25301192 |
0.1351 | 0.2714 | 470 | 1.1392 | 25568272 |
0.1366 | 0.2742 | 475 | 1.1382 | 25843072 |
0.1238 | 0.2771 | 480 | 1.1374 | 26112888 |
0.1935 | 0.2800 | 485 | 1.1408 | 26379648 |
0.1487 | 0.2829 | 490 | 1.1386 | 26650304 |
0.139 | 0.2858 | 495 | 1.1367 | 26926528 |
0.132 | 0.2887 | 500 | 1.1375 | 27202536 |
0.1939 | 0.2916 | 505 | 1.1366 | 27467176 |
0.1675 | 0.2944 | 510 | 1.1336 | 27737496 |
0.1004 | 0.2973 | 515 | 1.1377 | 28011560 |
0.1701 | 0.3002 | 520 | 1.1402 | 28282360 |
0.1752 | 0.3031 | 525 | 1.1352 | 28553656 |
0.0854 | 0.3060 | 530 | 1.1372 | 28830408 |
0.1186 | 0.3089 | 535 | 1.1387 | 29101672 |
0.1913 | 0.3118 | 540 | 1.1369 | 29369816 |
0.1433 | 0.3147 | 545 | 1.1354 | 29643664 |
0.1252 | 0.3175 | 550 | 1.1334 | 29919784 |
0.1886 | 0.3204 | 555 | 1.1336 | 30193360 |
0.1727 | 0.3233 | 560 | 1.1318 | 30466320 |
0.109 | 0.3262 | 565 | 1.1320 | 30736080 |
0.0938 | 0.3291 | 570 | 1.1335 | 31013624 |
0.1486 | 0.3320 | 575 | 1.1310 | 31285184 |
0.1095 | 0.3349 | 580 | 1.1321 | 31556888 |
0.0856 | 0.3377 | 585 | 1.1321 | 31830664 |
0.1818 | 0.3406 | 590 | 1.1324 | 32101880 |
0.1388 | 0.3435 | 595 | 1.1328 | 32367056 |
0.107 | 0.3464 | 600 | 1.1314 | 32643944 |
0.1435 | 0.3493 | 605 | 1.1305 | 32912048 |
0.1034 | 0.3522 | 610 | 1.1322 | 33179720 |
0.1244 | 0.3551 | 615 | 1.1318 | 33447856 |
0.084 | 0.3580 | 620 | 1.1321 | 33720904 |
0.0899 | 0.3608 | 625 | 1.1340 | 33995128 |
0.1698 | 0.3637 | 630 | 1.1293 | 34261264 |
0.1626 | 0.3666 | 635 | 1.1311 | 34542304 |
0.0928 | 0.3695 | 640 | 1.1311 | 34811120 |
0.1622 | 0.3724 | 645 | 1.1294 | 35084560 |
0.1449 | 0.3753 | 650 | 1.1283 | 35362872 |
0.1182 | 0.3782 | 655 | 1.1286 | 35634400 |
0.1374 | 0.3810 | 660 | 1.1299 | 35908032 |
0.1323 | 0.3839 | 665 | 1.1275 | 36176632 |
0.0872 | 0.3868 | 670 | 1.1266 | 36452520 |
0.1128 | 0.3897 | 675 | 1.1281 | 36723776 |
0.1375 | 0.3926 | 680 | 1.1282 | 36987536 |
0.1568 | 0.3955 | 685 | 1.1257 | 37255400 |
0.1167 | 0.3984 | 690 | 1.1246 | 37516976 |
0.0931 | 0.4013 | 695 | 1.1252 | 37793008 |
0.1485 | 0.4041 | 700 | 1.1239 | 38064944 |
0.0646 | 0.4070 | 705 | 1.1244 | 38340472 |
0.0991 | 0.4099 | 710 | 1.1242 | 38616912 |
0.1298 | 0.4128 | 715 | 1.1227 | 38889552 |
0.0944 | 0.4157 | 720 | 1.1222 | 39159128 |
0.1194 | 0.4186 | 725 | 1.1233 | 39434984 |
0.1733 | 0.4215 | 730 | 1.1248 | 39705592 |
0.078 | 0.4243 | 735 | 1.1245 | 39977616 |
0.1721 | 0.4272 | 740 | 1.1223 | 40248928 |
0.1472 | 0.4301 | 745 | 1.1232 | 40522392 |
0.1828 | 0.4330 | 750 | 1.1231 | 40790160 |
0.1091 | 0.4359 | 755 | 1.1231 | 41060672 |
0.082 | 0.4388 | 760 | 1.1242 | 41329400 |
0.0931 | 0.4417 | 765 | 1.1233 | 41593992 |
0.0927 | 0.4446 | 770 | 1.1206 | 41870296 |
0.0857 | 0.4474 | 775 | 1.1219 | 42138136 |
0.1329 | 0.4503 | 780 | 1.1237 | 42412088 |
0.114 | 0.4532 | 785 | 1.1243 | 42687760 |
0.2016 | 0.4561 | 790 | 1.1230 | 42959832 |
0.1015 | 0.4590 | 795 | 1.1201 | 43232792 |
0.1605 | 0.4619 | 800 | 1.1240 | 43508928 |
0.1178 | 0.4648 | 805 | 1.1230 | 43772784 |
0.1144 | 0.4677 | 810 | 1.1213 | 44040064 |
0.1925 | 0.4705 | 815 | 1.1208 | 44312680 |
0.0638 | 0.4734 | 820 | 1.1206 | 44584584 |
0.1636 | 0.4763 | 825 | 1.1193 | 44851312 |
0.1703 | 0.4792 | 830 | 1.1183 | 45117768 |
0.1535 | 0.4821 | 835 | 1.1203 | 45399240 |
0.1258 | 0.4850 | 840 | 1.1205 | 45676944 |
0.1289 | 0.4879 | 845 | 1.1201 | 45943800 |
0.1267 | 0.4907 | 850 | 1.1186 | 46216536 |
0.1514 | 0.4936 | 855 | 1.1170 | 46487544 |
0.2064 | 0.4965 | 860 | 1.1178 | 46762024 |
0.0942 | 0.4994 | 865 | 1.1189 | 47036080 |
0.1093 | 0.5023 | 870 | 1.1176 | 47313016 |
0.1225 | 0.5052 | 875 | 1.1168 | 47586776 |
0.1956 | 0.5081 | 880 | 1.1147 | 47864416 |
0.1474 | 0.5110 | 885 | 1.1145 | 48135912 |
0.09 | 0.5138 | 890 | 1.1168 | 48405688 |
0.1276 | 0.5167 | 895 | 1.1176 | 48686904 |
0.1107 | 0.5196 | 900 | 1.1161 | 48960408 |
0.1074 | 0.5225 | 905 | 1.1179 | 49232384 |
0.1366 | 0.5254 | 910 | 1.1171 | 49511504 |
0.168 | 0.5283 | 915 | 1.1153 | 49782192 |
0.0839 | 0.5312 | 920 | 1.1139 | 50054160 |
0.1319 | 0.5340 | 925 | 1.1151 | 50324008 |
0.1119 | 0.5369 | 930 | 1.1145 | 50596888 |
0.0616 | 0.5398 | 935 | 1.1168 | 50863008 |
0.1474 | 0.5427 | 940 | 1.1170 | 51144184 |
0.0941 | 0.5456 | 945 | 1.1131 | 51408872 |
0.0925 | 0.5485 | 950 | 1.1142 | 51690048 |
0.0995 | 0.5514 | 955 | 1.1141 | 51968736 |
0.0958 | 0.5543 | 960 | 1.1149 | 52239936 |
0.0957 | 0.5571 | 965 | 1.1153 | 52513392 |
0.1481 | 0.5600 | 970 | 1.1156 | 52788696 |
0.1307 | 0.5629 | 975 | 1.1127 | 53063368 |
0.1862 | 0.5658 | 980 | 1.1124 | 53344472 |
0.1411 | 0.5687 | 985 | 1.1122 | 53612672 |
0.107 | 0.5716 | 990 | 1.1116 | 53883608 |
0.125 | 0.5745 | 995 | 1.1137 | 54149752 |
0.1002 | 0.5773 | 1000 | 1.1141 | 54418520 |
0.1723 | 0.5802 | 1005 | 1.1139 | 54695576 |
0.142 | 0.5831 | 1010 | 1.1130 | 54969056 |
0.125 | 0.5860 | 1015 | 1.1115 | 55239504 |
0.088 | 0.5889 | 1020 | 1.1110 | 55506544 |
0.1416 | 0.5918 | 1025 | 1.1140 | 55772872 |
0.1486 | 0.5947 | 1030 | 1.1130 | 56043792 |
0.0943 | 0.5976 | 1035 | 1.1124 | 56298208 |
0.106 | 0.6004 | 1040 | 1.1126 | 56570952 |
0.0945 | 0.6033 | 1045 | 1.1120 | 56839744 |
0.1679 | 0.6062 | 1050 | 1.1101 | 57115480 |
0.0844 | 0.6091 | 1055 | 1.1101 | 57389552 |
0.1306 | 0.6120 | 1060 | 1.1127 | 57664336 |
0.1492 | 0.6149 | 1065 | 1.1104 | 57928656 |
0.136 | 0.6178 | 1070 | 1.1097 | 58200792 |
0.1486 | 0.6206 | 1075 | 1.1101 | 58477232 |
0.125 | 0.6235 | 1080 | 1.1117 | 58747232 |
0.1161 | 0.6264 | 1085 | 1.1124 | 59017776 |
0.1849 | 0.6293 | 1090 | 1.1104 | 59292744 |
0.1248 | 0.6322 | 1095 | 1.1099 | 59567504 |
0.0766 | 0.6351 | 1100 | 1.1118 | 59843480 |
0.1158 | 0.6380 | 1105 | 1.1143 | 60111432 |
0.0988 | 0.6409 | 1110 | 1.1097 | 60376880 |
0.1259 | 0.6437 | 1115 | 1.1088 | 60645976 |
0.1613 | 0.6466 | 1120 | 1.1115 | 60911728 |
0.1149 | 0.6495 | 1125 | 1.1130 | 61183048 |
0.0895 | 0.6524 | 1130 | 1.1095 | 61450376 |
0.0856 | 0.6553 | 1135 | 1.1084 | 61721024 |
0.1038 | 0.6582 | 1140 | 1.1101 | 61995312 |
0.101 | 0.6611 | 1145 | 1.1086 | 62262616 |
0.1122 | 0.6639 | 1150 | 1.1075 | 62537480 |
0.0925 | 0.6668 | 1155 | 1.1096 | 62805664 |
0.1259 | 0.6697 | 1160 | 1.1101 | 63072872 |
0.0946 | 0.6726 | 1165 | 1.1110 | 63342736 |
0.1586 | 0.6755 | 1170 | 1.1115 | 63622824 |
0.2225 | 0.6784 | 1175 | 1.1108 | 63887064 |
0.1598 | 0.6813 | 1180 | 1.1078 | 64157848 |
0.1078 | 0.6842 | 1185 | 1.1075 | 64430840 |
0.088 | 0.6870 | 1190 | 1.1092 | 64696064 |
0.0929 | 0.6899 | 1195 | 1.1095 | 64967784 |
0.0923 | 0.6928 | 1200 | 1.1097 | 65245096 |
0.1411 | 0.6957 | 1205 | 1.1108 | 65511632 |
0.1835 | 0.6986 | 1210 | 1.1098 | 65789744 |
0.1551 | 0.7015 | 1215 | 1.1086 | 66065976 |
0.1432 | 0.7044 | 1220 | 1.1077 | 66339784 |
0.0772 | 0.7072 | 1225 | 1.1084 | 66600760 |
0.1058 | 0.7101 | 1230 | 1.1093 | 66872128 |
0.1347 | 0.7130 | 1235 | 1.1073 | 67143616 |
0.101 | 0.7159 | 1240 | 1.1084 | 67413056 |
0.144 | 0.7188 | 1245 | 1.1086 | 67680720 |
0.0698 | 0.7217 | 1250 | 1.1077 | 67957920 |
0.1419 | 0.7246 | 1255 | 1.1073 | 68230784 |
0.126 | 0.7275 | 1260 | 1.1089 | 68505544 |
0.0983 | 0.7303 | 1265 | 1.1086 | 68783888 |
0.1015 | 0.7332 | 1270 | 1.1076 | 69058384 |
0.1706 | 0.7361 | 1275 | 1.1093 | 69330384 |
0.1076 | 0.7390 | 1280 | 1.1078 | 69607104 |
0.0914 | 0.7419 | 1285 | 1.1068 | 69877280 |
0.1405 | 0.7448 | 1290 | 1.1069 | 70152936 |
0.0788 | 0.7477 | 1295 | 1.1076 | 70415984 |
0.1205 | 0.7506 | 1300 | 1.1082 | 70694120 |
0.1431 | 0.7534 | 1305 | 1.1073 | 70975248 |
0.1176 | 0.7563 | 1310 | 1.1079 | 71244776 |
0.1231 | 0.7592 | 1315 | 1.1086 | 71515944 |
0.1701 | 0.7621 | 1320 | 1.1071 | 71788584 |
0.1153 | 0.7650 | 1325 | 1.1042 | 72058848 |
0.1552 | 0.7679 | 1330 | 1.1048 | 72328504 |
0.078 | 0.7708 | 1335 | 1.1048 | 72595560 |
0.1188 | 0.7736 | 1340 | 1.1061 | 72867048 |
0.188 | 0.7765 | 1345 | 1.1068 | 73136616 |
0.0786 | 0.7794 | 1350 | 1.1049 | 73408992 |
0.1313 | 0.7823 | 1355 | 1.1048 | 73681688 |
0.1277 | 0.7852 | 1360 | 1.1054 | 73951200 |
0.1041 | 0.7881 | 1365 | 1.1046 | 74227400 |
0.0774 | 0.7910 | 1370 | 1.1022 | 74509208 |
0.1395 | 0.7939 | 1375 | 1.1041 | 74785064 |
0.1818 | 0.7967 | 1380 | 1.1044 | 75054672 |
0.0816 | 0.7996 | 1385 | 1.1037 | 75330504 |
0.1471 | 0.8025 | 1390 | 1.1034 | 75604224 |
0.0972 | 0.8054 | 1395 | 1.1043 | 75885624 |
0.1165 | 0.8083 | 1400 | 1.1043 | 76157960 |
0.0935 | 0.8112 | 1405 | 1.1054 | 76436896 |
0.1022 | 0.8141 | 1410 | 1.1052 | 76704048 |
0.0851 | 0.8169 | 1415 | 1.1030 | 76978120 |
0.0593 | 0.8198 | 1420 | 1.1016 | 77250288 |
0.0669 | 0.8227 | 1425 | 1.1044 | 77519352 |
0.1244 | 0.8256 | 1430 | 1.1049 | 77790112 |
0.1159 | 0.8285 | 1435 | 1.1041 | 78056792 |
0.1519 | 0.8314 | 1440 | 1.1022 | 78328576 |
0.113 | 0.8343 | 1445 | 1.1025 | 78594616 |
0.1675 | 0.8372 | 1450 | 1.1037 | 78861016 |
0.0944 | 0.8400 | 1455 | 1.1045 | 79132448 |
0.1323 | 0.8429 | 1460 | 1.1055 | 79413096 |
0.1032 | 0.8458 | 1465 | 1.1016 | 79683616 |
0.0962 | 0.8487 | 1470 | 1.0996 | 79956632 |
0.1486 | 0.8516 | 1475 | 1.1004 | 80230200 |
0.1277 | 0.8545 | 1480 | 1.1038 | 80500928 |
0.1144 | 0.8574 | 1485 | 1.1044 | 80772320 |
0.0917 | 0.8602 | 1490 | 1.1013 | 81045776 |
0.0878 | 0.8631 | 1495 | 1.1019 | 81318224 |
0.0899 | 0.8660 | 1500 | 1.1031 | 81595424 |
0.1167 | 0.8689 | 1505 | 1.1033 | 81865824 |
0.1375 | 0.8718 | 1510 | 1.1024 | 82141712 |
0.11 | 0.8747 | 1515 | 1.1034 | 82412744 |
0.1259 | 0.8776 | 1520 | 1.1055 | 82675128 |
0.1496 | 0.8805 | 1525 | 1.1027 | 82946504 |
0.1201 | 0.8833 | 1530 | 1.1018 | 83211536 |
0.0927 | 0.8862 | 1535 | 1.1016 | 83486808 |
0.1379 | 0.8891 | 1540 | 1.1025 | 83760872 |
0.1235 | 0.8920 | 1545 | 1.1024 | 84037256 |
0.1699 | 0.8949 | 1550 | 1.1014 | 84302864 |
0.0979 | 0.8978 | 1555 | 1.1016 | 84572624 |
0.1089 | 0.9007 | 1560 | 1.1008 | 84848136 |
0.0964 | 0.9035 | 1565 | 1.1018 | 85122688 |
0.1252 | 0.9064 | 1570 | 1.1028 | 85397560 |
0.1109 | 0.9093 | 1575 | 1.1016 | 85662296 |
0.1039 | 0.9122 | 1580 | 1.1019 | 85936832 |
0.0778 | 0.9151 | 1585 | 1.1016 | 86212792 |
0.1588 | 0.9180 | 1590 | 1.1008 | 86478736 |
0.0962 | 0.9209 | 1595 | 1.1010 | 86748992 |
0.0739 | 0.9238 | 1600 | 1.1020 | 87021920 |
0.1002 | 0.9266 | 1605 | 1.1017 | 87288672 |
0.1132 | 0.9295 | 1610 | 1.1000 | 87562104 |
0.1577 | 0.9324 | 1615 | 1.1013 | 87836816 |
0.1108 | 0.9353 | 1620 | 1.1023 | 88104552 |
0.1142 | 0.9382 | 1625 | 1.1018 | 88377952 |
0.0985 | 0.9411 | 1630 | 1.0987 | 88656104 |
0.1196 | 0.9440 | 1635 | 1.0979 | 88926416 |
0.104 | 0.9468 | 1640 | 1.0999 | 89194592 |
0.1192 | 0.9497 | 1645 | 1.1013 | 89471232 |
0.1084 | 0.9526 | 1650 | 1.1004 | 89740088 |
0.1354 | 0.9555 | 1655 | 1.0985 | 90013160 |
0.0883 | 0.9584 | 1660 | 1.0989 | 90279560 |
0.1745 | 0.9613 | 1665 | 1.1010 | 90553904 |
0.0753 | 0.9642 | 1670 | 1.1037 | 90822296 |
0.0855 | 0.9671 | 1675 | 1.1016 | 91093480 |
0.0871 | 0.9699 | 1680 | 1.0992 | 91361872 |
0.1241 | 0.9728 | 1685 | 1.0996 | 91631912 |
0.1119 | 0.9757 | 1690 | 1.1000 | 91915312 |
0.1286 | 0.9786 | 1695 | 1.0989 | 92188240 |
0.1097 | 0.9815 | 1700 | 1.0996 | 92457072 |
0.1334 | 0.9844 | 1705 | 1.1015 | 92721984 |
0.1678 | 0.9873 | 1710 | 1.1010 | 92999640 |
0.1268 | 0.9901 | 1715 | 1.0983 | 93274432 |
0.0892 | 0.9930 | 1720 | 1.0970 | 93543056 |
0.0895 | 0.9959 | 1725 | 1.0983 | 93820760 |
0.1491 | 0.9988 | 1730 | 1.0996 | 94090496 |
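The results can also be loaded into Python to find the lowest logged validation loss and to estimate input tokens per optimizer step. This is a minimal sketch: `parse_rows` and `summarize` are hypothetical helpers. The embedded table is only an excerpt, copied row for row from the table above. It includes the early dip near step 45, the rise near step 75, and the last rows.

```python
# Excerpt copied from the results table above (header plus selected rows).
TABLE = """
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.2894 | 0.0231 | 40 | 1.2007 | 2185408 |
1.1611 | 0.0260 | 45 | 1.2006 | 2469512 |
0.4234 | 0.0433 | 75 | 1.3601 | 4118176 |
0.0892 | 0.9930 | 1720 | 1.0970 | 93543056 |
0.1491 | 0.9988 | 1730 | 1.0996 | 94090496 |
"""


def parse_rows(table):
    """Parse pipe-separated rows into dicts, skipping the header and separator."""
    rows = []
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 5 or not cells[2].isdigit():
            continue  # header, separator, or malformed line
        train, epoch, step, val, tokens = cells
        rows.append({
            # "No log" means no training loss had been logged yet (step 0).
            "train_loss": None if train == "No log" else float(train),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val),
            "tokens": int(tokens),
        })
    return rows


def summarize(rows):
    """Best validation checkpoint and average input tokens per optimizer step."""
    best = min(rows, key=lambda r: r["val_loss"])
    last = max(rows, key=lambda r: r["step"])
    return {
        "best_step": best["step"],
        "best_val_loss": best["val_loss"],
        "final_val_loss": last["val_loss"],
        # Assumes the token counter is 0 at step 0, as in the table.
        "tokens_per_step": last["tokens"] / last["step"],
    }


print(summarize(parse_rows(TABLE)))
```

Over the full table, the lowest logged validation loss is 1.0970 at step 1720. The token counts average roughly 54,400 input tokens per optimizer step. That is about 425 tokens per sequence, assuming each step uses the full 128-sequence batch.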
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
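A small standard-library check based on `importlib.metadata` can compare a local environment against the framework versions listed above. The `version_mismatches` helper is hypothetical. It assumes the usual PyPI distribution names, such as `torch` for PyTorch. It also uses an exact string comparison, so it will report any PyTorch build whose local suffix differs from `+cu121`, such as a CPU-only build.

```python
from importlib.metadata import PackageNotFoundError, version

# Framework versions listed in this card, keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in version_mismatches(EXPECTED).items():
        print(f"{name}: card lists {wanted}, found {installed or 'not installed'}")
```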
Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter18_sftsd0
Base model
google/gemma-2-2b