# collapse_gemma-2-2b_hs2_accumulate_iter9_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1042
- Num Input Tokens Seen: 46326936
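The card ships without usage instructions, so here is a minimal loading sketch. It assumes the checkpoint is published on the Hugging Face Hub under the repo id `RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter9_sftsd2`, and that `torch` and `accelerate` are installed; adjust dtype and device placement to your hardware.

```python
# Minimal usage sketch (assumed repo id; not from the original card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter9_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # gemma-2 checkpoints are commonly run in bf16
    device_map="auto",           # requires `accelerate`
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```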
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` reconstruction follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
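For reference, a sketch of how these settings map onto `transformers.TrainingArguments`. This is a reconstruction, not the author's training script: the `output_dir` name is illustrative, and it assumes a single-device run, since 8 per-device samples × 16 accumulation steps gives the listed total train batch size of 128.

```python
# Hedged reconstruction of the listed hyperparameters (not the original script).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter9_sftsd2",  # illustrative
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # x16 accumulation steps = 128 total
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```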
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.6395 | 0.0058 | 5 | 1.3873 | 273232 |
1.5382 | 0.0117 | 10 | 1.3596 | 549536 |
1.413 | 0.0175 | 15 | 1.2959 | 816480 |
1.3451 | 0.0234 | 20 | 1.2506 | 1082216 |
1.3264 | 0.0292 | 25 | 1.2060 | 1358568 |
1.1805 | 0.0351 | 30 | 1.1856 | 1631248 |
1.0751 | 0.0409 | 35 | 1.2012 | 1912408 |
1.0581 | 0.0468 | 40 | 1.1949 | 2174600 |
0.8935 | 0.0526 | 45 | 1.2430 | 2444168 |
0.71 | 0.0585 | 50 | 1.2432 | 2715336 |
0.7176 | 0.0643 | 55 | 1.2261 | 2992344 |
0.5526 | 0.0702 | 60 | 1.2337 | 3265936 |
0.5511 | 0.0760 | 65 | 1.2408 | 3533200 |
0.4296 | 0.0818 | 70 | 1.2197 | 3807592 |
0.5596 | 0.0877 | 75 | 1.2093 | 4082760 |
0.4868 | 0.0935 | 80 | 1.2132 | 4355472 |
0.514 | 0.0994 | 85 | 1.2082 | 4634896 |
0.3181 | 0.1052 | 90 | 1.2056 | 4904232 |
0.4753 | 0.1111 | 95 | 1.1970 | 5170560 |
0.4384 | 0.1169 | 100 | 1.1965 | 5439840 |
0.2906 | 0.1228 | 105 | 1.1971 | 5710376 |
0.3276 | 0.1286 | 110 | 1.1990 | 5973992 |
0.3077 | 0.1345 | 115 | 1.1929 | 6243696 |
0.3643 | 0.1403 | 120 | 1.1981 | 6518824 |
0.3255 | 0.1462 | 125 | 1.1841 | 6785592 |
0.3207 | 0.1520 | 130 | 1.1933 | 7049696 |
0.2859 | 0.1578 | 135 | 1.1838 | 7320952 |
0.2675 | 0.1637 | 140 | 1.1854 | 7589648 |
0.3272 | 0.1695 | 145 | 1.1846 | 7863136 |
0.3011 | 0.1754 | 150 | 1.1770 | 8141360 |
0.3251 | 0.1812 | 155 | 1.1775 | 8410752 |
0.2044 | 0.1871 | 160 | 1.1744 | 8695408 |
0.261 | 0.1929 | 165 | 1.1754 | 8963344 |
0.2521 | 0.1988 | 170 | 1.1767 | 9232056 |
0.27 | 0.2046 | 175 | 1.1732 | 9499280 |
0.2488 | 0.2105 | 180 | 1.1717 | 9768616 |
0.2241 | 0.2163 | 185 | 1.1694 | 10041184 |
0.3213 | 0.2222 | 190 | 1.1691 | 10315456 |
0.2564 | 0.2280 | 195 | 1.1664 | 10582696 |
0.1964 | 0.2338 | 200 | 1.1632 | 10848496 |
0.4058 | 0.2397 | 205 | 1.1655 | 11121432 |
0.1801 | 0.2455 | 210 | 1.1609 | 11386664 |
0.2493 | 0.2514 | 215 | 1.1610 | 11649672 |
0.2338 | 0.2572 | 220 | 1.1615 | 11914304 |
0.2439 | 0.2631 | 225 | 1.1546 | 12183456 |
0.2678 | 0.2689 | 230 | 1.1603 | 12459744 |
0.2016 | 0.2748 | 235 | 1.1552 | 12727384 |
0.2322 | 0.2806 | 240 | 1.1531 | 12997712 |
0.2601 | 0.2865 | 245 | 1.1556 | 13277552 |
0.2803 | 0.2923 | 250 | 1.1547 | 13549816 |
0.2706 | 0.2982 | 255 | 1.1527 | 13815352 |
0.2324 | 0.3040 | 260 | 1.1515 | 14087224 |
0.2559 | 0.3099 | 265 | 1.1466 | 14350816 |
0.3196 | 0.3157 | 270 | 1.1470 | 14618872 |
0.2293 | 0.3215 | 275 | 1.1461 | 14888432 |
0.1946 | 0.3274 | 280 | 1.1453 | 15155928 |
0.2246 | 0.3332 | 285 | 1.1448 | 15428952 |
0.2173 | 0.3391 | 290 | 1.1448 | 15700136 |
0.3516 | 0.3449 | 295 | 1.1424 | 15973904 |
0.2389 | 0.3508 | 300 | 1.1416 | 16245136 |
0.2712 | 0.3566 | 305 | 1.1437 | 16514928 |
0.2239 | 0.3625 | 310 | 1.1382 | 16786560 |
0.2022 | 0.3683 | 315 | 1.1410 | 17057120 |
0.2889 | 0.3742 | 320 | 1.1389 | 17330608 |
0.1889 | 0.3800 | 325 | 1.1351 | 17600464 |
0.1756 | 0.3859 | 330 | 1.1366 | 17870632 |
0.215 | 0.3917 | 335 | 1.1411 | 18144400 |
0.1786 | 0.3975 | 340 | 1.1331 | 18412704 |
0.192 | 0.4034 | 345 | 1.1365 | 18683256 |
0.2251 | 0.4092 | 350 | 1.1328 | 18956352 |
0.2512 | 0.4151 | 355 | 1.1331 | 19224576 |
0.2709 | 0.4209 | 360 | 1.1348 | 19500224 |
0.2109 | 0.4268 | 365 | 1.1312 | 19778440 |
0.2085 | 0.4326 | 370 | 1.1315 | 20047160 |
0.1839 | 0.4385 | 375 | 1.1349 | 20319104 |
0.3151 | 0.4443 | 380 | 1.1329 | 20586472 |
0.1758 | 0.4502 | 385 | 1.1318 | 20856480 |
0.1693 | 0.4560 | 390 | 1.1316 | 21128384 |
0.2961 | 0.4619 | 395 | 1.1287 | 21404032 |
0.2106 | 0.4677 | 400 | 1.1300 | 21672184 |
0.2677 | 0.4735 | 405 | 1.1269 | 21949112 |
0.2508 | 0.4794 | 410 | 1.1283 | 22229328 |
0.2052 | 0.4852 | 415 | 1.1281 | 22493280 |
0.1821 | 0.4911 | 420 | 1.1261 | 22766368 |
0.2182 | 0.4969 | 425 | 1.1278 | 23040832 |
0.2519 | 0.5028 | 430 | 1.1255 | 23308488 |
0.2477 | 0.5086 | 435 | 1.1261 | 23580888 |
0.1884 | 0.5145 | 440 | 1.1285 | 23859080 |
0.2363 | 0.5203 | 445 | 1.1245 | 24131976 |
0.2763 | 0.5262 | 450 | 1.1221 | 24400840 |
0.2422 | 0.5320 | 455 | 1.1247 | 24676752 |
0.2044 | 0.5379 | 460 | 1.1238 | 24943808 |
0.1768 | 0.5437 | 465 | 1.1232 | 25207288 |
0.167 | 0.5495 | 470 | 1.1246 | 25472192 |
0.2588 | 0.5554 | 475 | 1.1251 | 25747400 |
0.205 | 0.5612 | 480 | 1.1209 | 26016008 |
0.2934 | 0.5671 | 485 | 1.1240 | 26285408 |
0.2115 | 0.5729 | 490 | 1.1230 | 26560304 |
0.2721 | 0.5788 | 495 | 1.1183 | 26832040 |
0.2231 | 0.5846 | 500 | 1.1220 | 27102784 |
0.2267 | 0.5905 | 505 | 1.1219 | 27370048 |
0.2362 | 0.5963 | 510 | 1.1191 | 27637016 |
0.3547 | 0.6022 | 515 | 1.1175 | 27911496 |
0.1976 | 0.6080 | 520 | 1.1157 | 28181560 |
0.156 | 0.6139 | 525 | 1.1197 | 28458928 |
0.2763 | 0.6197 | 530 | 1.1185 | 28726752 |
0.1887 | 0.6255 | 535 | 1.1181 | 28999904 |
0.2736 | 0.6314 | 540 | 1.1158 | 29268872 |
0.131 | 0.6372 | 545 | 1.1169 | 29529400 |
0.159 | 0.6431 | 550 | 1.1185 | 29800272 |
0.2407 | 0.6489 | 555 | 1.1187 | 30075312 |
0.1781 | 0.6548 | 560 | 1.1160 | 30343504 |
0.2069 | 0.6606 | 565 | 1.1170 | 30618448 |
0.1864 | 0.6665 | 570 | 1.1165 | 30885152 |
0.1847 | 0.6723 | 575 | 1.1178 | 31164952 |
0.231 | 0.6782 | 580 | 1.1152 | 31434128 |
0.1991 | 0.6840 | 585 | 1.1153 | 31705664 |
0.167 | 0.6899 | 590 | 1.1146 | 31979248 |
0.227 | 0.6957 | 595 | 1.1144 | 32252832 |
0.2543 | 0.7015 | 600 | 1.1143 | 32529232 |
0.192 | 0.7074 | 605 | 1.1117 | 32798912 |
0.1685 | 0.7132 | 610 | 1.1135 | 33067472 |
0.2737 | 0.7191 | 615 | 1.1151 | 33341160 |
0.2623 | 0.7249 | 620 | 1.1113 | 33614256 |
0.1831 | 0.7308 | 625 | 1.1108 | 33885552 |
0.1882 | 0.7366 | 630 | 1.1128 | 34159752 |
0.1994 | 0.7425 | 635 | 1.1129 | 34424184 |
0.2019 | 0.7483 | 640 | 1.1110 | 34696672 |
0.1874 | 0.7542 | 645 | 1.1112 | 34961608 |
0.2148 | 0.7600 | 650 | 1.1127 | 35235968 |
0.2865 | 0.7659 | 655 | 1.1121 | 35511320 |
0.1912 | 0.7717 | 660 | 1.1091 | 35781400 |
0.1976 | 0.7776 | 665 | 1.1114 | 36049904 |
0.2031 | 0.7834 | 670 | 1.1083 | 36316784 |
0.2237 | 0.7892 | 675 | 1.1070 | 36581680 |
0.1906 | 0.7951 | 680 | 1.1114 | 36847632 |
0.1871 | 0.8009 | 685 | 1.1091 | 37121008 |
0.1754 | 0.8068 | 690 | 1.1103 | 37393168 |
0.2057 | 0.8126 | 695 | 1.1122 | 37672008 |
0.2754 | 0.8185 | 700 | 1.1083 | 37940560 |
0.2014 | 0.8243 | 705 | 1.1083 | 38214808 |
0.1009 | 0.8302 | 710 | 1.1094 | 38491368 |
0.1884 | 0.8360 | 715 | 1.1114 | 38764864 |
0.2294 | 0.8419 | 720 | 1.1064 | 39038712 |
0.1975 | 0.8477 | 725 | 1.1077 | 39313504 |
0.1924 | 0.8536 | 730 | 1.1096 | 39573424 |
0.1647 | 0.8594 | 735 | 1.1070 | 39844696 |
0.1648 | 0.8652 | 740 | 1.1088 | 40124016 |
0.2471 | 0.8711 | 745 | 1.1106 | 40394280 |
0.2242 | 0.8769 | 750 | 1.1072 | 40659288 |
0.2206 | 0.8828 | 755 | 1.1062 | 40930232 |
0.1686 | 0.8886 | 760 | 1.1089 | 41196128 |
0.1999 | 0.8945 | 765 | 1.1091 | 41471720 |
0.1762 | 0.9003 | 770 | 1.1060 | 41745264 |
0.2029 | 0.9062 | 775 | 1.1051 | 42019264 |
0.1562 | 0.9120 | 780 | 1.1067 | 42289448 |
0.1733 | 0.9179 | 785 | 1.1078 | 42556216 |
0.2426 | 0.9237 | 790 | 1.1057 | 42821528 |
0.1195 | 0.9296 | 795 | 1.1048 | 43087960 |
0.1793 | 0.9354 | 800 | 1.1072 | 43357296 |
0.2249 | 0.9412 | 805 | 1.1071 | 43626176 |
0.1865 | 0.9471 | 810 | 1.1055 | 43897744 |
0.1759 | 0.9529 | 815 | 1.1064 | 44168888 |
0.2224 | 0.9588 | 820 | 1.1055 | 44438152 |
0.2332 | 0.9646 | 825 | 1.1065 | 44710608 |
0.1773 | 0.9705 | 830 | 1.1069 | 44987784 |
0.2299 | 0.9763 | 835 | 1.1060 | 45255856 |
0.2452 | 0.9822 | 840 | 1.1063 | 45526440 |
0.2184 | 0.9880 | 845 | 1.1057 | 45792408 |
0.2081 | 0.9939 | 850 | 1.1042 | 46059512 |
0.1787 | 0.9997 | 855 | 1.1042 | 46326936 |
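As an interpretation aid (not from the original card): assuming the validation loss is mean token-level cross-entropy in nats, as is standard for causal LM training with the `transformers` Trainer, the final loss of 1.1042 corresponds to a perplexity of roughly 3.0.

```python
# Convert the final eval loss to perplexity, assuming cross-entropy in nats.
import math

final_eval_loss = 1.1042
print(f"perplexity approx {math.exp(final_eval_loss):.2f}")  # ~3.02
```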
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1