# collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1103
- Num Input Tokens Seen: 30159864
## Model description
More information needed
## Intended uses & limitations
More information needed
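The intended uses are not documented, but as a fine-tuned Gemma-2-2B causal LM the checkpoint can presumably be loaded with the standard Transformers API. Below is a minimal inference sketch, assuming the checkpoint is published on the Hub as `jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1` and that the prompt is only an illustration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repo id; adjust if the checkpoint lives elsewhere.
model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Generate a short continuation for an example prompt.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```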
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
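As a reading aid, the list above maps roughly onto `transformers.TrainingArguments` as sketched below. This is not the original training script: the output directory is a placeholder, and the Adam betas/epsilon are passed explicitly even though they match the Trainer defaults.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1",  # hypothetical output path
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size (single device)
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```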
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3956 | 0 |
1.6921 | 0.0091 | 5 | 1.3865 | 277592 |
1.5157 | 0.0183 | 10 | 1.3199 | 553240 |
1.4327 | 0.0274 | 15 | 1.2512 | 835040 |
1.3634 | 0.0366 | 20 | 1.1942 | 1119152 |
1.2062 | 0.0457 | 25 | 1.1630 | 1386248 |
1.1502 | 0.0548 | 30 | 1.1451 | 1658184 |
1.1499 | 0.0640 | 35 | 1.1355 | 1932840 |
1.0385 | 0.0731 | 40 | 1.1430 | 2203400 |
1.0015 | 0.0822 | 45 | 1.1660 | 2478336 |
0.898 | 0.0914 | 50 | 1.1775 | 2749216 |
0.8754 | 0.1005 | 55 | 1.1909 | 3024568 |
0.7831 | 0.1097 | 60 | 1.2013 | 3297256 |
0.7973 | 0.1188 | 65 | 1.2082 | 3567512 |
0.6224 | 0.1279 | 70 | 1.1975 | 3832728 |
0.7229 | 0.1371 | 75 | 1.2022 | 4107456 |
0.6716 | 0.1462 | 80 | 1.2067 | 4381328 |
0.6282 | 0.1554 | 85 | 1.1985 | 4664272 |
0.6613 | 0.1645 | 90 | 1.1931 | 4946808 |
0.5538 | 0.1736 | 95 | 1.1930 | 5225232 |
0.5592 | 0.1828 | 100 | 1.1906 | 5499184 |
0.4737 | 0.1919 | 105 | 1.1943 | 5773464 |
0.4775 | 0.2011 | 110 | 1.1922 | 6045360 |
0.5431 | 0.2102 | 115 | 1.1878 | 6319560 |
0.4571 | 0.2193 | 120 | 1.1972 | 6595248 |
0.4625 | 0.2285 | 125 | 1.1849 | 6867392 |
0.4473 | 0.2376 | 130 | 1.1891 | 7145000 |
0.5032 | 0.2467 | 135 | 1.1884 | 7422304 |
0.527 | 0.2559 | 140 | 1.1812 | 7692168 |
0.4619 | 0.2650 | 145 | 1.1891 | 7971504 |
0.3861 | 0.2742 | 150 | 1.1777 | 8252232 |
0.368 | 0.2833 | 155 | 1.1825 | 8524736 |
0.3585 | 0.2924 | 160 | 1.1737 | 8803376 |
0.3527 | 0.3016 | 165 | 1.1859 | 9079664 |
0.3797 | 0.3107 | 170 | 1.1770 | 9350760 |
0.3966 | 0.3199 | 175 | 1.1802 | 9632672 |
0.4109 | 0.3290 | 180 | 1.1730 | 9909824 |
0.3386 | 0.3381 | 185 | 1.1750 | 10173440 |
0.36 | 0.3473 | 190 | 1.1711 | 10449856 |
0.4232 | 0.3564 | 195 | 1.1766 | 10723480 |
0.3718 | 0.3655 | 200 | 1.1686 | 10996072 |
0.3378 | 0.3747 | 205 | 1.1685 | 11274712 |
0.3298 | 0.3838 | 210 | 1.1680 | 11548536 |
0.2605 | 0.3930 | 215 | 1.1632 | 11819712 |
0.3222 | 0.4021 | 220 | 1.1657 | 12095032 |
0.3331 | 0.4112 | 225 | 1.1652 | 12378464 |
0.2945 | 0.4204 | 230 | 1.1584 | 12652256 |
0.2602 | 0.4295 | 235 | 1.1626 | 12933344 |
0.3413 | 0.4387 | 240 | 1.1585 | 13206880 |
0.3522 | 0.4478 | 245 | 1.1545 | 13481312 |
0.3239 | 0.4569 | 250 | 1.1541 | 13757280 |
0.33 | 0.4661 | 255 | 1.1550 | 14035648 |
0.3271 | 0.4752 | 260 | 1.1496 | 14314056 |
0.3631 | 0.4844 | 265 | 1.1574 | 14591184 |
0.2662 | 0.4935 | 270 | 1.1473 | 14869784 |
0.3374 | 0.5026 | 275 | 1.1495 | 15145912 |
0.377 | 0.5118 | 280 | 1.1476 | 15422056 |
0.3415 | 0.5209 | 285 | 1.1429 | 15701624 |
0.3588 | 0.5300 | 290 | 1.1448 | 15975448 |
0.2623 | 0.5392 | 295 | 1.1429 | 16251672 |
0.3372 | 0.5483 | 300 | 1.1397 | 16532768 |
0.3099 | 0.5575 | 305 | 1.1411 | 16807688 |
0.3222 | 0.5666 | 310 | 1.1403 | 17084280 |
0.2805 | 0.5757 | 315 | 1.1359 | 17362984 |
0.3158 | 0.5849 | 320 | 1.1391 | 17636368 |
0.3678 | 0.5940 | 325 | 1.1345 | 17909736 |
0.2457 | 0.6032 | 330 | 1.1353 | 18187664 |
0.4106 | 0.6123 | 335 | 1.1346 | 18465160 |
0.4054 | 0.6214 | 340 | 1.1343 | 18735840 |
0.4196 | 0.6306 | 345 | 1.1306 | 19013544 |
0.3024 | 0.6397 | 350 | 1.1335 | 19291160 |
0.2863 | 0.6488 | 355 | 1.1335 | 19566392 |
0.3069 | 0.6580 | 360 | 1.1296 | 19846576 |
0.4561 | 0.6671 | 365 | 1.1286 | 20120792 |
0.3369 | 0.6763 | 370 | 1.1289 | 20397368 |
0.342 | 0.6854 | 375 | 1.1292 | 20674400 |
0.4051 | 0.6945 | 380 | 1.1252 | 20955416 |
0.1938 | 0.7037 | 385 | 1.1282 | 21228600 |
0.2087 | 0.7128 | 390 | 1.1273 | 21509832 |
0.2746 | 0.7220 | 395 | 1.1244 | 21781432 |
0.3352 | 0.7311 | 400 | 1.1271 | 22062768 |
0.2967 | 0.7402 | 405 | 1.1253 | 22336688 |
0.2059 | 0.7494 | 410 | 1.1242 | 22617384 |
0.2417 | 0.7585 | 415 | 1.1241 | 22888744 |
0.283 | 0.7676 | 420 | 1.1219 | 23166464 |
0.3493 | 0.7768 | 425 | 1.1223 | 23442624 |
0.3613 | 0.7859 | 430 | 1.1215 | 23724456 |
0.2175 | 0.7951 | 435 | 1.1199 | 23997552 |
0.3372 | 0.8042 | 440 | 1.1209 | 24271688 |
0.3313 | 0.8133 | 445 | 1.1184 | 24549464 |
0.3209 | 0.8225 | 450 | 1.1187 | 24830048 |
0.2609 | 0.8316 | 455 | 1.1187 | 25105840 |
0.335 | 0.8408 | 460 | 1.1176 | 25383592 |
0.2367 | 0.8499 | 465 | 1.1171 | 25654008 |
0.3219 | 0.8590 | 470 | 1.1170 | 25924368 |
0.29 | 0.8682 | 475 | 1.1189 | 26194176 |
0.231 | 0.8773 | 480 | 1.1164 | 26472920 |
0.2929 | 0.8865 | 485 | 1.1169 | 26748736 |
0.2734 | 0.8956 | 490 | 1.1169 | 27018208 |
0.3264 | 0.9047 | 495 | 1.1150 | 27298736 |
0.2777 | 0.9139 | 500 | 1.1144 | 27564544 |
0.3015 | 0.9230 | 505 | 1.1126 | 27841416 |
0.3482 | 0.9321 | 510 | 1.1137 | 28115128 |
0.3251 | 0.9413 | 515 | 1.1132 | 28395504 |
0.3143 | 0.9504 | 520 | 1.1135 | 28675176 |
0.3316 | 0.9596 | 525 | 1.1146 | 28940144 |
0.3076 | 0.9687 | 530 | 1.1105 | 29217824 |
0.3911 | 0.9778 | 535 | 1.1112 | 29503120 |
0.2661 | 0.9870 | 540 | 1.1114 | 29775240 |
0.3464 | 0.9961 | 545 | 1.1098 | 30047440 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1