
# collapse_gemma-2-2b_hs2_replace_iter16_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5763
  • Num Input Tokens Seen: 4698448

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
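The listed total train batch size and warmup length follow arithmetically from the other hyperparameters; a quick sanity check (a sketch — the steps-per-epoch figure is inferred from the results table below, not stated explicitly):

```python
# Hyperparameters as listed above
train_batch_size = 8
gradient_accumulation_steps = 16

# Effective (total) train batch size = per-device batch * accumulation steps
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 128

# With lr_scheduler_warmup_ratio = 0.05 over roughly 97 optimizer steps
# (assumed: step 95 corresponds to epoch 0.975), warmup covers ~5 steps,
# matching the first logged evaluation point.
num_steps_per_epoch = 97
warmup_steps = round(0.05 * num_steps_per_epoch)
print(warmup_steps)  # 5
```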

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.504         | 0.0513 | 5    | 1.2854          | 232992            |
| 0.867         | 0.1026 | 10   | 1.3253          | 476672            |
| 0.496         | 0.1539 | 15   | 1.5555          | 716224            |
| 0.2393        | 0.2053 | 20   | 1.7741          | 964048            |
| 0.1039        | 0.2566 | 25   | 2.0060          | 1204056           |
| 0.0832        | 0.3079 | 30   | 2.2124          | 1441288           |
| 0.027         | 0.3592 | 35   | 2.3690          | 1695136           |
| 0.0282        | 0.4105 | 40   | 2.4619          | 1928016           |
| 0.0262        | 0.4618 | 45   | 2.5506          | 2170400           |
| 0.0241        | 0.5131 | 50   | 2.5949          | 2412520           |
| 0.0252        | 0.5645 | 55   | 2.5983          | 2649016           |
| 0.0228        | 0.6158 | 60   | 2.5883          | 2897584           |
| 0.0233        | 0.6671 | 65   | 2.5566          | 3146984           |
| 0.0236        | 0.7184 | 70   | 2.5517          | 3391736           |
| 0.0227        | 0.7697 | 75   | 2.5737          | 3640696           |
| 0.0218        | 0.8210 | 80   | 2.5875          | 3885024           |
| 0.0245        | 0.8724 | 85   | 2.5931          | 4126440           |
| 0.0231        | 0.9237 | 90   | 2.5836          | 4365400           |
| 0.0237        | 0.9750 | 95   | 2.5813          | 4596256           |
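The table stops at step 95 (epoch 0.975). The card does not state the dataset size, but a rough back-of-the-envelope estimate can be derived from these two numbers together with the effective batch size of 128 (a sketch, not an authoritative figure):

```python
# Values taken from the last row of the results table and the hyperparameters
step = 95
epoch = 0.9750
total_train_batch_size = 128

# Steps in one full epoch, then the implied number of training examples
steps_per_epoch = step / epoch                     # ~97.4 steps
approx_examples = steps_per_epoch * total_train_batch_size
print(round(approx_examples))  # ~12472 examples
```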

### Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
