Built with Axolotl

3ee349ea-a42b-4ba1-9eca-0cb04a55a667

This model is a fine-tuned version of katuni4ka/tiny-random-qwen1.5-moe on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 11.7920
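
Since this repository is a PEFT adapter rather than a full model, it is loaded on top of the base checkpoint. A minimal loading sketch, assuming a standard peft/transformers setup (the prompt and generation settings below are illustrative, not from the training run):

```python
# Minimal sketch: load the base model, then attach this PEFT adapter.
# Assumes a standard peft/transformers environment.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "katuni4ka/tiny-random-qwen1.5-moe"
adapter_id = "lesso11/3ee349ea-a42b-4ba1-9eca-0cb04a55a667"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Illustrative generation call; settings are placeholders.
inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```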

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.000211
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 110
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: adamw_bnb_8bit (8-bit AdamW via bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • training_steps: 500
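
For orientation, these settings map roughly onto Hugging Face TrainingArguments as sketched below; this is a reconstruction, not the exact Axolotl config, and output_dir is a placeholder:

```python
# Approximate TrainingArguments equivalent of the hyperparameters above.
# The actual run was configured through Axolotl; output_dir is a placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",           # placeholder
    learning_rate=0.000211,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=110,
    gradient_accumulation_steps=2,  # effective train batch size: 4 * 2 = 8
    optim="adamw_bnb_8bit",         # 8-bit AdamW from bitsandbytes
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    max_steps=500,
)
```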

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0002 | 1    | 11.9334         |
| 11.8775       | 0.0082 | 50   | 11.8840         |
| 11.8536       | 0.0163 | 100  | 11.8639         |
| 11.8233       | 0.0245 | 150  | 11.8314         |
| 11.8236       | 0.0327 | 200  | 11.8154         |
| 11.8038       | 0.0408 | 250  | 11.8065         |
| 11.7963       | 0.0490 | 300  | 11.7977         |
| 11.8009       | 0.0572 | 350  | 11.7945         |
| 11.7954       | 0.0653 | 400  | 11.7928         |
| 11.7993       | 0.0735 | 450  | 11.7921         |
| 11.7991       | 0.0817 | 500  | 11.7920         |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
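
To approximate this environment, the versions above can be pinned at install time, e.g. `pip install peft==0.13.2 transformers==4.46.0 datasets==3.0.1 tokenizers==0.20.1 torch==2.5.0` (matching the `+cu124` PyTorch build additionally requires the CUDA 12.4 wheel index).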