# mega-ar-350m-v0.13

## Model description
Continued training of BEE-spoke-data/mega-ar-350m-L3t-v0.08-ultraTBfw on a few additional datasets.

It achieves the following results on the evaluation set (BEE-spoke-data/UltraTextbooks-2.1-fw_mix):
- Loss: 1.9926
- Accuracy: 0.5885
- Num Input Tokens Seen: 3468165120
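
To try the model locally, here is a minimal usage sketch with transformers; the prompt and generation settings are illustrative, not from the original card, and the repo requires `trust_remote_code=True` (matching the eval command below):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pszemraj/mega-ar-350m-v0.13"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Illustrative prompt; any text works for a base language model.
inputs = tokenizer("The mitochondria is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```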
## Quick eval

Quick eval for `pszemraj/mega-ar-350m-v0.13`:

hf (pretrained=pszemraj/mega-ar-350m-v0.13,trust_remote_code=True,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8
| Tasks          | Version | Filter | n-shot | Metric     |   Value |   | Stderr |
|----------------|--------:|--------|-------:|------------|--------:|---|-------:|
| arc_easy       |       1 | none   |      0 | acc        |  0.4491 | ± | 0.0102 |
|                |         | none   |      0 | acc_norm   |  0.4061 | ± | 0.0101 |
| boolq          |       2 | none   |      0 | acc        |  0.5367 | ± | 0.0087 |
| lambada_openai |       1 | none   |      0 | perplexity | 55.3308 | ± | 2.3100 |
|                |         | none   |      0 | acc        |  0.3113 | ± | 0.0065 |
| openbookqa     |       1 | none   |      0 | acc        |  0.1760 | ± | 0.0170 |
|                |         | none   |      0 | acc_norm   |  0.2680 | ± | 0.0198 |
| piqa           |       1 | none   |      0 | acc        |  0.6366 | ± | 0.0112 |
|                |         | none   |      0 | acc_norm   |  0.6213 | ± | 0.0113 |
| winogrande     |       1 | none   |      0 | acc        |  0.5036 | ± | 0.0141 |
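
The numbers above can be reproduced (modulo harness version and hardware nondeterminism) with EleutherAI's lm-evaluation-harness; a sketch using its Python API, with `model_args` and batch size read off the header line above:

```python
import lm_eval

# Mirrors the logged run: hf backend, float dtype, zero-shot, batch size 8.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=pszemraj/mega-ar-350m-v0.13,trust_remote_code=True,dtype=float",
    tasks=["arc_easy", "boolq", "lambada_openai", "openbookqa", "piqa", "winogrande"],
    batch_size=8,
)
print(results["results"])
```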
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a TrainingArguments sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 80085
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 32
- total_train_batch_size: 96
- total_eval_batch_size: 3
- optimizer: Adam with betas=(0.9,0.985) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0
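
These values map directly onto transformers' TrainingArguments; a minimal sketch of the configuration (hedged: `output_dir` is a placeholder, and the dataset/model wiring is omitted):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the run configuration from the list above.
args = TrainingArguments(
    output_dir="mega-ar-350m-v0.13",  # placeholder, not from the original run
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=80085,
    gradient_accumulation_steps=32,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    num_train_epochs=1.0,
    adam_beta1=0.9,
    adam_beta2=0.985,
    adam_epsilon=1e-8,
)

# The listed total_train_batch_size follows from:
# per-device batch (1) x num_devices (3) x gradient_accumulation_steps (32) = 96
```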