See axolotl config

axolotl version: 0.6.0

base_model: mistralai/Mistral-7B-v0.1
# model_type and tokenizer_type are optional
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
# Automatically upload checkpoint and final model to HF
hub_model_id: AiAF/Mistral-QLoRA-Pretraining-Test-v1.1

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: AiAF/pretraining.jsonl
    type: completion

dataset_prepared_path: last_run_prepared
val_set_size: 0.1
output_dir: /workspace/axolotl/outputs/qlora-out/Mistral-QLoRA-Pretraining-Test-V1.1.1

adapter: qlora
lora_model_dir:

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

wandb_project: "LLM_QLoRA-Pretraining-Practice"
wandb_entity:
wandb_watch: "all"
wandb_name: "Mistral-QLoRA-Pretraining-Test-V1.1"
wandb_log_model: "false"

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 10
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.000005

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint: /workspace/axolotl/outputs/qlora-out/Mistral-QLoRA-Pretraining-Test-V1.1/checkpoint-40
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:

Mistral-QLoRA-Pretraining-Test-v1.1

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on the AiAF/pretraining.jsonl dataset. It achieves the following results on the evaluation set:

Loss: 1.8738

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 8
optimizer: OptimizerNames.ADAMW_BNB (adamw_bnb_8bit in the config) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
num_epochs: 10.0
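
As a rough illustration of how the values above relate, the sketch below recomputes the total train batch size and evaluates a cosine schedule with linear warmup. It rests on two assumptions not stated in this card: a single training device (which is consistent with 2 x 4 = 8), and a schedule spanning 80 optimizer steps (the last step logged in the results table). The function names are hypothetical, and the schedule formula is the commonly used warmup-plus-cosine form, not necessarily the exact code path this run took.

```python
import math


def effective_batch_size(micro_batch_size, gradient_accumulation_steps, num_devices=1):
    """Samples contributing to one optimizer step (num_devices=1 is an assumption)."""
    return micro_batch_size * gradient_accumulation_steps * num_devices


def cosine_lr_with_warmup(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup to peak_lr, then cosine decay towards zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Values taken from the hyperparameter list above.
print(effective_batch_size(micro_batch_size=2, gradient_accumulation_steps=4))  # 8

# total_steps=80 is an assumption based on the last logged step.
for step in (0, 5, 10, 45, 80):
    lr = cosine_lr_with_warmup(step, peak_lr=5e-6, warmup_steps=10, total_steps=80)
    print(step, lr)
```

Under these assumptions the learning rate reaches its peak of 5e-06 at step 10 and decays to zero by step 80.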

Training results

Training Loss	Epoch	Step	Validation Loss
1.9972	0.1176	1	1.8782
1.8162	0.2353	2	1.8782
1.8588	0.4706	4	1.8783
2.0207	0.7059	6	1.8782
1.9881	0.9412	8	1.8780
1.9846	1.2353	10	1.8779
1.8436	1.4706	12	1.8778
1.9974	1.7059	14	1.8775
2.0703	1.9412	16	1.8773
1.9806	2.2353	18	1.8770
1.8501	2.4706	20	1.8769
1.9708	2.7059	22	1.8766
2.0717	2.9412	24	1.8763
2.126	3.2353	26	1.8762
1.931	3.4706	28	1.8760
1.8087	3.7059	30	1.8758
1.8101	3.9412	32	1.8758
2.0657	4.2353	34	1.8758
1.965	4.4706	36	1.8757
1.9222	4.7059	38	1.8757
1.9094	4.9412	40	1.8757
1.9283	5.2353	42	1.8756
2.0211	5.4706	44	1.8754
1.909	5.7059	46	1.8751
1.8289	5.9412	48	1.8749
1.9443	6.2353	50	1.8748
2.0195	6.4706	52	1.8747
1.7326	6.7059	54	1.8744
1.8524	6.9412	56	1.8743
1.958	7.2353	58	1.8742
1.9866	7.4706	60	1.8741
2.0558	7.7059	62	1.8741
1.9277	7.9412	64	1.8740
2.0108	8.2353	66	1.8739
1.9575	8.4706	68	1.8740
1.9107	8.7059	70	1.8739
1.9935	8.9412	72	1.8738
2.0618	9.2353	74	1.8738
1.8251	9.4706	76	1.8739
1.9817	9.7059	78	1.8739
1.9202	9.9412	80	1.8738
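
To put the validation column in perspective, the following minimal sketch computes the absolute and relative change between the first and last validation loss in the table above. The helper name is hypothetical; the two input values are copied from the table.

```python
def loss_change(first, last):
    """Return (absolute drop, relative drop in percent) between two loss values."""
    absolute = first - last
    relative_pct = 100.0 * absolute / first
    return absolute, relative_pct


# Validation loss at step 1 and at step 80, taken from the table above.
absolute, relative_pct = loss_change(1.8782, 1.8738)
print(f"absolute drop: {absolute:.4f}")      # absolute drop: 0.0044
print(f"relative drop: {relative_pct:.2f}%")  # relative drop: 0.23%
```

The validation loss therefore moved by well under one percent over the 80 logged steps.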

Framework versions

PEFT 0.14.0
Transformers 4.48.3
Pytorch 2.5.1+cu124
Datasets 3.2.0
Tokenizers 0.21.0
