QuantFactory/Qwen1.5-MoE-A2.7B-Wikihow-GGUF
This is a quantized version of MaziyarPanahi/Qwen1.5-MoE-A2.7B-Wikihow, created using llama.cpp.
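The GGUF files can be run with llama.cpp directly or through one of its bindings. Below is a minimal sketch using the llama-cpp-python binding; the quantization filename is hypothetical (substitute an actual .gguf file from this repo), and the prompt and token limit are only illustrative.
# Minimal sketch using the llama-cpp-python binding (pip install llama-cpp-python)
from llama_cpp import Llama
# Hypothetical filename; replace with an actual .gguf file from this repo
llm = Llama(model_path="Qwen1.5-MoE-A2.7B-Wikihow.Q4_K_M.gguf", n_ctx=2048)  # n_ctx matches the fine-tuning sequence_len
# create_chat_completion uses the chat template stored in the GGUF metadata when available
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "How do I repot a houseplant?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])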
Original Model Card
models/Qwen1.5-MoE-A2.7B-Wikihow
This model is a fine-tuned version of Qwen/Qwen1.5-MoE-A2.7B on the wikihow subset of the HuggingFaceTB/cosmopedia dataset.
How to use it
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="MaziyarPanahi/Qwen1.5-MoE-A2.7B-Wikihow")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/Qwen1.5-MoE-A2.7B-Wikihow")
model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/Qwen1.5-MoE-A2.7B-Wikihow")
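Because the model was fine-tuned with the chatml template (see the axolotl config below), prompts should be formatted with the tokenizer's chat template. A minimal sketch continuing from the direct-load snippet above; the question and sampling settings are only illustrative:
# Build a chatml-formatted prompt and generate a completion
messages = [{"role": "user", "content": "How do I plant a vegetable garden?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Sampling settings are illustrative, not taken from the model card
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))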
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: paged 8-bit AdamW (paged_adamw_8bit, per the config below) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 1
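The effective batch sizes follow from the per-device settings: total_train_batch_size = train_batch_size × num_devices × gradient_accumulation_steps = 2 × 4 × 4 = 32, and total_eval_batch_size = eval_batch_size × num_devices = 2 × 4 = 8.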
Training results
See axolotl config
axolotl version: 0.4.0
base_model: Qwen/Qwen1.5-MoE-A2.7B
trust_remote_code: true
load_in_8bit: false
load_in_4bit: true
strict: false
# hub_model_id: MaziyarPanahi/Qwen1.5-MoE-A2.7B-Wikihow
# hf_use_auth_token: true
chat_template: chatml
datasets:
  - path: HuggingFaceTB/cosmopedia
    name: wikihow
    type:
      system_prompt: ""
      field_instruction: prompt
      field_output: text
      format: "<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n"
      no_input_format: "<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n"
dataset_prepared_path:
val_set_size: 0.0
output_dir: ./models/Qwen1.5-MoE-A2.7B-Wikihow
sequence_len: 2048
sample_packing: false
pad_to_sequence_len: false
adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
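For reference, a config like this is run through the axolotl CLI; the usual invocation (general axolotl usage, not taken from this card) is:
accelerate launch -m axolotl.cli.train config.yml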
Framework versions
- PEFT 0.10.0
- Transformers 4.40.0.dev0
- PyTorch 2.2.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
Open LLM Leaderboard Evaluation Results
Detailed results can be found on the Open LLM Leaderboard.
| Metric | Value |
|---|---|
| Avg. | 11.43 |
| IFEval (0-Shot) | 29.54 |
| BBH (3-Shot) | 15.47 |
| MATH Lvl 5 (4-Shot) | 2.87 |
| GPQA (0-shot) | 3.36 |
| MuSR (0-shot) | 2.01 |
| MMLU-PRO (5-shot) | 15.34 |
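The Avg. value is the arithmetic mean of the six benchmark scores: (29.54 + 15.47 + 2.87 + 3.36 + 2.01 + 15.34) / 6 ≈ 11.43.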