MicroThinker-8B-Preview

MicroThinker-8B-Preview is a new model fine-tuned from the huihui-ai/Meta-Llama-3.1-8B-Instruct-abliterated model, focused on advancing AI reasoning capabilities.

The 8B version performs better than both the 3B and 1B MicroThinker versions.

Use with Ollama

You can run huihui_ai/microthinker directly with Ollama:

ollama run huihui_ai/microthinker:8b
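Besides the interactive CLI, a running Ollama server also exposes a local REST API. The sketch below sends one prompt to the non-streaming /api/generate endpoint using only the Python standard library. It assumes the server listens on the default address http://localhost:11434 and that the model tag above has already been pulled; build_request and generate are illustrative helper names, not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "huihui_ai/microthinker:8b"


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for a single prompt."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        # With stream=False the server replies with one JSON object
        # whose "response" field holds the full completion.
        return json.loads(resp.read())["response"]
```

For example, calling generate with one of the test prompts at the end of this card returns the model's complete reply, including its step-by-step reasoning, as a single string.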

Training Details

This is only a test run, but the performance is already quite good.

The test environment is described below.

The model was trained on a single RTX 4090 GPU (24 GB).

Fine-tuning used the FineQwQ-142k dataset (142k samples), with a maximum sequence length of 21,710 tokens (max_length) and 4-bit quantization of the base model (quant_bits 4), i.e. QLoRA-style training.
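As a rough illustration of why 4-bit quantization makes this fit on a single 24 GB card, the sketch below estimates weights-only memory for the base model. It assumes about 8.0e9 parameters and ignores activations, LoRA weights, optimizer state, quantization constants and any layers kept in higher precision, so real usage will be higher than these figures.

```python
# Rough, weights-only memory estimate for the base model.
# Assumption: ~8.0e9 parameters. Activations, LoRA weights, optimizer
# state, quantization constants and any layers kept in higher
# precision are all ignored, so real usage is higher.
NUM_PARAMS = 8.0e9
GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bits: int) -> float:
    """Memory needed to store num_params weights at the given bit width."""
    return num_params * bits / 8 / GIB


bf16_gib = weight_memory_gib(NUM_PARAMS, 16)  # about 14.9 GiB
q4_gib = weight_memory_gib(NUM_PARAMS, 4)     # about 3.7 GiB
print(f"bf16 weights: {bf16_gib:.1f} GiB, 4-bit weights: {q4_gib:.1f} GiB")
```

Under these assumptions, the roughly 11 GiB saved on weights becomes headroom for activations at long sequence lengths such as 21,710 tokens, which gradient checkpointing reduces further.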

The SFT (supervised fine-tuning) process consists of several steps, and none of them requires writing code.

Create the environment.

conda create -yn ms-swift python=3.11
conda activate ms-swift

git clone https://github.com/modelscope/ms-swift.git

cd ms-swift
pip install -e .
cd ..

Download the model and dataset.

huggingface-cli download huihui-ai/Llama-3.1-8B-Instruct-abliterated --local-dir ./huihui-ai/Llama-3.1-8B-Instruct-abliterated
huggingface-cli download --repo-type dataset huihui-ai/FineQwQ-142k --local-dir ./data/FineQwQ-142k

Train for 1 epoch, using only the huihui-ai/FineQwQ-142k dataset:

swift sft --model huihui-ai/Llama-3.1-8B-Instruct-abliterated --model_type llama3_1 --train_type lora --dataset "data/FineQwQ-142k/FineQwQ-142k.jsonl" --num_train_epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --max_length 21710 --quant_bits 4 --bnb_4bit_compute_dtype bfloat16 --bnb_4bit_quant_storage bfloat16 --lora_rank 8 --lora_alpha 32 --gradient_checkpointing true --weight_decay 0.1 --learning_rate 1e-4 --gradient_accumulation_steps 16 --eval_steps 500 --save_steps 500 --logging_steps 100 --system "You are a helpful assistant. You should think step-by-step." --output_dir output/MicroThinker-8B-Preview/lora/sft --model_author "huihui-ai" --model_name "MicroThinker-8B-Preview"
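The numbers in this command can be sanity-checked with a few lines of arithmetic. The sketch below derives the effective batch size, an approximate optimizer step count for one epoch, the LoRA scaling factor alpha/rank, and an estimate of the trainable LoRA parameters. The sample count (about 142,000, with no held-out split), the Llama 3.1 8B layer shapes, and LoRA being applied to all seven linear projections in every layer are assumptions; the actual values reported by swift may differ, for example if some data is held out for evaluation.

```python
import math

# Values taken from the swift sft command above.
PER_DEVICE_BATCH = 1   # --per_device_train_batch_size
GRAD_ACCUM = 16        # --gradient_accumulation_steps
NUM_GPUS = 1           # single RTX 4090
LORA_RANK = 8          # --lora_rank
LORA_ALPHA = 32        # --lora_alpha

# Assumption: ~142,000 training samples and no held-out split.
NUM_SAMPLES = 142_000

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
steps_per_epoch = math.ceil(NUM_SAMPLES / effective_batch)
lora_scale = LORA_ALPHA / LORA_RANK  # factor applied to the LoRA update

# Assumed Llama 3.1 8B shapes: hidden 4096, MLP 14336, 32 layers,
# 8 KV heads of dim 128 (so k_proj/v_proj output 1024 features).
# Assumption: LoRA targets all seven linear projections per layer.
HIDDEN, MLP, KV, LAYERS = 4096, 14336, 1024, 32
PROJ_SHAPES = [
    (HIDDEN, HIDDEN),  # q_proj
    (HIDDEN, KV),      # k_proj
    (HIDDEN, KV),      # v_proj
    (HIDDEN, HIDDEN),  # o_proj
    (HIDDEN, MLP),     # gate_proj
    (HIDDEN, MLP),     # up_proj
    (MLP, HIDDEN),     # down_proj
]


def lora_params(shapes, rank):
    """LoRA adds A (rank x d_in) and B (d_out x rank) per adapted layer."""
    return sum(rank * (d_in + d_out) for d_in, d_out in shapes)


trainable = LAYERS * lora_params(PROJ_SHAPES, LORA_RANK)
print(effective_batch, steps_per_epoch, lora_scale, trainable)
```

Under these assumptions the adapter has about 21M trainable parameters, roughly a quarter of a percent of the 8B base model, which is what keeps memory and checkpoint sizes small.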

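Merge the LoRA adapter into the base model to save the final fine-tuned model. The command also opens an interactive session; when you are done, type exit to quit. Replace the directories below with your own: the --adapters path should point to a checkpoint created under the --output_dir used for training.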

swift infer --model huihui-ai/Llama-3.1-8B-Instruct-abliterated --adapters output/Llama-3.1-8B-Instruct-abliterated/lora/sft/v0-20250119-175713/checkpoint-19500 --stream true --merge_lora true

This should create a new model directory, checkpoint-19500-merged. Rename it to MicroThinker-8B-Preview, then copy or move it into the huihui directory.
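The rename-and-move step can also be scripted. The sketch below assumes you pass the path of the checkpoint-19500-merged directory and that huihui is a directory relative to the current working directory; install_merged is a hypothetical helper that refuses to overwrite an existing target.

```python
import shutil
from pathlib import Path


def install_merged(merged_dir: Path, target_root: Path,
                   name: str = "MicroThinker-8B-Preview") -> Path:
    """Move a merged checkpoint directory to target_root/name.

    Raises if the source is missing or the target already exists,
    so an earlier model is never silently overwritten.
    """
    if not merged_dir.is_dir():
        raise FileNotFoundError(merged_dir)
    target = target_root / name
    if target.exists():
        raise FileExistsError(target)
    target_root.mkdir(parents=True, exist_ok=True)
    # shutil.move renames when possible and falls back to copy + delete.
    return Path(shutil.move(str(merged_dir), str(target)))
```

For example, install_merged(Path("path/to/checkpoint-19500-merged"), Path("huihui")) yields huihui/MicroThinker-8B-Preview, the directory used by the inference command.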

Perform inference on the final fine-tuned model.

swift infer --model huihui/MicroThinker-8B-Preview --stream true --infer_backend pt --max_new_tokens 8192

Test examples.

How many 'r' characters are there in the word "strawberry"?

If a lake is covered by lilies in 48 days, with the number of lilies doubling each day, how many days does it take to cover half the lake?

If there are 10 people at a meeting who shake hands with each other, how many handshakes will occur in total?
