How can we enable English mode?
Nice work, and I was able to regenerate the result in Chinese. Is there any way to use the model in English mode?
MiniMind's training corpus is 90% Chinese text, so its English capability is about as poor as the early LLaMA models' support for Chinese, haha.
However, if you have 2×3090 or better GPUs, you can fully train an English version of MiniMind from scratch in just a few hours simply by swapping the training dataset. The only step that changes is "Download the dataset from the download link and place it in the ./dataset directory." English corpora are actually more abundant than Chinese ones, so you are free to use stronger English data as training material; either json or jsonl format works, just place the files in the ./dataset directory.
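If it helps, here is a minimal sketch of what "swapping the dataset" can look like in practice. It assumes the pretraining corpus ends up as a jsonl file with one JSON object per line containing a "text" field; the exact schema MiniMind expects is defined by `data_process.py`, so check that script and adjust the field name if needed. The file names below (`english_corpus.txt`, `pretrain_en.jsonl`) are hypothetical.

```python
# Minimal sketch: turn a plain-text English corpus into a jsonl file under
# ./dataset. The "one JSON object per line with a 'text' field" layout is an
# assumption; check data_process.py for the exact schema MiniMind expects.
import json
from pathlib import Path

SRC = Path("english_corpus.txt")           # hypothetical source: one document per line
DST = Path("./dataset/pretrain_en.jsonl")  # hypothetical output file name
DST.parent.mkdir(parents=True, exist_ok=True)

with SRC.open(encoding="utf-8") as fin, DST.open("w", encoding="utf-8") as fout:
    for line in fin:
        text = line.strip()
        if text:  # skip blank lines
            fout.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
```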
You can follow the Quick Start guide in my GitHub README, which outlines the entire training process (you only need to substitute an English dataset at the download step):
https://github.com/jingyaogong/minimind/blob/master/README_en.md
Quick Start
- Clone the project code
  `git clone https://github.com/jingyaogong/minimind.git`
- If you need to train the model yourself
  2.1 Download the dataset from the download link and place it in the `./dataset` directory.
  2.2 Run `python data_process.py` to process the dataset, e.g. token-encode the pretrain data and extract the QA data into CSV files for the SFT dataset.
  2.3 Adjust the model parameter configuration in `./model/LMConfig.py`.
  2.4 Execute pretraining with `python 1-pretrain.py`.
  2.5 Perform instruction fine-tuning with `python 3-full_sft.py`.
  2.6 Perform LoRA fine-tuning (optional) with `python 4-lora_sft.py`.
  2.7 Execute DPO human-preference reinforcement learning alignment (optional) with `python 5-dpo_train.py`.
- Test model inference performance
  Download the weights from the trained model weights section below and place them in the `./out/` directory:

      out
      ├── multi_chat
      │   ├── full_sft_1024.pth
      │   ├── full_sft_512.pth
      │   ├── full_sft_640_moe.pth
      │   └── full_sft_640.pth
      ├── single_chat
      │   ├── full_sft_1024.pth
      │   ├── full_sft_512.pth
      │   ├── full_sft_640_moe.pth
      │   └── full_sft_640.pth
      ├── full_sft_1024.pth
      ├── full_sft_512.pth
      ├── full_sft_640_moe.pth
      ├── full_sft_640.pth
      ├── pretrain_1024.pth
      ├── pretrain_640_moe.pth
      └── pretrain_640.pth
  Test the pretrained model's text-continuation (chain) effect with `python 0-eval_pretrain.py`
  Test the model's conversational effect with `python 2-eval.py`
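Before running the eval scripts, you can sanity-check that the downloaded checkpoints landed where they are expected. This is only a hedged sketch: the file names come from the tree above, and which checkpoint each eval script actually loads is configured inside the scripts themselves.

```python
# Hedged sketch: confirm downloaded checkpoints are present under ./out/
# before running 0-eval_pretrain.py or 2-eval.py. File names are taken from
# the directory tree above; which one each script loads is set in the script.
from pathlib import Path

OUT = Path("./out")
expected = [
    "pretrain_640.pth",   # a pretrain checkpoint (assumed relevant to 0-eval_pretrain.py)
    "full_sft_512.pth",   # an SFT checkpoint (assumed relevant to 2-eval.py)
]

for name in expected:
    path = OUT / name
    print(f"{'OK' if path.is_file() else 'MISSING':8s} {path}")
```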
Tip: Pretraining and full-parameter fine-tuning (`pretrain` and `full_sft`) support DDP multi-GPU acceleration.
Start training on a single machine with N GPUs:
`torchrun --nproc_per_node N 1-pretrain.py`
`torchrun --nproc_per_node N 3-full_sft.py`
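To tie the training steps and the DDP tip together, here is a hedged sketch of a small driver that runs the pipeline end to end. It assumes every script runs with its default arguments and that the DDP-capable stages (pretrain and full_sft) are launched through torchrun as shown above; adjust `NUM_GPUS` to your hardware.

```python
# Hedged sketch of driving the training pipeline end to end with the scripts
# named in the Quick Start above. Assumes each script runs with its defaults;
# adjust NUM_GPUS and arguments to your setup.
import subprocess
import sys

NUM_GPUS = 2  # e.g. 2x3090, as mentioned above

def run(cmd):
    """Echo a command and run it, stopping on the first failure."""
    print(">>>", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 2.2: preprocess the dataset placed in ./dataset
run([sys.executable, "data_process.py"])

# 2.4 / 2.5: pretrain and full-parameter SFT support DDP via torchrun
run(["torchrun", f"--nproc_per_node={NUM_GPUS}", "1-pretrain.py"])
run(["torchrun", f"--nproc_per_node={NUM_GPUS}", "3-full_sft.py"])

# 2.6 / 2.7: optional stages, run single-process here since the tip above
# only mentions DDP support for pretrain and full_sft
run([sys.executable, "4-lora_sft.py"])
run([sys.executable, "5-dpo_train.py"])
```

The optional LoRA and DPO stages are kept single-process in this sketch simply because the tip above only mentions DDP for pretrain and full_sft.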