How can we enable English mode?

#1
by yeonseok-zeticai - opened

Nice work, and I could reproduce the results in Chinese. Is there any way to use the model in English mode?

MiniMind's training data is about 90% Chinese text, so its English capability is about as poor as the early LLaMA models' support for Chinese, haha.

However, if you have 2*3090 or better GPUs, you can fully train an English version of MiniMind from scratch in just a few hours, simply by swapping the training dataset. The only step that needs to change is step 2.1 below ("Download the dataset from the download link and place it in the ./dataset directory"). In fact, English corpora are more abundant than Chinese ones, so you are free to use stronger English data as training material; just place it in the ./dataset directory in json or jsonl format.
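
For illustration, here is a minimal sketch of how an English corpus could be turned into a jsonl file under ./dataset. The input file name (english_corpus.txt), the output file name (pretrain_data.jsonl), and the "text" field are assumptions made for this example, not the project's required schema; check data_process.py for the exact format it expects.

    import json
    from pathlib import Path

    # Minimal sketch (not part of the MiniMind repo): convert a plain-text English
    # corpus, one document per line, into a jsonl file under ./dataset.
    # File names and the "text" field are illustrative assumptions.
    src = Path("english_corpus.txt")              # hypothetical raw English corpus
    dst = Path("dataset/pretrain_data.jsonl")     # hypothetical output file
    dst.parent.mkdir(parents=True, exist_ok=True)

    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            text = line.strip()
            if not text:
                continue                          # skip empty lines
            # one JSON object per line, e.g. {"text": "..."}
            fout.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")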

You can check the quick start guide in my GitHub instructions, which outlines the entire training process (you only need to find an English dataset for the download step):

https://github.com/jingyaogong/minimind/blob/master/README_en.md

πŸ“Œ Quick Start

    1. Clone the project code
    git clone https://github.com/jingyaogong/minimind.git
    2. If you need to train the model yourself
    • 2.1 Download the dataset from the download link and place it in the ./dataset directory.

    • 2.2 Run python data_process.py to process the dataset, e.g. token-encoding the pretraining data and extracting QA
      data into CSV files for the SFT dataset.

    • 2.3 Adjust the model parameter configuration in ./model/LMConfig.py.

    • 2.4 Execute pretraining with python 1-pretrain.py.

    • 2.5 Perform instruction fine-tuning with python 3-full_sft.py.

    • 2.6 Perform LoRA fine-tuning (optional) with python 4-lora_sft.py.

    • 2.7 Execute DPO human preference reinforcement learning alignment (optional) with python 5-dpo_train.py.

    3. Test model inference performance
    • Download the weights from the trained model weights section of the README and place them in
      the ./out/ directory (a quick sanity check for these files is sketched after this list)

      out
      ├── multi_chat
      │   ├── full_sft_1024.pth
      │   ├── full_sft_512.pth
      │   ├── full_sft_640_moe.pth
      │   └── full_sft_640.pth
      ├── single_chat
      │   ├── full_sft_1024.pth
      │   ├── full_sft_512.pth
      │   ├── full_sft_640_moe.pth
      │   └── full_sft_640.pth
      ├── full_sft_1024.pth
      ├── full_sft_512.pth
      ├── full_sft_640_moe.pth
      ├── full_sft_640.pth
      ├── pretrain_1024.pth
      ├── pretrain_640_moe.pth
      └── pretrain_640.pth
    • Test the pretrained model's text-continuation effect with python 0-eval_pretrain.py

    • Test the model's conversational effect with python 2-eval.py
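
As a quick sanity check before running the eval scripts above, you can open one of the downloaded checkpoints with plain PyTorch and print its tensor shapes. This is a hedged sketch that assumes the .pth files are ordinary torch.save state dicts (possibly wrapped in a dict); out/full_sft_512.pth is just one of the entries in the tree above, and the actual inference entry points remain 0-eval_pretrain.py and 2-eval.py.

    import torch

    # Hedged sketch (not part of the MiniMind repo): inspect a downloaded checkpoint.
    # Assumes the .pth file is a torch.save'd state dict, possibly wrapped in a dict.
    ckpt_path = "./out/full_sft_512.pth"   # pick any checkpoint you placed in ./out/
    state = torch.load(ckpt_path, map_location="cpu")

    # Some projects wrap the weights, e.g. {"model": state_dict, ...}; unwrap if so.
    if isinstance(state, dict) and isinstance(state.get("model"), dict):
        state = state["model"]

    total = 0
    for name, tensor in state.items():
        if hasattr(tensor, "shape"):
            total += tensor.numel()
            print(f"{name}: {tuple(tensor.shape)}")
    print(f"total parameters: {total / 1e6:.1f}M")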

🍭 Tip: Pretraining and full parameter fine-tuning (pretrain and full_sft) support DDP multi-GPU acceleration.

  • Start training on a single machine with N GPUs

    torchrun --nproc_per_node N 1-pretrain.py
    
    torchrun --nproc_per_node N 3-full_sft.py
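
If you are not sure what value to use for N, the following one-off check (not part of the training scripts) prints how many CUDA devices are visible, e.g. 2 for the 2*3090 setup mentioned above:

    import torch

    # Convenience check (not part of the MiniMind repo): how many GPUs are visible?
    # Use the printed count as N in the torchrun commands above.
    n = torch.cuda.device_count()
    print(f"visible GPUs: {n}")
    for i in range(n):
        print(f"  cuda:{i} -> {torch.cuda.get_device_name(i)}")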
    
