juewang committed
Commit 24610f4 (parent: 30e8cff)

Update README.md

Files changed (1):
  1. README.md +2 -3
README.md CHANGED
@@ -39,10 +39,9 @@ The model follows the architecture of Llama-2-7B and extends it to handle a long
 The model has been trained using a mixture of pre-training and instruction tuning data.
 - In the first training phase of continued pre-training, our data mixture contains 25% RedPajama Book, 25% RedPajama ArXiv (including abstracts), 25% other data from RedPajama, and 25% UL2 Oscar Data, which is part of OIG (Open-Instruction-Generalist) and asks the model to fill in missing chunks or complete the text.
 To enhance the long-context ability, we exclude data shorter than 2K words. The inclusion of the UL2 Oscar Data is effective in compelling the model to read and utilize long-range context.
-- We then fine-tune the model to focus on its few-shot capacity under long context, including 20% Natural Instructions (NI), 20% Public Pool of Prompts (P3), and 20% of the Pile. We decontaminated all data against HELM core scenarios (see a precise protocol here). We teach the model to leverage the in-context examples by packing examples into one 32K-token sequence. To maintain the knowledge learned from the first training phase, we incorporate 20% RedPajama-Data Book and 20% RedPajama-Data ArXiv with abstracts.
+- We then fine-tune the model to focus on its few-shot capacity under long context, including 20% Natural Instructions (NI), 20% Public Pool of Prompts (P3), and 20% of the Pile. We decontaminated all data against HELM core scenarios. We teach the model to leverage the in-context examples by packing examples into one 32K-token sequence. To maintain the knowledge learned from the first training phase, we incorporate 20% RedPajama-Data Book and 20% RedPajama-Data ArXiv.
 
-
-We provide examples of how to fine-tune the model for specific applications.
+Next, we provide examples of how to fine-tune the model for specific applications.
 You can use the [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) to fine-tune your own 32K model over Llama-2-7B-32K-beta.
 Please refer to [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) for step-by-step illustrations.
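
For illustration, here is a minimal sketch of the first training phase's preprocessing described above: the 2K-word length filter and a UL2-style "fill in the missing chunk" example. The field name `text`, the sentinel string, and the span length are assumptions; the commit does not include the actual pipeline code.

```python
import random

MIN_WORDS = 2_000  # the README excludes data shorter than 2K words

def keep_long_documents(docs):
    """Length filter: keep only documents with at least 2K words."""
    return [d for d in docs if len(d["text"].split()) >= MIN_WORDS]

def make_fill_in_example(text, span_len=64, rng=random):
    """UL2-style objective: blank out a chunk of the document and ask the
    model to fill it in. Sentinel format and span length are assumed."""
    words = text.split()
    if len(words) <= span_len:
        return None  # too short to mask a span of this size
    start = rng.randrange(len(words) - span_len)
    prompt = (
        " ".join(words[:start])
        + " <missing_chunk> "
        + " ".join(words[start + span_len:])
    )
    target = " ".join(words[start:start + span_len])
    return prompt, target
```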
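Similarly, a minimal sketch of the packing step from the second phase: few-shot examples are concatenated until the 32K-token budget is full, so the model learns to draw on in-context examples spread across one long sequence. The checkpoint name, the separator between examples, and the greedy strategy are assumptions, not the authors' exact recipe.

```python
from transformers import AutoTokenizer

MAX_LEN = 32_768  # 32K-token context window

# Checkpoint name is an assumption for illustration.
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/LLaMA-2-7B-32K")

def pack_examples(examples, max_len=MAX_LEN):
    """Greedily concatenate (prompt, answer) pairs into one packed
    sequence of at most 32K tokens; leftover examples start the next pack."""
    packed, budget = [], max_len
    for prompt, answer in examples:
        ids = tokenizer.encode(f"{prompt}\n{answer}\n\n", add_special_tokens=False)
        if len(ids) > budget:
            break  # this pack is full
        packed.extend(ids)
        budget -= len(ids)
    return packed
```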