Did you use the full database for training?
#2 opened by Hoioi
Did you use the full dolphin and RedPajama datasets to train this model? If so, that would be more than 8 million rows.
shahules786/orca-chat combines similar examples of the GPT-4 subset of ehartford/dolphin (i.e. only the GPT-4 entries are used).
25% of RedPajama was used. You can also find the numbers in the README:
Dataset Composition:
Train (sampled):
orca-chat: 188842
fanfics: 47760
red_pajama: 188262
Valid:
orca-chat: 5000
fanfics: 1000
red_pajama: 1000
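To put these numbers next to the 8 million rows mentioned in the question, here is a small sketch that adds up the listed counts and shows each source's share of the sampled training split. It uses only the figures quoted above. It says nothing about the size of the full upstream datasets, and the dictionary names are just for illustration.

```python
# Row counts copied from the dataset composition listed above.
train = {"orca-chat": 188842, "fanfics": 47760, "red_pajama": 188262}
valid = {"orca-chat": 5000, "fanfics": 1000, "red_pajama": 1000}


def summarize(split):
    """Return the total row count of a split and each source's fraction of it."""
    total = sum(split.values())
    shares = {name: count / total for name, count in split.items()}
    return total, shares


train_total, train_shares = summarize(train)
valid_total, _ = summarize(valid)

print(f"train rows: {train_total}")
for name, share in train_shares.items():
    print(f"  {name}: {share:.1%}")
print(f"valid rows: {valid_total}")
```

The listed counts add up to 424,864 training rows and 7,000 validation rows. That is far below 8 million, which fits the answer: only a sample was used, not the full datasets.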
Hoioi changed discussion status to closed