Did you use the full database for training?

#2
by Hoioi - opened

Did you use the full dolphin and RedPajama datasets to train this model? (If so, it should be more than 8 million rows.)

shahules786/orca-chat combines similar examples of the GPT-4 subset of ehartford/dolphin (i.e. only the GPT-4 entries are used).
25% of RedPajama was used. You can also find the numbers in the README:

Dataset Composition:
    Train (sampled):
       orca-chat: 188842
       fanfics: 47760
       red_pajama: 188262
    Valid:
       orca-chat: 5000
       fanfics: 1000
       red_pajama: 1000
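As a quick arithmetic check on the question above (this is only a sanity check, not part of any training code), the row counts from the README listing can be summed and turned into per-source shares. The dictionary names and the `summarize` helper are hypothetical, introduced just for this illustration:

```python
# Row counts copied from the "Dataset Composition" listing in the README.
train = {"orca-chat": 188842, "fanfics": 47760, "red_pajama": 188262}
valid = {"orca-chat": 5000, "fanfics": 1000, "red_pajama": 1000}


def summarize(split):
    """Hypothetical helper: return the total row count and each source's share."""
    total = sum(split.values())
    shares = {name: count / total for name, count in split.items()}
    return total, shares


train_total, train_shares = summarize(train)
valid_total, _ = summarize(valid)

print(f"train rows: {train_total}")
print(f"valid rows: {valid_total}")
for name, share in train_shares.items():
    # Share of the sampled train split contributed by each source.
    print(f"  {name}: {share:.1%}")
```

By this count the sampled train split has 424,864 rows (roughly 44.4% orca-chat, 11.2% fanfics, 44.3% red_pajama), far below the 8 million rows mentioned in the question.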
OpenAssistant org


Hoioi changed discussion status to closed
