8B FP16 weights
The prompt format is the same as Llama 3: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/. Standard context length of 8192 tokens.
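A minimal usage sketch, assuming the standard transformers API and the stock Llama 3 chat template; the model ID and the example messages below are placeholders, not part of this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/this-model"  # placeholder repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a storyteller continuing a long-running tale."},
    {"role": "user", "content": "Continue the story from this summary: ..."},
]

# apply_chat_template emits the Llama 3 special tokens
# (<|start_header_id|>, <|eot_id|>, ...) for us.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Keep prompt + completion within the 8192-token context.
output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```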
This is a testing model trained for 4 epochs on a custom 100MB dataset, geared toward storytelling with a rolling context window, though it might be good at other things too. There is significant evidence that the model is undertrained; longer training runs are baking now.
The dataset was constructed from cleaned long-form dialogue, which was restructured, summarized with Llama-70B, and temporally stacked so that the summary of the past dialogue begins the next dialogue. Almost all samples were between 7500 and 8192 tokens long. A rough sketch of the stacking idea follows below.
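This is not the actual preprocessing code, just a sketch of the "temporal stacking" described above: each training sample opens with a summary of the preceding dialogue chunk, so the model learns to pick up a story from a rolling summary. The section headers and the `summarize` callable are assumptions for illustration.

```python
def build_stacked_samples(dialogue_chunks, summarize):
    """dialogue_chunks: list of dialogue strings in temporal order.
    summarize: callable producing a short summary of a chunk
    (Llama-70B was used for this step per the card)."""
    samples = []
    previous_summary = ""
    for chunk in dialogue_chunks:
        # Prepend the summary of everything that came before this chunk.
        sample = (
            f"[Summary of the story so far]\n{previous_summary}\n\n"
            f"[Dialogue]\n{chunk}"
        )
        samples.append(sample)
        # Summarize this chunk so it can open the next sample in the stack.
        previous_summary = summarize(chunk)
    return samples
```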