Edit model card

8B FP 16 weights

Prompt format is the same as Llama 3: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/ Standard context length of 8192

This is a testing model trained on a custom 100MB dataset for 4 epochs geared for storytelling with a rolling context window, but might be good at other things too. There's significant evidence that the model is undertrained, longer training runs are baking now.

The dataset was constructed from cleaned long form dialogue, restructured, and then summarized with Llama-70B, and temporally stacked so that the summary of the past dialogue begins the next dialogue. Almost all samples were between 7500-8192 tokens long.

Downloads last month
803
Safetensors
Model size
8.03B params
Tensor type
BF16
Β·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for Blackroot/Llama-3-Gamma-Twist

Quantizations
1 model

Spaces using Blackroot/Llama-3-Gamma-Twist 5