Edit model card

Llama 3.1 Instruct, continually pretrained with a full epoch (1169 steps @ total batch 115) of the same 1.5gb private dataset that underpins Iambe

Instruction is broken, needs to be reSFT'd


Why do this? I have a niche use case where I cannot increase compute over 8b, and L3/3.1 are the only models in this size category that meet my needs for logic. However, both versions of L3/3.1 have the damn repetition/token overconfidence problem, and this is meant to disrupt that certainty without disrupting the model's ability to function.

By the way, I think it's the lm_head that is causing the looping, but it might be the embeddings being too separated. I'm not going to pay two more times to test them separately, however :p

Downloads last month
15
Safetensors
Model size
8.03B params
Tensor type
BF16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1

Finetunes
1 model