No success so far, the training data contains some larger contexts and it fails just before complete the first epoch.
(dataset: DevQuasar/brainstorm-v3.1_vicnua_1k)
If anyone has further suggestion to the bnb config (with ROCm on MI100)?
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_use_double_quant=True,
bnb_4bit_compute_dtype=torch.bfloat16
)
Now testing with my other dataset that is smaller seems I have a lower memory need
DevQuasar/brainstorm_vicuna_1k