About finetuning
Could you make your fine-tuning code publicly available?
Hi
@Xiangyu1
Since this model is compatible with the HF ecosystem, you could check out https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py as a starting point for fine-tuning the model
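For a concrete starting point, here is a minimal sketch along the lines of that script. The dataset and hyper-parameters below are placeholders chosen for illustration, and exact argument names can differ between `trl` versions, so adapt it to your setup:

```python
# Minimal SFT sketch for falcon-mamba-7b using trl (illustrative settings only).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_id = "tiiuae/falcon-mamba-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Any instruction dataset with a "text" column works; this one is only an example.
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

training_args = SFTConfig(
    output_dir="./falcon-mamba-7b-sft",
    dataset_text_field="text",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,  # recent trl versions rename this to `processing_class`
)
trainer.train()
```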
I wrote a blog post about fine-tuning falcon-mamba-7b with a 32k context on a single H100 GPU, I hope it helps! https://wdlctc.github.io/mstmamba.html
Can I train using just the LongLoRA code, or have you made any modifications to this code?
If you want to train from scratch, you would need to initialize the model weights yourself instead of loading a pre-trained model, for example:
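A minimal sketch of that, using the standard transformers API (the config name is just the falcon-mamba-7b one; swap in your own config if you change the architecture):

```python
# Sketch only: build a FalconMamba-style model with randomly initialized weights
# from its config, instead of loading the pre-trained checkpoint.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

config = AutoConfig.from_pretrained("tiiuae/falcon-mamba-7b")
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")

# from_config creates the architecture with fresh weights,
# whereas from_pretrained would load the trained checkpoint.
model = AutoModelForCausalLM.from_config(config)
print(sum(p.numel() for p in model.parameters()))  # ~7B randomly initialized params
```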
We made modifications to the Hugging Face code to support a 2x longer context length with the mini-sequence technology
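For intuition only, the rough idea behind mini-sequence processing is to split the sequence into chunks so that memory-heavy intermediates (such as the vocab-sized LM-head logits) are never materialized for the full sequence at once. The sketch below illustrates that idea for the loss computation; it is not the blog's actual code or the library's implementation:

```python
# Illustrative sketch: chunked causal-LM loss with per-chunk recomputation,
# so only one chunk of vocab-sized logits is live at any time.
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

def chunked_lm_loss(hidden_states, lm_head, labels, chunk_size=1024):
    # hidden_states: (batch, seq_len, d_model), labels: (batch, seq_len)
    hidden = hidden_states[:, :-1, :]   # position t predicts token t+1
    targets = labels[:, 1:]

    def chunk_loss(h, t):
        logits = lm_head(h)             # (batch, chunk, vocab) for this chunk only
        return F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            t.reshape(-1),
            ignore_index=-100,
            reduction="sum",
        )

    total_loss = hidden.new_zeros(())
    total_tokens = 0
    for start in range(0, hidden.size(1), chunk_size):
        h = hidden[:, start:start + chunk_size, :]
        t = targets[:, start:start + chunk_size]
        # Checkpointing recomputes the chunk's logits during backward.
        total_loss = total_loss + checkpoint(chunk_loss, h, t, use_reentrant=False)
        total_tokens += int((t != -100).sum())
    return total_loss / max(total_tokens, 1)
```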
Note that the training had some issues which should be fixed by https://github.com/huggingface/transformers/pull/33195 : the kernels did not take into account the layer norms on the B, DT and C states
The fix is now merged on the transformers main branch; until the next release, make sure to re-install transformers from source (`pip install git+https://github.com/huggingface/transformers.git`)
Hello everyone,
I need assistance with fine-tuning the Falcon Mamba model. Could you provide guidance or share any examples of how to perform the fine-tuning process?