About finetuning
Could you make your fine-tuning code publicly available?
Hi
@Xiangyu1
Since this model is compatible with the HF ecosystem, you could check out https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py as a starting point for fine-tuning the model
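For a concrete starting point, here is a minimal sketch along the lines of that script. The dataset and hyper-parameters below are placeholders chosen for illustration, and exact argument names can differ between `trl` versions, so adapt it to your setup:

```python
# Minimal SFT sketch for falcon-mamba-7b using trl (illustrative settings only).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_id = "tiiuae/falcon-mamba-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Any instruction dataset with a "text" column works; this one is only an example.
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

training_args = SFTConfig(
    output_dir="./falcon-mamba-7b-sft",
    dataset_text_field="text",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,  # recent trl versions rename this to `processing_class`
)
trainer.train()
```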
I wrote a blog post about fine-tuning falcon-mamba-7b with a 32k context on a single H100 GPU, I hope it helps! https://wdlctc.github.io/mstmamba.html
Can I train using just the LongLoRA code, or have you made any modifications to this code?
If you want to train from scratch, you would need to initialize the model weights yourself instead of loading a pre-trained model, for example:
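A minimal sketch of that, using the standard transformers API (the config name is just the falcon-mamba-7b one; swap in your own config if you change the architecture):

```python
# Sketch only: build a FalconMamba-style model with randomly initialized weights
# from its config, instead of loading the pre-trained checkpoint.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

config = AutoConfig.from_pretrained("tiiuae/falcon-mamba-7b")
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")

# from_config creates the architecture with fresh weights,
# whereas from_pretrained would load the trained checkpoint.
model = AutoModelForCausalLM.from_config(config)
print(sum(p.numel() for p in model.parameters()))  # ~7B randomly initialized params
```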
We made modifications to the Hugging Face code to support a 2x longer context length with the mini-sequence technology
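For intuition only, the rough idea behind mini-sequence processing is to split the sequence into chunks so that memory-heavy intermediates (such as the vocab-sized LM-head logits) are never materialized for the full sequence at once. The sketch below illustrates that idea for the loss computation; it is not the blog's actual code or the library's implementation:

```python
# Illustrative sketch: chunked causal-LM loss with per-chunk recomputation,
# so only one chunk of vocab-sized logits is live at any time.
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

def chunked_lm_loss(hidden_states, lm_head, labels, chunk_size=1024):
    # hidden_states: (batch, seq_len, d_model), labels: (batch, seq_len)
    hidden = hidden_states[:, :-1, :]   # position t predicts token t+1
    targets = labels[:, 1:]

    def chunk_loss(h, t):
        logits = lm_head(h)             # (batch, chunk, vocab) for this chunk only
        return F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            t.reshape(-1),
            ignore_index=-100,
            reduction="sum",
        )

    total_loss = hidden.new_zeros(())
    total_tokens = 0
    for start in range(0, hidden.size(1), chunk_size):
        h = hidden[:, start:start + chunk_size, :]
        t = targets[:, start:start + chunk_size]
        # Checkpointing recomputes the chunk's logits during backward.
        total_loss = total_loss + checkpoint(chunk_loss, h, t, use_reentrant=False)
        total_tokens += int((t != -100).sum())
    return total_loss / max(total_tokens, 1)
```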
Note that the training had some issues which should be fixed by https://github.com/huggingface/transformers/pull/33195 : the kernels did not take into account the layer norms on the B, DT and C states
The fix is now merged on the transformers main branch; until the next release, make sure to re-install transformers from source (`pip install git+https://github.com/huggingface/transformers.git`)
Hello everyone,
I need assistance with fine-tuning the Falcon Mamba model. Could you provide guidance or share any examples of how to perform the fine-tuning process?