Finetuning upstage/SOLAR-10.7B-Instruct-v1.0
I have 2 A10 GPUs (48 GB total memory). I loaded the quantised model (almost 9 GB) and tried finetuning, but got an "out of memory" error. I loaded the model in the following way:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16  # Changed from bflot16
)

config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
)

model_name = "./SOLAR-10.7B-Instruct-v1.0"
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", quantization_config=quant_config, trust_remote_code=True
)
# model.gradient_checkpointing_enable()  ## Added checkpointing
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=False)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = get_peft_model(model, config)
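(A quick way to confirm at this point that the LoRA adapters are actually trainable, which is what the later error complains about; this is a minimal check using PEFT's built-in helper, not something from my original script:)

# Sanity check (not in my original script): confirm that the adapters injected
# by get_peft_model() require gradients, i.e. some parameters are trainable.
model.print_trainable_parameters()  # PEFT helper: prints trainable vs. total parameter counts
assert any(p.requires_grad for p in model.parameters()), "no trainable parameters found"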
To overcome this, I tried adding gradient_checkpointing=True in TrainingArguments:
import os
import transformers

def train_model(dsl_train, dsl_test, model, tokenizer, output_dir):
    os.environ["WANDB_DISABLED"] = "true"
    model.config.use_cache = False
    trainer = transformers.Trainer(
        model=model,
        train_dataset=dsl_train,
        eval_dataset=dsl_test,
        args=transformers.TrainingArguments(
            per_device_train_batch_size=1,
            per_device_eval_batch_size=1,
            gradient_accumulation_steps=4,
            gradient_checkpointing=True,
            evaluation_strategy='epoch',
            save_strategy='epoch',
            load_best_model_at_end=True,
            log_level='info',
            overwrite_output_dir=True,
            report_to=None,
            warmup_steps=1,
            num_train_epochs=3,
            learning_rate=2e-4,
            fp16=True,
            logging_steps=1,
            save_steps=1,
            output_dir=output_dir,
            # optim='paged_lion_8bit',  # "paged_adamw_8bit"
        ),
        data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    result = trainer.train()
    return result, model, tokenizer
I got the following error:
ERROR - Exception
Traceback (most recent call last):
File "/home/datascience/conda/pytorch20_p39_gpu_v2/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/tmp/ipykernel_11584/2258950767.py", line 1, in <cell line: 1>
result,model,tokenizer = train_model(dsl_train,dsl_test,model,tokenizer,output_dir)
File "/tmp/ipykernel_11584/1227025339.py", line 30, in train_model
result = trainer.train()
File "/home/datascience/conda/pytorch20_p39_gpu_v2/lib/python3.9/site-packages/transformers/trainer.py", line 1555, in train
return inner_training_loop(
File "/home/datascience/conda/pytorch20_p39_gpu_v2/lib/python3.9/site-packages/transformers/trainer.py", line 1860, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/datascience/conda/pytorch20_p39_gpu_v2/lib/python3.9/site-packages/transformers/trainer.py", line 2734, in training_step
self.accelerator.backward(loss)
File "/home/datascience/conda/pytorch20_p39_gpu_v2/lib/python3.9/site-packages/accelerate/accelerator.py", line 1851, in backward
self.scaler.scale(loss).backward(**kwargs)
File "/home/datascience/conda/pytorch20_p39_gpu_v2/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/datascience/conda/pytorch20_p39_gpu_v2/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
I am not aware of what is causing this. I tried the changes provided in https://github.com/huggingface/transformers/issues/25006, but they do not work because SOLAR requires updated versions of transformers, torch and accelerate. Please help me find the cause so I can debug this issue.
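(For readers hitting the same traceback: one workaround commonly suggested for this error when gradient checkpointing is combined with a quantised/PEFT model, shown only as a sketch since I am not certain it is the same change discussed in that issue, is to make the embedding outputs require gradients:)

# Sketch of a commonly suggested workaround (not verified with SOLAR's required
# library versions): make the embedding outputs require grad so the checkpointed
# backward pass stays connected to the autograd graph.
if hasattr(model, "enable_input_require_grads"):
    model.enable_input_require_grads()  # helper on transformers PreTrainedModel
else:
    def make_inputs_require_grad(module, inputs, output):
        output.requires_grad_(True)
    model.get_input_embeddings().register_forward_hook(make_inputs_require_grad)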
Hello,
I successfully fine-tuned this model for another task I have been working on recently. I do not think the problem you are encountering is due to your GPU, because I did it with a single 24 GB GPU. The problem you face is possibly a library configuration issue. Here are my packages; make sure to use a virtualenv and install these:
%pip install -Uqqq pip --progress-bar off
%pip install -qqq torch==2.0.1 --progress-bar off
#!pip install -qqq transformers==4.32.1 --progress-bar off
%pip install git+https://github.com/huggingface/transformers
%pip install -qqq datasets==2.14.4 --progress-bar off
%pip install -qqq peft==0.5.0 --progress-bar off
%pip install -qqq bitsandbytes==0.41.1 --progress-bar off
%pip install -qqq trl==0.7.1 --progress-bar off
%pip install scipy
%pip install accelerate==0.27.2
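After installing, it is worth confirming which versions the kernel actually picked up (a minimal check, not part of my original setup; restart the kernel after installing first):

# Illustrative version check: print what the current environment resolves.
import importlib.metadata as md

for pkg in ["torch", "transformers", "datasets", "peft", "bitsandbytes", "trl", "accelerate"]:
    try:
        print(f"{pkg}: {md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")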
Hope this helps!
@halilergul1, I have figured out the issue. I was passing use_gradient_checkpointing=False to prepare_model_for_kbit_training(model, use_gradient_checkpointing=False), while gradient_checkpointing=True was set in TrainingArguments. When I removed use_gradient_checkpointing=False, it worked.
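In other words, the two settings need to agree. A minimal sketch of one consistent combination (illustrative only, reusing model, config and output_dir from the snippets above, not my exact script):

# Keep gradient checkpointing consistent between PEFT's k-bit preparation and the Trainer.
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)  # default is True
model = get_peft_model(model, config)

training_args = transformers.TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,  # matches use_gradient_checkpointing=True above
    fp16=True,
)
# Forcing use_gradient_checkpointing=False while the Trainer re-enables checkpointing
# is the mismatch that triggered the "does not require grad" error in my case.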