ValueError in Forward pass
Hi,
I'm getting the error below when I try to fine-tune the model. Any ideas?
Training Epoch 0: 0%| | 0/4 [00:00<?, ?it/s]
FRAMES: (16, 224, 224, 3)
FRAMES: torch.Size([16, 3, 224, 224])
Batch 0:
Prompts: 1
Videos shape: torch.Size([1, 16, 3, 224, 224])
Input IDs shape: torch.Size([1, 350])
Attention Mask shape: torch.Size([1, 350])
Training Epoch 0: 0%| | 0/4 [00:03<?, ?it/s]
ValueError Traceback (most recent call last)
in <cell line: 1>()
1 for i in range(4):
----> 2 train_epoch(p_model, train_loader, optimizer, processor, device, i)
9 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/llava_next_video/modeling_llava_next_video.py in forward(self, input_ids, pixel_values, pixel_values_videos, image_sizes, attention_mask, position_ids, past_key_values, inputs_embeds, vision_feature_layer, vision_feature_select_strategy, labels, use_cache, output_attentions, output_hidden_states, return_dict, cache_position, num_logits_to_keep)
1038 n_video_features = video_features.shape[0]
1039 if n_video_tokens != n_video_features:
-> 1040 raise ValueError(
1041 f"Video features and video tokens do not match: tokens: {n_video_tokens}, features {n_video_features}"
1042 )
ValueError: Video features and video tokens do not match: tokens: 345, features 2304
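For anyone debugging this mismatch: the "features" side of the error is typically frames × (pooled patch grid)². A rough sanity check, assuming a CLIP-style patch size of 14 and the 2×2 spatial pooling that LLaVA-NeXT-Video applies by default (verify these against your model's config, they are assumptions here):

```python
# Rough sanity check: how many video features should the model produce?
# Assumptions (check your config): ViT patch size 14, 2x2 average pooling
# over the patch grid, one feature per pooled patch per frame.

def expected_video_features(num_frames, image_size, patch_size=14, pool_stride=2):
    """Features per clip = frames * (pooled patch grid side)^2."""
    patches_per_side = image_size // patch_size
    pooled_side = patches_per_side // pool_stride
    return num_frames * pooled_side ** 2

# 16 frames at 224x224 -> 16 * (224 // 14 // 2)**2 = 16 * 64 = 1024
print(expected_video_features(16, 224))  # 1024
# 16 frames at 336x336 -> 16 * (336 // 14 // 2)**2 = 16 * 144 = 2304
print(expected_video_features(16, 336))  # 2304
```

Note that 2304 is exactly what the traceback reports on the features side, which would be consistent with the vision tower operating on 336×336 frames while the prompt was expanded for a different resolution, so comparing the frame size your preprocessing produces against what the processor expects may be a useful first step.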
@Manisha2203 which version are you using? Can you try updating to the latest release, v4.47? If that doesn't work, can you share a small reproducer please?
@Manisha2203 can I get a smaller reproducer so I don't have to download the whole dataset? If you can pinpoint the exact video that fails and provide a small ~40-line script, that would be much more helpful. Right now it is hard to say what is causing the error, so try removing things like PEFT/data parallel etc. to see whether they are the source. Then narrow it down to the exact batch and video where it occurs, and check whether it happens only in train mode or also when running inference.
On the latest release, v4.47, it works fine when I run the demo code from the model page, so I am guessing it might have something to do with the training setup you are using.
Hi @RaushanTurganbay, here is a small reproducer with 4 data samples that were giving me the error: https://drive.google.com/drive/folders/17F_D3weqI7cbUV18YOptjJXoidnznFrH?usp=sharing.
I tried without the PEFT/data parallel stuff as well, with no luck. I am getting the error during both training and inference.
The same code is running fine with the Video LLava model: https://github.com/huggingface/transformers/blob/main/src/transformers/models/video_llava/modeling_video_llava.py
@Manisha2203 sorry, the Drive link is a 404. Can you make a Colab notebook with code that downloads the necessary videos and runs inference?