ValueError in Forward pass
Hi,
I'm getting the error below when I try to fine-tune the model. Any ideas?
Training Epoch 0: 0%| | 0/4 [00:00<?, ?it/s]
FRAMES: (16, 224, 224, 3)
FRAMES: torch.Size([16, 3, 224, 224])
Batch 0:
Prompts: 1
Videos shape: torch.Size([1, 16, 3, 224, 224])
Input IDs shape: torch.Size([1, 350])
Attention Mask shape: torch.Size([1, 350])
Training Epoch 0: 0%| | 0/4 [00:03<?, ?it/s]
ValueError Traceback (most recent call last)
in <cell line: 1>()
1 for i in range(4):
----> 2 train_epoch(p_model, train_loader, optimizer, processor, device, i)
9 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/llava_next_video/modeling_llava_next_video.py in forward(self, input_ids, pixel_values, pixel_values_videos, image_sizes, attention_mask, position_ids, past_key_values, inputs_embeds, vision_feature_layer, vision_feature_select_strategy, labels, use_cache, output_attentions, output_hidden_states, return_dict, cache_position, num_logits_to_keep)
1038 n_video_features = video_features.shape[0]
1039 if n_video_tokens != n_video_features:
-> 1040 raise ValueError(
1041 f"Video features and video tokens do not match: tokens: {n_video_tokens}, features {n_video_features}"
1042 )
ValueError: Video features and video tokens do not match: tokens: 345, features 2304
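For anyone debugging this mismatch: the "features" side of the error is typically frames × (pooled patch grid)². A rough sanity check, assuming a CLIP-style patch size of 14 and the 2×2 spatial pooling that LLaVA-NeXT-Video applies by default (verify these against your model's config, they are assumptions here):

```python
# Rough sanity check: how many video features should the model produce?
# Assumptions (check your config): ViT patch size 14, 2x2 average pooling
# over the patch grid, one feature per pooled patch per frame.

def expected_video_features(num_frames, image_size, patch_size=14, pool_stride=2):
    """Features per clip = frames * (pooled patch grid side)^2."""
    patches_per_side = image_size // patch_size
    pooled_side = patches_per_side // pool_stride
    return num_frames * pooled_side ** 2

# 16 frames at 224x224 -> 16 * (224 // 14 // 2)**2 = 16 * 64 = 1024
print(expected_video_features(16, 224))  # 1024
# 16 frames at 336x336 -> 16 * (336 // 14 // 2)**2 = 16 * 144 = 2304
print(expected_video_features(16, 336))  # 2304
```

Note that 2304 is exactly what the traceback reports on the features side, which would be consistent with the vision tower operating on 336×336 frames while the prompt was expanded for a different resolution, so comparing the frame size your preprocessing produces against what the processor expects may be a useful first step.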
@Manisha2203 which version are you using? Can you try updating to the latest release, v4.47? If that doesn't work, can you share a small reproducer please?
@Manisha2203 can I get a smaller reproducer so I don't have to download the whole dataset? If you can pinpoint the exact video that fails and provide a small ~40-line script, that would be much more helpful. Right now it is hard to say what is causing the error, so try removing things like PEFT/data parallel etc. to see whether they are the source. Then narrow it down to the exact batch and video where it occurs, and check whether it happens only in train mode or also when running inference.
On the latest release, v4.47, it works fine when I run the demo code from the model page, so I am guessing it might have something to do with the training setup you are using.
Hi @RaushanTurganbay, here is a small reproducer with 4 data samples that were giving me the error: https://drive.google.com/drive/folders/17F_D3weqI7cbUV18YOptjJXoidnznFrH?usp=sharing.
I tried without the PEFT/data parallel stuff as well, with no luck. I am getting the error during both training and inference.
The same code is running fine with the Video LLava model: https://github.com/huggingface/transformers/blob/main/src/transformers/models/video_llava/modeling_video_llava.py
@Manisha2203 sorry, the Drive link is a 404. Can you make a Colab notebook with code that downloads the necessary videos and runs inference?