Inference error probably related to bf16 quantization

#2 opened by LeafXR

I ran into a problem when trying to use this model for inference. I'm running the inference service on an A10 GPU. Below is the error stack trace:

```
s/py_inference/serving/module_executor.py:exec_json():87][WARNING] exception: Traceback (most recent call last):
  File "/home/admin/hippo/worker/slave/suezops_c2_prod_mplug_owl_video.mplug_owl_video_15_31/python_worker/python_user_base/lib/python3.8/site-packages/py_inference/serving/module_executor.py", line 81, in exec_json
    result = self.do_exec(args_json)
  File "/home/admin/hippo/worker/slave/suezops_c2_prod_mplug_owl_video.mplug_owl_video_15_31/python_worker/python_user_base/lib/python3.8/site-packages/py_inference/serving/module_executor.py", line 98, in do_exec
    result = self.inference_worker.inference(**args)
  File "/home/admin/hippo/worker/slave/suezops_c2_prod_mplug_owl_video.mplug_owl_video_15_31/binary/lzd_llm_inference/llm_video_understanding.py", line 83, in inference
    response_text = self.inference_service.generate(input_text=text, video=video)[0]
  File "/home/admin/hippo/worker/slave/suezops_c2_prod_mplug_owl_video.mplug_owl_video_15_31/binary/lzd_llm_inference/llm_video_understanding.py", line 71, in generate
    outputs = self.base_model.generate(**request, max_new_tokens=max_new_tokens,
  File "/home/admin/hippo/worker/slave/suezops_c2_prod_mplug_owl_video.mplug_owl_video_15_31/python_worker/python_user_base/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/admin/hippo/worker/slave/suezops_c2_prod_mplug_owl_video.mplug_owl_video_15_31/binary/mPLUG-Owl/mplug_owl_video/modeling_mplug_owl.py", line 1752, in generate
    outputs = self.language_model.generate(
  File "/home/admin/hippo/worker/slave/suezops_c2_prod_mplug_owl_video.mplug_owl_video_15_31/python_worker/python_user_base/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/admin/hippo/worker/slave/suezops_c2_prod_mplug_owl_video.mplug_owl_video_15_31/binary/conda_new/lib/python3.8/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/home/admin/hippo/worker/slave/suezops_c2_prod_mplug_owl_video.mplug_owl_video_15_31/binary/conda_new/lib/python3.8/site-packages/transformers/generation/utils.py", line 2560, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
```
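From what I can tell, torch.multinomial raises this RuntimeError whenever the probability tensor it receives contains NaN, inf, or negative entries, which usually means the model produced bad logits upstream (an overflow or underflow in bf16 would fit). A minimal repro, just to show what the error itself means:

```python
import torch

# torch.multinomial rejects any distribution containing NaN, inf, or
# negative entries -- the same RuntimeError as in the stack trace above.
probs = torch.tensor([[float('nan'), 0.5]])
torch.multinomial(probs, num_samples=1)
# RuntimeError: probability tensor contains either inf, nan or element < 0
```

If the model is emitting NaN/inf logits, the two things I plan to try are loading it in torch.float32 (or torch.float16) instead of bfloat16, and passing do_sample=False to generate() so that greedy decoding skips torch.multinomial entirely; I'm not sure yet which one addresses the root cause.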

Hello, how should I use this model correctly? Could you give some code examples? Thank you.
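In case it helps, this is roughly the loading and generation code I am running, adapted from the video example in the mPLUG-Owl repository README; the checkpoint name, the mplug_owl_video import paths, and the video path are assumptions taken from that repo, not something I have confirmed for this exact deployment:

```python
import torch
from transformers import AutoTokenizer

# Modules from https://github.com/X-PLUG/mPLUG-Owl; import paths assumed
# from the repo's video README.
from mplug_owl_video.modeling_mplug_owl import MplugOwlForConditionalGeneration
from mplug_owl_video.processing_mplug_owl import MplugOwlImageProcessor, MplugOwlProcessor

pretrained_ckpt = 'MAGAer13/mplug-owl-llama-7b-video'  # assumed checkpoint name

# Loading in bf16 is what seems to trigger the error for me; float32 may
# be a safer baseline for debugging.
model = MplugOwlForConditionalGeneration.from_pretrained(
    pretrained_ckpt,
    torch_dtype=torch.bfloat16,
).to('cuda')
image_processor = MplugOwlImageProcessor.from_pretrained(pretrained_ckpt)
tokenizer = AutoTokenizer.from_pretrained(pretrained_ckpt)
processor = MplugOwlProcessor(image_processor, tokenizer)

prompts = [
    '''The following is a conversation between a curious human and AI assistant.
Human: <|video|>
Human: What is happening in the video?
AI: ''']
video_list = ['sample_video.mp4']  # placeholder path

inputs = processor(text=prompts, videos=video_list, num_frames=4, return_tensors='pt')
inputs = {k: v.bfloat16() if v.dtype == torch.float else v for k, v in inputs.items()}
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    # do_sample=True is the path that reaches torch.multinomial and fails;
    # do_sample=False (greedy decoding) avoids that call.
    res = model.generate(**inputs, do_sample=True, top_k=5, max_length=512)
print(tokenizer.decode(res.tolist()[0], skip_special_tokens=True))
```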
