---
license: mit
datasets:
- liuhaotian/LLaVA-Instruct-150K
- LanguageBind/Video-LLaVA
language:
- en
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
---

# LSTP-Chat: Language-guided Spatial-Temporal Prompt Learning for Video Chat

Available Models:

- LSTP-FlanT5xl
- LSTP-Chat-7B (Vicuna-7b)

For more details, please refer to our [official repository](https://github.com/bigai-nlco/LSTP-Chat).