LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published 29 days ago • 48
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published Sep 10, 2024 • 56
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning Paper • 2406.03049 • Published Jun 5, 2024 • 1
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? Paper • 2406.07289 • Published Jun 11, 2024 • 1