DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion Paper • 2111.14690 • Published Nov 29, 2021
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation Paper • 2406.09399 • Published Jun 13, 2024
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals Paper • 2011.12450 • Published Nov 25, 2020
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation Paper • 2412.03069 • Published Dec 4, 2024 • 32
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis Paper • 2412.04431 • Published Dec 5, 2024 • 18
Liquid: Language Models are Scalable Multi-modal Generators Paper • 2412.04332 • Published Dec 5, 2024 • 2
Goku: Flow Based Video Generative Foundation Models Paper • 2502.04896 • Published about 1 month ago • 96
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation Paper • 2502.05179 • Published about 1 month ago • 24
Language as Queries for Referring Video Object Segmentation Paper • 2201.00487 • Published Jan 3, 2022
UniTok: A Unified Tokenizer for Visual Generation and Understanding Paper • 2502.20321 • Published 10 days ago • 28