It'll take your voice and try to autotune it (because let's be real, you're no Michael Jackson), then pass it along to the model to condition on the melody. It works surprisingly well!
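For the curious, here is a rough sketch of what that pipeline looks like. It is not the demo's actual code: it assumes librosa for a crude pitch correction and audiocraft's melody-conditioned MusicGen as the generator, and the file name and text prompt are placeholders.

```python
# Hedged sketch of the voice -> autotune -> melody-conditioning pipeline.
# Assumptions: librosa for a crude pitch correction, audiocraft's MusicGen
# ("facebook/musicgen-melody") as the melody-conditioned generator.
import numpy as np
import librosa
import torch
from audiocraft.models import MusicGen

def crude_autotune(path, sr=32000):
    """Nudge the recording toward the nearest semitone with one global shift."""
    y, _ = librosa.load(path, sr=sr)
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    midi = librosa.hz_to_midi(f0[voiced])              # voiced frames only
    shift = float(np.median(np.round(midi) - midi))    # median semitone error
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=shift), sr

tuned, sr = crude_autotune("my_voice.wav")             # placeholder file name
model = MusicGen.get_pretrained("facebook/musicgen-melody")
model.set_generation_params(duration=8)
wav = model.generate_with_chroma(                      # condition on the melody
    ["an upbeat pop track"],                           # placeholder text prompt
    torch.from_numpy(tuned).float()[None, None, :],    # (batch, channels, time)
    sr,
)
```

A real autotune corrects pitch note by note; the single global shift above just keeps the sketch short.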
Introducing nanoLLaVA, a powerful multimodal AI model that packs the capabilities of a 1B-parameter vision language model into just 5GB of VRAM. This makes it an ideal choice for edge devices, bringing cutting-edge visual understanding and generation to your devices like never before.
Under the hood, nanoLLaVA is based on the powerful vilm/Quyen-SE-v0.1 (my Qwen1.5-0.5B finetune) and Google's impressive google/siglip-so400m-patch14-384. The model is trained using a data-centric approach to ensure optimal performance.
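If you want to poke at it, loading should look roughly like any other trust_remote_code checkpoint on the Hub. This is a hedged sketch: the repo id and the dtype/device settings below are assumptions, so defer to the model card for the canonical usage.

```python
# Hedged sketch: the repo id "qnguyen3/nanoLLaVA" and the dtype/device
# settings are assumptions; check the model card for the canonical usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qnguyen3/nanoLLaVA"  # assumed Hub repo id
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # fp16 keeps it within the ~5GB VRAM budget
    device_map="auto",
    trust_remote_code=True,      # the SigLIP vision tower is wired up by remote code
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```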
In the spirit of transparency and collaboration, all code and model weights are open-sourced under the Apache 2.0 license.
New Research Alert - CVPR 2024!
Title: Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
Description: Animatable Gaussians - a novel method for creating lifelike human avatars from RGB videos, utilizing 2D CNNs and 3D Gaussian splatting to capture pose-dependent garment details and dynamic appearances with high fidelity.
Authors: Zhe Li, Zerong Zheng, Lizhen Wang, and Yebin Liu
Conference: CVPR, Jun 17-21, 2024 | Seattle, WA, USA
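To make the idea concrete, here is a conceptual PyTorch sketch (not the authors' code) of the core ingredient: a 2D CNN that maps a posed position map to per-pixel 3D Gaussian parameters, which a standard Gaussian splatting rasterizer would then render. Channel counts and layer sizes are made up for illustration.

```python
# Conceptual sketch only (not the authors' code): a small 2D CNN maps a posed
# position map to per-pixel 3D Gaussian parameters; a standard Gaussian
# splatting rasterizer would then render the predicted Gaussians.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianMapPredictor(nn.Module):
    def __init__(self, in_ch=3, hidden=64):
        super().__init__()
        # Per pixel/Gaussian: 3 position offset + 4 rotation (quaternion)
        # + 3 scale + 1 opacity + 3 color = 14 channels.
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 14, 3, padding=1),
        )

    def forward(self, posed_position_map):  # (B, 3, H, W) posed position map
        out = self.net(posed_position_map)
        offset, rot, scale, opacity, color = torch.split(out, [3, 4, 3, 1, 3], dim=1)
        return {
            "xyz": posed_position_map + offset,   # pose-dependent Gaussian centers
            "rotation": F.normalize(rot, dim=1),  # unit quaternions
            "scale": torch.exp(scale),            # strictly positive scales
            "opacity": torch.sigmoid(opacity),
            "color": torch.sigmoid(color),
        }
```

The paper's pipeline is considerably richer than this; the sketch only illustrates the "CNN predicts pose-dependent Gaussian parameter maps" idea highlighted in the description.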
Genie is a new method from Google DeepMind that generates interactive, action-controllable virtual worlds from unlabelled internet videos.
Keypoints:
* Genie leverages a spatiotemporal video tokenizer, an autoregressive dynamics model, and a latent action model to generate controllable video environments (see the sketch after these keypoints).
* The model is trained on video data alone, without requiring action labels, using unsupervised learning to infer latent actions between frames.
* The method restricts the size of the latent action vocabulary to 8 to ensure that the number of possible latent actions remains small.
* The training dataset was built by filtering publicly available internet videos with criteria specific to 2D platformer games, for a total of 6.8M videos used for training.
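As a rough illustration of the latent-action idea (not DeepMind's code, and heavily simplified: the real model operates on whole frame sequences with spatiotemporal transformers), here is a minimal VQ-style sketch that encodes two consecutive frames and quantizes the embedding against a codebook of 8 discrete latent actions.

```python
# Minimal VQ-style latent action model sketch (illustrative only): two
# consecutive frames are encoded and snapped to one of 8 codebook entries,
# yielding a discrete "action" without any action labels.
import torch
import torch.nn as nn

class LatentActionModel(nn.Module):
    def __init__(self, num_actions=8, dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 4, stride=2, padding=1), nn.ReLU(),  # frame_t ++ frame_t+1
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
        self.codebook = nn.Embedding(num_actions, dim)  # |A| = 8 latent actions

    def forward(self, frame_t, frame_tp1):  # each (B, 3, H, W)
        z = self.encoder(torch.cat([frame_t, frame_tp1], dim=1))  # (B, dim)
        dists = torch.cdist(z, self.codebook.weight)              # (B, 8)
        action_idx = dists.argmin(dim=-1)                         # discrete action id
        z_q = self.codebook(action_idx)
        z_q = z + (z_q - z).detach()  # straight-through estimator for training
        return action_idx, z_q
```

The inferred action token is what the dynamics model conditions on to predict the next frame, which is how the system stays controllable without ever seeing action labels.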