2 Dots (2dts)

AI & ML interests: None yet

Organizations: None yet

2dts's activity

New activity in deepseek-ai/DeepSeek-R1-Distill-Llama-8B 22 days ago

How was this quantized?

#3 opened 26 days ago by imq
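As an aside, one common way to quantize an 8B model like this is on-the-fly 4-bit loading with bitsandbytes. A minimal sketch of that general technique follows; it is only an illustration, not necessarily how this repo's weights were actually produced (GGUF or GPTQ exports are just as common):

```python
# Hypothetical example: load DeepSeek-R1-Distill-Llama-8B in 4-bit NF4 via bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bits at load time
    bnb_4bit_quant_type="nf4",              # NormalFloat4, the usual choice for LLMs
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")
```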
reacted to nateraw's post with πŸ”₯ 10 months ago
Turns out if you do a cute little hack, you can make nateraw/musicgen-songstarter-v0.2 work on vocal inputs. πŸ‘€

Now, you can hum an idea for a song and get a music sample generated with AI πŸ”₯πŸ”₯

Give it a try: ➑️ nateraw/singing-songstarter ⬅️

It'll take your voice and try to autotune it (because let's be real, you're no Michael Jackson), then pass it along to the model to condition on the melody. It works surprisingly well!
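A minimal sketch of the melody-conditioning part, assuming the checkpoint loads with transformers' MusicgenMelody classes (it is a musicgen-melody fine-tune; whether the repo ships transformers-format weights is an assumption here). The Space's autotune preprocessing step is omitted:

```python
# Sketch: generate a song starter conditioned on a hummed melody.
import librosa
import scipy.io.wavfile
from transformers import AutoProcessor, MusicgenMelodyForConditionalGeneration

model_id = "nateraw/musicgen-songstarter-v0.2"  # assumption: transformers-compatible weights
processor = AutoProcessor.from_pretrained(model_id)
model = MusicgenMelodyForConditionalGeneration.from_pretrained(model_id)

# Load the hummed take at 32 kHz, the rate MusicGen expects.
audio, sr = librosa.load("hummed_idea.wav", sr=32000)

inputs = processor(
    audio=audio,
    sampling_rate=sr,
    text=["acoustic, guitar, pop, song starter"],
    padding=True,
    return_tensors="pt",
)

# Generate a sample conditioned on both the text prompt and the hummed melody.
audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3.0, max_new_tokens=512)

# Write out as (samples, channels) so mono and stereo checkpoints both work.
out_sr = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write("song_starter.wav", rate=out_sr, data=audio_values[0].cpu().numpy().T)
```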
reacted to qnguyen3's post with πŸ”₯β€οΈπŸš€ 10 months ago
πŸŽ‰ Introducing nanoLLaVA, a powerful multimodal AI model that packs the capabilities of a 1B parameter vision language model into just 5GB of VRAM. πŸš€ This makes it an ideal choice for edge devices, bringing cutting-edge visual understanding and generation to your devices like never before. πŸ“±πŸ’»

Model: qnguyen3/nanoLLaVA πŸ”
Spaces: qnguyen3/nanoLLaVA (thanks to @merve)

Under the hood, nanoLLaVA is based on the powerful vilm/Quyen-SE-v0.1 (my Qwen1.5-0.5B finetune) and Google's impressive google/siglip-so400m-patch14-384. 🧠 The model is trained using a data-centric approach to ensure optimal performance. πŸ“Š

In the spirit of transparency and collaboration, all code and model weights are open-sourced under the Apache 2.0 license. 🀝
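For anyone who wants to try it locally, a minimal loading sketch: nanoLLaVA ships custom modeling code, so the image-preprocessing and chat helpers come from the repo itself via trust_remote_code (see the model card for the exact prompt format):

```python
# Sketch: load nanoLLaVA. trust_remote_code=True pulls in the repo's custom
# modeling code, which also defines the image/chat helpers used at inference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "qnguyen3/nanoLLaVA",
    torch_dtype=torch.float16,  # half precision keeps it within ~5 GB of VRAM
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("qnguyen3/nanoLLaVA", trust_remote_code=True)
```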
reacted to DmitryRyumin's post with πŸ”₯ 11 months ago
πŸš€πŸ’ƒπŸ»πŸŒŸ New Research Alert - CVPR 2024! πŸŒŸπŸ•Ί πŸš€
πŸ“„ Title: Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling πŸŒŸπŸš€

πŸ“ Description: Animatable Gaussians - a novel method for creating lifelike human avatars from RGB videos, utilizing 2D CNNs and 3D Gaussian splatting to capture pose-dependent garment details and dynamic appearances with high fidelity.

πŸ‘₯ Authors: Zhe Li, Zerong Zheng, Lizhen Wang, and Yebin Liu

πŸ“… Conference: CVPR, Jun 17-21, 2024 | Seattle WA, USA πŸ‡ΊπŸ‡Έ

πŸ”— Paper: Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling (2311.16096)

🌐 Github Page: https://animatable-gaussians.github.io
πŸ“ Repository: https://github.com/lizhe00/AnimatableGaussians

πŸ“Ί Video: https://www.youtube.com/watch?v=kOmZxD0HxZI

πŸ“š More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

πŸš€ Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

πŸ” Keywords: #AnimatableGaussians #HumanAvatars #3DGaussianSplatting #CVPR2024 #DeepLearning #Animation #Innovation
New activity in gorilla-llm/gorilla-openfunctions-v2 12 months ago
reacted to vladbogo's post with πŸ‘ 12 months ago
Genie is a new method from Google DeepMind that generates interactive, action-controllable virtual worlds from unlabelled internet videos.

Keypoints:
* Genie leverages a spatiotemporal video tokenizer, an autoregressive dynamics model, and a latent action model to generate controllable video environments.
* The model is trained on video data alone, without requiring action labels, using unsupervised learning to infer latent actions between frames.
* The method restricts the latent action vocabulary to just 8 codes so that the number of possible actions stays small (see the sketch after this list).
* The training dataset is built by filtering publicly available internet videos with criteria related to 2D platformer games, yielding 6.8M videos.
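A conceptual sketch (not DeepMind's code; all names are hypothetical) of the VQ-style latent action bottleneck: features of consecutive frames are encoded and snapped to the nearest of 8 learnable action codes, so discrete actions emerge without any labels:

```python
# Conceptual sketch: infer a discrete latent action between two frames.
import torch
import torch.nn as nn

class LatentActionModel(nn.Module):
    def __init__(self, frame_dim=512, action_dim=32, num_actions=8):
        super().__init__()
        # Maps (frame_t, frame_t+1) features to a continuous action embedding.
        self.encoder = nn.Sequential(
            nn.Linear(2 * frame_dim, 256), nn.ReLU(), nn.Linear(256, action_dim)
        )
        self.codebook = nn.Embedding(num_actions, action_dim)  # the 8-action vocabulary

    def forward(self, feat_t, feat_next):  # (B, frame_dim) each, e.g. from a video tokenizer
        z = self.encoder(torch.cat([feat_t, feat_next], dim=-1))
        # Nearest-neighbour lookup: pick the closest of the 8 codes (VQ-VAE style).
        dists = torch.cdist(z.unsqueeze(1), self.codebook.weight.unsqueeze(0)).squeeze(1)
        action_ids = dists.argmin(dim=-1)  # discrete latent action in {0..7}
        z_q = self.codebook(action_ids)
        # Straight-through estimator so gradients still flow to the encoder.
        z_q = z + (z_q - z).detach()
        return action_ids, z_q
```

Keeping the codebook this small is what makes the inferred actions playable: a player or agent can sweep all 8 codes and discover consistent behaviours.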

Paper: Genie: Generative Interactive Environments (2402.15391)
Project page: https://sites.google.com/view/genie-2024/
More detailed overview in my blog: https://huggingface.co./blog/vladbogo/genie-generative-interactive-environments

Congrats to the authors for their work!