It'll take your voice and try to autotune it (because let's be real, you're no Michael Jackson), then pass it along to the model to condition on the melody. It works surprisingly well!
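For the curious, here is a rough sketch of what that pipeline looks like. It is not the demo's actual code: it assumes librosa for a crude pitch correction and audiocraft's melody-conditioned MusicGen as the generator, and the file name and text prompt are placeholders.

```python
# Hedged sketch of the voice -> autotune -> melody-conditioning pipeline.
# Assumptions: librosa for a crude pitch correction, audiocraft's MusicGen
# ("facebook/musicgen-melody") as the melody-conditioned generator.
import numpy as np
import librosa
import torch
from audiocraft.models import MusicGen

def crude_autotune(path, sr=32000):
    """Nudge the recording toward the nearest semitone with one global shift."""
    y, _ = librosa.load(path, sr=sr)
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    midi = librosa.hz_to_midi(f0[voiced])              # voiced frames only
    shift = float(np.median(np.round(midi) - midi))    # median semitone error
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=shift), sr

tuned, sr = crude_autotune("my_voice.wav")             # placeholder file name
model = MusicGen.get_pretrained("facebook/musicgen-melody")
model.set_generation_params(duration=8)
wav = model.generate_with_chroma(                      # condition on the melody
    ["an upbeat pop track"],                           # placeholder text prompt
    torch.from_numpy(tuned).float()[None, None, :],    # (batch, channels, time)
    sr,
)
```

A real autotune corrects pitch note by note; the single global shift above just keeps the sketch short.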
Introducing nanoLLaVA, a powerful multimodal AI model that packs the capabilities of a 1B-parameter vision language model into just 5GB of VRAM. This makes it an ideal choice for edge devices, bringing cutting-edge visual understanding and generation to your devices like never before.
Under the hood, nanoLLaVA is based on the powerful vilm/Quyen-SE-v0.1 (my Qwen1.5-0.5B finetune) and Google's impressive google/siglip-so400m-patch14-384. The model is trained using a data-centric approach to ensure optimal performance.
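If you want to poke at it, loading should look roughly like any other trust_remote_code checkpoint on the Hub. This is a hedged sketch: the repo id and the dtype/device settings below are assumptions, so defer to the model card for the canonical usage.

```python
# Hedged sketch: the repo id "qnguyen3/nanoLLaVA" and the dtype/device
# settings are assumptions; check the model card for the canonical usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qnguyen3/nanoLLaVA"  # assumed Hub repo id
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # fp16 keeps it within the ~5GB VRAM budget
    device_map="auto",
    trust_remote_code=True,      # the SigLIP vision tower is wired up by remote code
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```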
In the spirit of transparency and collaboration, all code and model weights are open-sourced under the Apache 2.0 license.
New Research Alert - CVPR 2024!
Title: Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
Description: Animatable Gaussians - a novel method for creating lifelike human avatars from RGB videos, utilizing 2D CNNs and 3D Gaussian splatting to capture pose-dependent garment details and dynamic appearances with high fidelity.
Authors: Zhe Li, Zerong Zheng, Lizhen Wang, and Yebin Liu
Conference: CVPR, Jun 17-21, 2024 | Seattle, WA, USA
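To make the idea concrete, here is a conceptual PyTorch sketch (not the authors' code) of the core ingredient: a 2D CNN that maps a posed position map to per-pixel 3D Gaussian parameters, which a standard Gaussian splatting rasterizer would then render. Channel counts and layer sizes are made up for illustration.

```python
# Conceptual sketch only (not the authors' code): a small 2D CNN maps a posed
# position map to per-pixel 3D Gaussian parameters; a standard Gaussian
# splatting rasterizer would then render the predicted Gaussians.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianMapPredictor(nn.Module):
    def __init__(self, in_ch=3, hidden=64):
        super().__init__()
        # Per pixel/Gaussian: 3 position offset + 4 rotation (quaternion)
        # + 3 scale + 1 opacity + 3 color = 14 channels.
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 14, 3, padding=1),
        )

    def forward(self, posed_position_map):  # (B, 3, H, W) posed position map
        out = self.net(posed_position_map)
        offset, rot, scale, opacity, color = torch.split(out, [3, 4, 3, 1, 3], dim=1)
        return {
            "xyz": posed_position_map + offset,   # pose-dependent Gaussian centers
            "rotation": F.normalize(rot, dim=1),  # unit quaternions
            "scale": torch.exp(scale),            # strictly positive scales
            "opacity": torch.sigmoid(opacity),
            "color": torch.sigmoid(color),
        }
```

The paper's pipeline is considerably richer than this; the sketch only illustrates the "CNN predicts pose-dependent Gaussian parameter maps" idea highlighted in the description.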
Genie is a new method from Google DeepMind that generates interactive, action-controllable virtual worlds from unlabelled internet videos.
Keypoints:
* Genie leverages a spatiotemporal video tokenizer, an autoregressive dynamics model, and a latent action model to generate controllable video environments (see the sketch after these keypoints).
* The model is trained on video data alone, without requiring action labels, using unsupervised learning to infer latent actions between frames.
* The method restricts the size of the latent action vocabulary to 8 to ensure that the number of possible latent actions remains small.
* The training dataset was built by filtering publicly available internet videos with criteria specific to 2D platformer games, for a total of 6.8M videos used for training.
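As a rough illustration of the latent-action idea (not DeepMind's code, and heavily simplified: the real model operates on whole frame sequences with spatiotemporal transformers), here is a minimal VQ-style sketch that encodes two consecutive frames and quantizes the embedding against a codebook of 8 discrete latent actions.

```python
# Minimal VQ-style latent action model sketch (illustrative only): two
# consecutive frames are encoded and snapped to one of 8 codebook entries,
# yielding a discrete "action" without any action labels.
import torch
import torch.nn as nn

class LatentActionModel(nn.Module):
    def __init__(self, num_actions=8, dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 4, stride=2, padding=1), nn.ReLU(),  # frame_t ++ frame_t+1
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
        self.codebook = nn.Embedding(num_actions, dim)  # |A| = 8 latent actions

    def forward(self, frame_t, frame_tp1):  # each (B, 3, H, W)
        z = self.encoder(torch.cat([frame_t, frame_tp1], dim=1))  # (B, dim)
        dists = torch.cdist(z, self.codebook.weight)              # (B, 8)
        action_idx = dists.argmin(dim=-1)                         # discrete action id
        z_q = self.codebook(action_idx)
        z_q = z + (z_q - z).detach()  # straight-through estimator for training
        return action_idx, z_q
```

The inferred action token is what the dynamics model conditions on to predict the next frame, which is how the system stays controllable without ever seeing action labels.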