Generate realistic talking heads from an image and audio
FitDiT is a high-fidelity virtual try-on model.
https://huggingface.co/papers/2501.03006
Audio Conditioned LipSync with Latent Diffusion Models
InstantID-XS
Text to Audio (Sound SFX) Generator
Generate images with Switti