An end-to-end (e2e) Voice Language Model by Fish Audio.
Generates realistic talking heads from an image and an audio clip.
Image generator, identifier, and reposer.