Voice conversion framework based on VITS
Generate depth maps from images
Generate a 3D mesh model from an image