Fail to run on two A10Gs

#25
by yNilay - opened

Hi! The model fails to run on two A10Gs. Is there any way to get it running on that setup? Thanks!

Error:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 522.00 MiB. GPU 0 has a total capacity of 22.18 GiB of which 328.69 MiB is free. Process 293155 has 21.86 GiB memory in use. Of the allocated memory 20.74 GiB is allocated by PyTorch, and 858.09 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Code:

    import torch
    from diffusers import MochiPipeline
    from diffusers.utils import export_to_video

    pipe = MochiPipeline.from_pretrained("genmo/mochi-1-preview", variant="bf16", torch_dtype=torch.bfloat16)

    # Enable memory savings
    pipe.enable_model_cpu_offload()
    pipe.enable_vae_tiling()

    prompt = "Close-up of a chameleon's eye, with its scaly skin changing color. Ultra high resolution 4k."
    frames = pipe(prompt, num_frames=84).frames[0]

    export_to_video(frames, "mochi.mp4", fps=30)
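Per the allocator's own hint in the OOM message above, one low-effort thing to try is enabling expandable segments before torch is imported. This is just a sketch of that suggestion; whether it frees enough memory for Mochi on 22 GiB cards is untested here.

```python
import os

# Allocator hint from the OOM message: expandable segments reduce
# fragmentation in the CUDA caching allocator. This env var must be set
# BEFORE `import torch`, otherwise the allocator ignores it.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# ... then import torch and build the MochiPipeline exactly as in the
# snippet above.
```

If that is not enough, swapping `pipe.enable_model_cpu_offload()` for `pipe.enable_sequential_cpu_offload()` lowers peak GPU memory further at a significant speed cost, though how far that gets Mochi specifically on an A10G is an assumption, not something verified here.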

@ved-genmo Can you please suggest what to do?

Genmo org

Can you try using the official Mochi API instead of the diffusers API? https://github.com/genmoai/mochi
There's a `cli.py` script in the `demos` directory that should automatically shard the model across multiple GPUs.
