caT text to video

A conditionally augmented text-to-video model. It uses pre-trained weights from the ModelScope text-to-video model, augmented with temporal conditioning transformers that extend generated clips and create smooth transitions between them. It also supports prompt interpolation, so the scene can change during clip extensions.
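To illustrate the idea of prompt interpolation, here is a minimal sketch that linearly blends two prompt embeddings across a series of clip extensions. It is not taken from this repository: the function names `lerp_embeddings` and `interpolation_schedule` are hypothetical, the embeddings are plain lists of floats rather than real text-encoder outputs, and linear blending is an assumption about how interpolation could work, so the actual implementation may differ.

```python
from typing import List


def lerp_embeddings(start: List[float], end: List[float], t: float) -> List[float]:
    """Linearly blend two equal-length embedding vectors.

    t = 0.0 returns `start`, t = 1.0 returns `end`.
    (Hypothetical helper, not part of the repository.)
    """
    if len(start) != len(end):
        raise ValueError("embeddings must have the same length")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    return [(1.0 - t) * a + t * b for a, b in zip(start, end)]


def interpolation_schedule(
    start: List[float], end: List[float], num_extensions: int
) -> List[List[float]]:
    """Return one blended embedding per clip extension.

    The first extension uses the start prompt and the last uses the end
    prompt; with a single extension, the end prompt is used directly.
    (Hypothetical helper, assumed for illustration only.)
    """
    if num_extensions < 1:
        raise ValueError("num_extensions must be at least 1")
    if num_extensions == 1:
        return [lerp_embeddings(start, end, 1.0)]
    step = 1.0 / (num_extensions - 1)
    return [lerp_embeddings(start, end, i * step) for i in range(num_extensions)]


if __name__ == "__main__":
    # Toy two-dimensional "embeddings" standing in for real prompt embeddings.
    scene_a = [0.0, 2.0]
    scene_b = [1.0, 4.0]
    for index, embedding in enumerate(interpolation_schedule(scene_a, scene_b, 3)):
        print(f"extension {index}: {embedding}")
```

In a setup like this, each clip extension would be conditioned on its own blended embedding, so the scene drifts from the first prompt toward the second over successive extensions.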

This model was trained at home as a hobby.

Do not expect high-quality samples.

Installation

Clone the Repository

git clone https://github.com/motexture/caT-text-to-video.git
cd caT-text-to-video
python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
python3 run.py

Visit the provided URL in your browser to interact with the interface and start generating videos.

Note: Make sure you are on the latest commit (for example, by running git pull inside the repository directory), as the positional encodings have been updated since the initial models.
