Furkan Gözükara

MonsterMMORPG

AI & ML interests

Check out my YouTube channel SECourses for Stable Diffusion tutorials. They will help you tremendously on every topic.

Recent Activity

updated a model about 7 hours ago
MonsterMMORPG/sefer_duz_maps
updated a model about 10 hours ago
MonsterMMORPG/Lecture_Notes
new activity 1 day ago
MonsterMMORPG/Generative-AI:images

Organizations

Social Post Explorers, Hugging Face Discord Community

MonsterMMORPG's activity

reacted to their post with 🤯🤝👍🧠😎🤗❤️👀🚀🔥 1 day ago
Best open source image-to-video model: CogVideoX1.5-5B-I2V is pretty decent and optimized for low-VRAM machines at high resolution. Its native resolution is 1360px and it generates up to 10 seconds (161 frames). The audio was generated with a new open source audio model.

Full YouTube tutorial for CogVideoX1.5-5B-I2V : https://youtu.be/5UCkMzP2VLE

1-Click Windows, RunPod and Massed Compute installers (install into a Python 3.11 VENV) : https://www.patreon.com/posts/112848192

Official Hugging Face repo of CogVideoX1.5-5B-I2V : THUDM/CogVideoX1.5-5B-I2V

Official github repo : https://github.com/THUDM/CogVideo

Prompts used to generate the videos (txt file) : https://gist.github.com/FurkanGozukara/471db7b987ab8d9877790358c126ac05

Demo images shared in : https://www.patreon.com/posts/112848192

I used 1360x768px images at 16 FPS with 81 frames: 80 generated frames = 5 seconds, plus 1 frame coming from the initial image
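
The frame arithmetic above can be sketched as a small helper. This is only an illustration of the "+1 frame from the initial image" rule; the function names are made up for this example and are not part of any CogVideoX tooling.

```python
def clip_seconds(num_frames: int, fps: int = 16) -> float:
    """Length of a clip in seconds when the first frame is the input image."""
    if num_frames < 1 or fps <= 0:
        raise ValueError("num_frames must be >= 1 and fps must be > 0")
    # The initial image contributes one frame, so only the rest add duration.
    return (num_frames - 1) / fps


def frames_for(seconds: float, fps: int = 16) -> int:
    """Frame count to request for a clip of the given length (plus the initial frame)."""
    return int(round(seconds * fps)) + 1


print(clip_seconds(81))   # 81 frames at 16 FPS
print(clip_seconds(161))  # 161 frames at 16 FPS
print(frames_for(5))      # frames needed for 5 seconds
```

With these definitions, 81 frames correspond to 5 seconds and 161 frames to 10 seconds at 16 FPS, which matches the numbers in the post.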

I also enabled all the optimizations shared on Hugging Face:

pipe.enable_sequential_cpu_offload()

pipe.vae.enable_slicing()

pipe.vae.enable_tiling()

quantization = int8_weight_only - you need TorchAO, and DeepSpeed works great on Windows with a Python 3.11 VENV
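
Put together, the optimizations listed above might look like the sketch below. It assumes the diffusers CogVideoXImageToVideoPipeline class and the TorchAO quantize_ / int8_weight_only functions; the build_pipeline name and the use_int8 flag are made up for this example, and this is not necessarily how the installers set things up. Whether int8 weights combine cleanly with sequential CPU offload can depend on the installed library versions, so quantization is left off by default here.

```python
def build_pipeline(model_id: str = "THUDM/CogVideoX1.5-5B-I2V", use_int8: bool = False):
    """Load the image-to-video pipeline with the low-VRAM options from the post."""
    # Heavy imports live inside the function so importing this file stays cheap.
    import torch
    from diffusers import CogVideoXImageToVideoPipeline

    pipe = CogVideoXImageToVideoPipeline.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    )

    if use_int8:
        # Assumption: TorchAO is installed; weights are quantized in place to int8.
        from torchao.quantization import int8_weight_only, quantize_

        quantize_(pipe.text_encoder, int8_weight_only())
        quantize_(pipe.transformer, int8_weight_only())

    # The three calls quoted in the post.
    pipe.enable_sequential_cpu_offload()
    pipe.vae.enable_slicing()
    pipe.vae.enable_tiling()
    return pipe
```

Calling build_pipeline() downloads the model weights on first use, so expect a long first run.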

Audio model used : https://github.com/hkchengrex/MMAudio

1-Click Windows, RunPod and Massed Compute installers for MMAudio (install into a Python 3.10 VENV) : https://www.patreon.com/posts/117990364

I used very simple prompts. It fails when there is a human in the input video, so use text-to-audio in such cases

I also tested VRAM usage for CogVideoX1.5-5B-I2V

Resolutions and their VRAM requirements (may work on GPUs with less VRAM too, but slower):

512x288 - 41 frames : 7700 MB
576x320 - 41 frames : 7900 MB
576x320 - 81 frames : 8850 MB
704x384 - 81 frames : 8950 MB
768x432 - 81 frames : 10600 MB
896x496 - 81 frames : 12050 MB
960x528 - 81 frames : 12850 MB
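
As a rough way to use these measurements, the sketch below picks the most demanding tested setting that fits a given VRAM budget. The numbers are copied from the list above; the helper name is made up for this example, and since the post says lower-VRAM GPUs may still work (just slower), treat the result as a starting point and not a hard limit.

```python
# (width, height, frames, measured VRAM in MB), taken from the measurements above.
MEASURED_VRAM_MB = [
    (512, 288, 41, 7700),
    (576, 320, 41, 7900),
    (576, 320, 81, 8850),
    (704, 384, 81, 8950),
    (768, 432, 81, 10600),
    (896, 496, 81, 12050),
    (960, 528, 81, 12850),
]


def largest_setting_within(vram_mb: int):
    """Return the tested setting with the highest VRAM use that still fits, or None."""
    fitting = [row for row in MEASURED_VRAM_MB if row[3] <= vram_mb]
    if not fitting:
        return None
    return max(fitting, key=lambda row: row[3])


print(largest_setting_within(8192))   # budget of an 8 GB card
print(largest_setting_within(12288))  # budget of a 12 GB card
```

In practice some VRAM is taken by the desktop and other programs, so it is safer to pass a budget a bit below the card's nominal size.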




reacted to their post with 👍🤝🤯🧠😎🤗❤️ 17 days ago
Simple prompt, 2x latent upscaled FLUX Fine-Tuning / DreamBooth images - can be trained on GPUs with as little as 6 GB - each image is 2048x2048 pixels

AI Photos of Yourself - Workflow Guide
Step 1: Initial Setup
Follow any standard FLUX Fine-Tuning / DreamBooth tutorial of your choice

You can also follow mine step by step : https://youtu.be/FvpWy1x5etM

Step 2: Data Collection
Gather high-quality photos of yourself

I used a Poco X6 Pro (mid-tier phone) with good results

Ensure good variety in poses and lighting

Step 3: Training
Use "ohwx man" as the only caption for all images

Keep it simple - no complex descriptions needed

Step 4: Testing & Optimization
Use SwarmUI grid to find the optimal checkpoint

Test different variations to find what works best

Step 5: Generation Settings
Upscale Parameters:

Scale: 2x

Refiner Control: 0.6

Model: RealESRGAN_x4plus.pth

Prompt Used:

photograph of ohwx man wearing an amazing ultra expensive suit on a luxury studio<segment:yolo-face_yolov9c.pt-1,0.7,0.5>photograph of ohwx man
Note: The model naturally generated smiling expressions since the training dataset included many smiling photos.

Note: yolo-face_yolov9c.pt is used to mask the face and automatically inpaint it, which improves face quality in distant shots
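
The prompt above can also be assembled programmatically, which helps when testing many outfits or scenes against the same face segment. This is only a string-building sketch: the function names are made up for this example, and the parameter names match_index, creativity and threshold are an assumed reading of the three values in the segment tag, so check the SwarmUI documentation before relying on them.

```python
def segment_tag(model: str, match_index: int, creativity: float, threshold: float) -> str:
    """Build a segment tag in the same shape as the one used in the post."""
    # Parameter names are assumptions; only the resulting text is taken from the post.
    return f"<segment:yolo-{model}-{match_index},{creativity},{threshold}>"


def build_prompt(main_prompt: str, face_prompt: str) -> str:
    """Main prompt, then the face segment tag, then the prompt used for the face inpaint."""
    return main_prompt + segment_tag("face_yolov9c.pt", 1, 0.7, 0.5) + face_prompt


prompt = build_prompt(
    "photograph of ohwx man wearing an amazing ultra expensive suit on a luxury studio",
    "photograph of ohwx man",
)
print(prompt)
```

The printed string is identical to the prompt shown above, so it can be pasted into SwarmUI as is.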