Doğuş Can Korkmaz

doguscank

AI & ML interests

Vision, LLMs, vLLMs, semantic segmentation, forecasting

Recent Activity

upvoted a paper 2 days ago

Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

updated a collection 3 days ago

to read

Organizations

None yet

doguscank's activity

upvoted a paper 2 days ago

Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Paper • 2412.15322 • Published 6 days ago • 16

upvoted a paper 3 days ago

Parallelized Autoregressive Visual Generation

Paper • 2412.15119 • Published 6 days ago • 46

upvoted a paper 9 days ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published 12 days ago • 131

upvoted 2 papers 10 days ago

FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers

Paper • 2412.09611 • Published 13 days ago • 9

GenEx: Generating an Explorable World

Paper • 2412.09624 • Published 13 days ago • 84

upvoted 6 papers 15 days ago

LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation

Paper • 2412.05148 • Published 20 days ago • 11

Video Motion Transfer with Diffusion Transformers

Paper • 2412.07776 • Published 15 days ago • 17

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

Paper • 2412.07774 • Published 15 days ago • 25

upvoted a paper 17 days ago

Mind the Time: Temporally-Controlled Multi-Event Video Generation

Paper • 2412.05263 • Published 19 days ago • 10

upvoted a collection 24 days ago

AIMv2

Collection

A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Nov 22 • 67

upvoted 3 papers 29 days ago

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

Paper • 2411.15411 • Published Nov 23 • 7

SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE

Paper • 2411.16856 • Published about 1 month ago • 11

TEXGen: a Generative Diffusion Model for Mesh Textures

Paper • 2411.14740 • Published Nov 22 • 15

upvoted 3 papers 30 days ago

Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator

Paper • 2411.15466 • Published Nov 23 • 34

Material Anything: Generating Materials for Any 3D Object via Diffusion

Paper • 2411.15138 • Published Nov 22 • 42

One Diffusion to Generate Them All

Paper • 2411.16318 • Published about 1 month ago • 26