Young-Jun Lee

passing2961

https://sites.google.com/view/passing2961/home

AI & ML interests

Social Dialogue System, Multi-Modal Dialogue

Organizations

None yet

passing2961's activity

upvoted 2 papers 23 days ago

CogVLM2: Visual Language Models for Image and Video Understanding

Paper • 2408.16500 • Published 25 days ago • 56

Law of Vision Representation in MLLMs

Paper • 2408.16357 • Published 25 days ago • 92

upvoted a paper 24 days ago

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Paper • 2408.15998 • Published 25 days ago • 81

upvoted a paper 27 days ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22 • 110

upvoted 5 papers 30 days ago

SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs

Paper • 2408.11813 • Published Aug 21 • 10

SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models

Paper • 2408.12114 • Published Aug 22 • 11

Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

Paper • 2408.12570 • Published Aug 22 • 29

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Paper • 2408.12590 • Published Aug 22 • 33

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22 • 50

upvoted 21 papers about 1 month ago

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21 • 53

TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models

Paper • 2408.11318 • Published Aug 21 • 54

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19 • 51

Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge

Paper • 2407.03958 • Published Jul 4 • 18

JPEG-LM: LLMs as Image Generators with Canonical Codec Representations

Paper • 2408.08459 • Published Aug 15 • 44

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16 • 96

Towards flexible perception with visual memory

Paper • 2408.08172 • Published Aug 15 • 19

Generative Photomontage

Paper • 2408.07116 • Published Aug 13 • 19

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

Paper • 2408.07055 • Published Aug 13 • 65

Imagen 3

Paper • 2408.07009 • Published Aug 13 • 60

UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling

Paper • 2408.04810 • Published Aug 9 • 22

ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities

Paper • 2408.04682 • Published Aug 8 • 14

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

Paper • 2408.04840 • Published Aug 9 • 31

Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2

Paper • 2408.05147 • Published Aug 9 • 36

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Paper • 2408.05211 • Published Aug 9 • 46

Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8 • 152

EXAONE 3.0 7.8B Instruction Tuned Language Model

Paper • 2408.03541 • Published Aug 7 • 32

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining

Paper • 2408.02657 • Published Aug 5 • 32

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Paper • 2408.01800 • Published Aug 3 • 74

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Paper • 2408.02718 • Published Aug 5 • 60

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6 • 59

upvoted 3 papers about 2 months ago

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31 • 102

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31 • 73

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 103

upvoted 7 papers 2 months ago

Understanding Reference Policies in Direct Preference Optimization

Paper • 2407.13709 • Published Jul 18 • 16

Scaling Retrieval-Based Language Models with a Trillion-Token Datastore

Paper • 2407.12854 • Published Jul 9 • 29

Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models

Paper • 2407.12327 • Published Jul 17 • 75

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Paper • 2407.07895 • Published Jul 10 • 40

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10 • 64

Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck

Paper • 2204.02735 • Published Apr 6, 2022 • 4

Vision language models are blind

Paper • 2407.06581 • Published Jul 9 • 80

upvoted 3 papers 3 months ago

DialogCC: Large-Scale Multi-Modal Dialogue Dataset

Paper • 2212.04119 • Published Dec 8, 2022 • 1

Large Language Models can Share Images, Too!

Paper • 2310.14804 • Published Oct 23, 2023 • 1

TroL: Traversal of Layers for Large Language and Vision Models

Paper • 2406.12246 • Published Jun 18 • 34

upvoted a paper 4 months ago

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24 • 52

upvoted a paper 7 months ago

CoLLaVO: Crayon Large Language and Vision mOdel

Paper • 2402.11248 • Published Feb 17 • 18