X-Decoder

community


AI & ML interests

None defined yet.


xdecoder's activity

jw2yang

authored a paper 8 days ago

TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies

Paper • 2412.10345 • Published 12 days ago • 2

jw2yang

authored a paper 10 days ago

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation

Paper • 2412.09585 • Published 13 days ago • 10

jw2yang

authored a paper 20 days ago

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Paper • 2412.04424 • Published 20 days ago • 55

xueyanz

authored a paper about 1 month ago

WildLMa: Long Horizon Loco-Manipulation in the Wild

Paper • 2411.15131 • Published Nov 22 • 6

jw2yang

authored a paper 2 months ago

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Paper • 2410.10818 • Published Oct 14 • 15

jw2yang

authored a paper 5 months ago

OmniParser for Pure Vision Based GUI Agent

Paper • 2408.00203 • Published Aug 1 • 24

jw2yang

authored a paper 7 months ago

Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27 • 31

jw2yang

authored a paper 8 months ago

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Paper • 2404.16375 • Published Apr 25 • 16

xueyanz

updated 2 models 12 months ago

xdecoder/SEEM

Updated Dec 30, 2023 • 5

xdecoder/X-Decoder

Updated Dec 27, 2023 • 5

jw2yang

authored a paper about 1 year ago

Interfacing Foundation Models' Embeddings

Paper • 2312.07532 • Published Dec 12, 2023 • 10

xueyanz

authored a paper about 1 year ago

Interfacing Foundation Models' Embeddings

Paper • 2312.07532 • Published Dec 12, 2023 • 10

jw2yang

authored a paper about 1 year ago

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Paper • 2312.02949 • Published Dec 5, 2023 • 11

xueyanz

authored 2 papers about 1 year ago

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Paper • 2312.02949 • Published Dec 5, 2023 • 11

Visual In-Context Prompting

Paper • 2311.13601 • Published Nov 22, 2023 • 16

jw2yang

authored 2 papers about 1 year ago

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Paper • 2311.07562 • Published Nov 13, 2023 • 13

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 48

xueyanz

authored a paper about 1 year ago

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 48

jw2yang

authored a paper about 1 year ago

LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 41

xueyanz

authored a paper about 1 year ago

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

Paper • 2310.11441 • Published Oct 17, 2023 • 26

Team members: 5