Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2408.08872

Multi-modality LVM

VoCo-LLaMA: Towards Vision Compression with Large Language Models

Paper • 2406.12275 • Published Jun 18 • 29
TroL: Traversal of Layers for Large Language and Vision Models

Paper • 2406.12246 • Published Jun 18 • 34
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning

Paper • 2406.15334 • Published Jun 21 • 8
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning

Paper • 2406.12742 • Published Jun 18 • 14

Work for 3D Medical Vision

VoCo-LLaMA: Towards Vision Compression with Large Language Models

Paper • 2406.12275 • Published Jun 18 • 29
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

Paper • 2405.15738 • Published May 24 • 43
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16 • 97

DiTFastAttn: Attention Compression for Diffusion Transformer Models

Paper • 2406.08552 • Published Jun 12 • 22
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16 • 97

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24 • 12
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24 • 53
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 85
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27 • 31

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper • 2311.17049 • Published Nov 28, 2023
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7 • 13
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Paper • 2303.17376 • Published Mar 30, 2023
Sigmoid Loss for Language Image Pre-Training

Paper • 2303.15343 • Published Mar 27, 2023 • 4

Papers - Salesforce

Non-Autoregressive Neural Machine Translation

Paper • 1711.02281 • Published Nov 7, 2017 • 1
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

Paper • 2107.07651 • Published Jul 16, 2021 • 1
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Paper • 2404.07972 • Published Apr 11 • 46
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

Paper • 2402.15506 • Published Feb 23 • 13

Papers - University - University of Washington

The Curious Case of Neural Text Degeneration

Paper • 1904.09751 • Published Apr 22, 2019 • 3
Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Paper • 2404.01197 • Published Apr 1 • 30
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

Paper • 1905.10044 • Published May 24, 2019 • 1
PIQA: Reasoning about Physical Commonsense in Natural Language

Paper • 1911.11641 • Published Nov 26, 2019 • 2

Papers - Fine-tuning - DPO

Refer to additional papers: https://link.springer.com/article/10.1007/s10994-014-5458-8 and https://link.springer.com/article/10.1007/BF00992696

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 48
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization

Paper • 2402.09320 • Published Feb 14 • 6
sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28 • 39
Dueling RL: Reinforcement Learning with Trajectory Preferences

Paper • 2111.04850 • Published Nov 8, 2021 • 2

FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

Paper • 2403.06775 • Published Mar 11 • 3
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Paper • 2010.11929 • Published Oct 22, 2020 • 6
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition

Paper • 2110.07040 • Published Oct 13, 2021 • 2
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks

Paper • 1811.00056 • Published Oct 31, 2018 • 2

Papers - Multimodal

TinyLLaVA: A Framework of Small-scale Large Multimodal Models

Paper • 2402.14289 • Published Feb 22 • 19
ImageBind: One Embedding Space To Bind Them All

Paper • 2305.05665 • Published May 9, 2023 • 3
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 181
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts

Paper • 2206.02770 • Published Jun 6, 2022 • 3

Previous
1
2
3
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs