to_read - a hitchhiker3010 Collection

Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

hitchhiker3010 's Collections

AI Ads

Agent First world

Agent Personalization

to_read

to_read

updated 5 days ago

DocGraphLM: Documental Graph Language Model for Information Extraction

Paper • 2401.02823 • Published Jan 5, 2024 • 36
Understanding LLMs: A Comprehensive Overview from Training to Inference

Paper • 2401.02038 • Published Jan 4, 2024 • 64
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 180
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

Paper • 2309.01131 • Published Sep 3, 2023 • 1
LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 65
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14, 2024 • 126
Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 37
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

Paper • 2403.13447 • Published Mar 20, 2024 • 18
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29, 2024 • 120
Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4, 2024 • 40
Unifying Vision, Text, and Layout for Universal Document Processing

Paper • 2212.02623 • Published Dec 5, 2022 • 11
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning

Paper • 2406.15334 • Published Jun 21, 2024 • 9
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

Paper • 2407.07071 • Published Jul 9, 2024 • 12
Transformer Layers as Painters

Paper • 2407.09298 • Published Jul 12, 2024 • 15
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

Paper • 2402.03216 • Published Feb 5, 2024 • 5
Visual Text Generation in the Wild

Paper • 2407.14138 • Published Jul 19, 2024 • 9
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

Paper • 2407.12594 • Published Jul 17, 2024 • 19
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Paper • 2408.06292 • Published Aug 12, 2024 • 119
Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 126
Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27, 2024 • 140
Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning

Paper • 2307.03692 • Published Jul 5, 2023 • 26
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

Paper • 2307.06304 • Published Jul 12, 2023 • 30
Contrastive Localized Language-Image Pre-Training

Paper • 2410.02746 • Published Oct 3, 2024 • 34
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations

Paper • 2410.02762 • Published Oct 3, 2024 • 9
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

Paper • 2410.02367 • Published Oct 3, 2024 • 48
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation

Paper • 2410.01731 • Published Oct 2, 2024 • 17
Contextual Document Embeddings

Paper • 2410.02525 • Published Oct 3, 2024 • 21
Compact Language Models via Pruning and Knowledge Distillation

Paper • 2407.14679 • Published Jul 19, 2024 • 39
LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 58
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering

Paper • 2401.08500 • Published Jan 16, 2024 • 5
Automatic Prompt Optimization with "Gradient Descent" and Beam Search

Paper • 2305.03495 • Published May 4, 2023 • 1
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent

Paper • 2312.10003 • Published Dec 15, 2023 • 40
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Paper • 2410.17243 • Published Oct 22, 2024 • 89
Personalization of Large Language Models: A Survey

Paper • 2411.00027 • Published Oct 29, 2024 • 32
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation

Paper • 2411.00412 • Published Nov 1, 2024 • 10
Human-inspired Perspectives: A Survey on AI Long-term Memory

Paper • 2411.00489 • Published Nov 1, 2024 • 1
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published Nov 7, 2024 • 115
Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 17
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 108
Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28, 2024 • 97
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion

Paper • 2411.04928 • Published Nov 7, 2024 • 51
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

Paper • 2411.07232 • Published Nov 11, 2024 • 65
MagicQuill: An Intelligent Interactive Image Editing System

Paper • 2411.09703 • Published Nov 14, 2024 • 68
Distilling System 2 into System 1

Paper • 2407.06023 • Published Jul 8, 2024 • 3
Altogether: Image Captioning via Re-aligning Alt-text

Paper • 2410.17251 • Published Oct 22, 2024
Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents

Paper • 2409.15594 • Published Sep 23, 2024
Multimodal Autoregressive Pre-training of Large Vision Encoders

Paper • 2411.14402 • Published Nov 21, 2024 • 43
PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published Dec 4, 2024 • 129
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Paper • 2412.04424 • Published Dec 5, 2024 • 61
ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published Nov 26, 2024 • 80
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Paper • 2412.07626 • Published Dec 10, 2024 • 22
Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 111
A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs

Paper • 2410.18779 • Published Oct 24, 2024 • 1
Asynchronous LLM Function Calling

Paper • 2412.07017 • Published Dec 9, 2024 • 1
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 134
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

Paper • 2412.14161 • Published Dec 18, 2024 • 51
GUI Agents: A Survey

Paper • 2412.13501 • Published Dec 18, 2024 • 25
FastVLM: Efficient Vision Encoding for Vision Language Models

Paper • 2412.13303 • Published Dec 17, 2024 • 13
Alignment faking in large language models

Paper • 2412.14093 • Published Dec 18, 2024 • 7
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

Paper • 2402.14207 • Published Feb 22, 2024 • 8
Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations

Paper • 2408.15232 • Published Aug 27, 2024
Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty

Paper • 2412.06771 • Published Dec 9, 2024
Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 140
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published Jan 8 • 260
Agent Laboratory: Using LLM Agents as Research Assistants

Paper • 2501.04227 • Published Jan 8 • 86
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though

Paper • 2501.04682 • Published Jan 8 • 91
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains

Paper • 2501.05707 • Published Jan 10 • 20
Titans: Learning to Memorize at Test Time

Paper • 2501.00663 • Published Dec 31, 2024 • 21
Training Large Language Models to Reason in a Continuous Latent Space

Paper • 2412.06769 • Published Dec 9, 2024 • 78
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published Jan 7 • 43
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors

Paper • 2501.08225 • Published Jan 14 • 18
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences

Paper • 2404.12272 • Published Apr 18, 2024 • 1
Do generative video models learn physical principles from watching videos?

Paper • 2501.09038 • Published Jan 14 • 32
AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation

Paper • 2501.09503 • Published Jan 16 • 13
Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published Jan 17 • 106
GameFactory: Creating New Games with Generative Interactive Videos

Paper • 2501.08325 • Published Jan 14 • 64
Playable Game Generation

Paper • 2412.00887 • Published Dec 1, 2024
UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Paper • 2501.12326 • Published Jan 21 • 53
A Multimodal Automated Interpretability Agent

Paper • 2404.14394 • Published Apr 22, 2024 • 21
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published 19 days ago • 55
SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models

Paper • 2502.12464 • Published 20 days ago • 27
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation

Paper • 2502.07870 • Published 26 days ago • 43
History-Guided Video Diffusion

Paper • 2502.06764 • Published 27 days ago • 12

Collection guide
Browse collections

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs