Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets, and demo related to its creation (a minimal loading sketch with 🤗 Transformers follows at the end of this list). • 11 items • Updated May 6 • 88
LLaVa-NeXT Collection LLaVa-NeXT (also known as LLaVa-1.6) improves upon the 1.5 series by incorporating higher image resolutions and more reasoning/OCR datasets. • 8 items • Updated Jul 19 • 26
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5, 2024 • 69
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization Paper • 2402.03161 • Published Feb 5, 2024 • 14
Canonical models Collection This collection lists all the historical (pre-"Hub") canonical model checkpoints, i.e., repos that were not under an org or user namespace. • 68 items • Updated Feb 13 • 13
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models Paper • 2401.05252 • Published Jan 10, 2024 • 47
What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs Paper • 2401.02411 • Published Jan 4, 2024 • 12
Instruct-Imagen: Image Generation with Multi-modal Instruction Paper • 2401.01952 • Published Jan 3, 2024 • 30
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively Paper • 2401.02955 • Published Jan 5, 2024 • 20
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation Paper • 2401.04092 • Published Jan 8, 2024 • 21
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8, 2024 • 71
Generative Multimodal Models are In-Context Learners Paper • 2312.13286 • Published Dec 20, 2023 • 34
Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model Paper • 2312.13252 • Published Dec 20, 2023 • 27
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions Paper • 2312.08578 • Published Dec 14, 2023 • 16
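As a pointer for the Idefics2 🐶 collection at the top of this list, here is a minimal sketch of how the base checkpoint can be loaded for single-image inference with 🤗 Transformers. It assumes a recent transformers release (4.40 or later, which added Idefics2 support), the HuggingFaceM4/idefics2-8b checkpoint from that collection, and a placeholder local image path; treat it as a starting point, not the collection's official example.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Idefics2ForConditionalGeneration

model_id = "HuggingFaceM4/idefics2-8b"  # base checkpoint from the Idefics2 collection
processor = AutoProcessor.from_pretrained(model_id)
model = Idefics2ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit the 8B model on a single GPU
    device_map="auto",
)

# Placeholder image path; substitute any local image.
image = Image.open("example.jpg")

# Idefics2 expects a chat-style prompt with interleaved image and text turns.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=100)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

The same processor-plus-generate pattern applies to the LLaVa-NeXT collection above, using LlavaNextProcessor and LlavaNextForConditionalGeneration with the corresponding llava-hf checkpoints and their own prompt format.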