GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models
Abstract
Large multimodal models (LMMs) have exhibited proficiency across many visual tasks. Although numerous well-known benchmarks exist to evaluate model performance, they increasingly offer insufficient headroom. As such, there is a pressing need for a new generation of benchmarks challenging enough for the next generation of LMMs. One area in which LMMs show potential is graph analysis, specifically, the tasks an analyst might typically perform when interpreting figures, such as estimating the mean, intercepts, or correlations of functions and data series. In this work, we introduce GRAB, a graph analysis benchmark fit for current and future frontier LMMs. Our benchmark is entirely synthetic, ensuring high-quality, noise-free questions. GRAB comprises 2170 questions covering four tasks and 23 graph properties. We evaluate 20 LMMs on GRAB and find it to be a challenging benchmark: the highest-performing model attains a score of just 21.7%. Finally, we conduct various ablations to investigate where models succeed and struggle. We release GRAB to encourage progress in this important, growing domain.
Community
Project page: https://grab-benchmark.github.io/
Dataset available at: https://huggingface.co./datasets/jonathan-roberts1/GRAB
Code available at: https://github.com/jonathan-roberts1/GRAB
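The benchmark can be loaded directly from the Hugging Face Hub with the `datasets` library. The sketch below is illustrative only: the split name and column names (e.g. `question`) are assumptions, so consult the dataset card for the actual schema.

```python
# Minimal sketch: load GRAB from the Hugging Face Hub and inspect a sample.
# The split name and field names below are assumptions; check the dataset
# card at https://huggingface.co./datasets/jonathan-roberts1/GRAB for the
# actual schema.
from datasets import load_dataset

dataset = load_dataset("jonathan-roberts1/GRAB", split="train")  # split name assumed

print(len(dataset))             # expect 2170 questions
sample = dataset[0]
print(sample.keys())            # inspect the actual column names
print(sample.get("question"))   # hypothetical field name
```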
Authors are happy to answer any questions.
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation (2024)
- Unraveling the Truth: Do LLMs really Understand Charts? A Deep Dive into Consistency and Robustness (2024)
- MIBench: Evaluating Multimodal Large Language Models over Multiple Images (2024)
- LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models (2024)
- Synthetic Multimodal Question Generation (2024)