ibm-research/granite-vision-3.2-2b-GGUF

This repository contains GGUF conversions of an IBM Granite base model, provided in several quantizations.
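As a quick sanity check after downloading one of the quantized files, the fixed-size GGUF header can be inspected with the Python standard library. The sketch below is illustrative only: `read_gguf_header` is a hypothetical helper name, and it assumes a little-endian GGUF file of version 2 or later, where the 4-byte magic `GGUF` is followed by a 32-bit version and two 64-bit counts. It does not parse metadata or tensors.

```python
import struct

GGUF_MAGIC = b"GGUF"   # first four bytes of every GGUF file
HEADER_SIZE = 24       # magic (4) + version (4) + tensor count (8) + KV count (8)


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file.

    Assumption: little-endian file, GGUF version 2 or later (64-bit counts).
    Raises ValueError if the file does not start with the GGUF magic.
    """
    with open(path, "rb") as f:
        header = f.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    # "<IQQ": little-endian uint32 followed by two uint64 values
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:HEADER_SIZE])
    return version, tensor_count, kv_count
```

Calling `read_gguf_header` on a downloaded file returns the three header fields, or raises `ValueError` for a truncated or non-GGUF file.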

For full details, see the base model's model card: https://huggingface.co/ibm-granite/granite-vision-3.2-2b

Model Summary: granite-vision-3.2-2b is a compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more. The model was trained on a meticulously curated instruction-following dataset, comprising diverse public datasets and synthetic datasets tailored to support a wide range of document understanding and general image tasks. It was trained by fine-tuning a Granite large language model with both image and text modalities.

Paper: Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
Release Date: Feb 26th, 2025
License: Apache 2.0

Supported Input Format: The model currently accepts English instructions and images (PNG, JPEG, etc.) as input.
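Before handing an image to an inference runtime, it can help to confirm that the bytes really are in one of the formats named above. The following is a minimal sketch, not part of the model's tooling: `sniff_image_format` is a hypothetical helper that only recognizes the standard PNG and JPEG file signatures. The card's "etc." suggests other formats may also work, so a `None` result here means "not recognized by this check", not "unsupported by the model".

```python
# Standard file signatures (magic bytes) for the two formats named in the card.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_format(data: bytes):
    """Return "png" or "jpeg" based on the leading bytes, else None.

    Only the file signature is checked; the rest of the image is not validated.
    """
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None
```

In practice the first few bytes of the file are enough, for example `sniff_image_format(open(path, "rb").read(16))`, where `path` is whatever image file is about to be sent to the model.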

Intended Use: The model is intended to be used in enterprise applications that involve processing visual and text data. In particular, the model is well-suited for a range of visual document understanding tasks, such as analyzing tables and charts, performing optical character recognition (OCR), and answering questions based on document content. Additionally, its capabilities extend to general image understanding, enabling it to be applied to a broader range of business applications. For tasks that exclusively involve text-based input, we suggest using our Granite large language models, which are optimized for text-only processing and offer superior performance compared to this model.
