Google/paligemma2-3b-pt-896 model fine-tuned for US IRS Form 1040 (2023) data parsing and extraction

This repository provides only the PEFT LoRA weights. The LoRA layers were fine-tuned to parse and extract data from the first page of US IRS Form 1040 (tax year 2023). The model performs OCR and returns the extracted data in JSON format in response to a zero-shot prompt.



```python
import json

import torch
from peft import PeftModel
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = 'google/paligemma-3b-pt-896'
peft_model_id = 'hsarfraz/google-paligemma-irs-form-1040-2023-parser-pg1'

# use the first GPU when available, otherwise fall back to CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# load base model and processor
processor = AutoProcessor.from_pretrained(model_id, padding_side="right", add_eos_token=True)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# load fine-tuned peft weights and move the model to the selected device
fine_tuned_model = PeftModel.from_pretrained(model, peft_model_id)
fine_tuned_model.to(device)

# prompt for OCR
prompt = "<image>extract data in JSON format"

# path to local image file (page 1 of the form); convert to RGB in case
# the file is grayscale or has an alpha channel
image_file = '<replace with path to input image>'
image = Image.open(image_file).convert("RGB")

# tokenize the prompt and preprocess the image
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
prefix_length = inputs["input_ids"].shape[-1]

# switch to inference mode (no gradient tracking)
with torch.inference_mode():
    generation = fine_tuned_model.generate(**inputs, max_new_tokens=1152)
    # drop the prompt tokens and decode only the newly generated text
    generation = generation[0][prefix_length:]
    decoded = processor.decode(generation, skip_special_tokens=True)

# parse output as json; fall back to the raw text if it is not valid JSON
try:
    output_json = json.dumps(json.loads(decoded), indent=4)
except json.JSONDecodeError as error:
    print('Error: %s' % error)
    output_json = decoded

# display parsed json
print(output_json)
```
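Generated text is not guaranteed to be valid JSON: it can carry stray text around the object, or be cut off when the `max_new_tokens` limit is reached. The helper below is a minimal sketch of a more tolerant parse step, not part of this repository. The name `parse_model_json` is hypothetical, and so is the `filing_status` key in the usage note, since the model's actual output schema is not documented here.

```python
import json
from typing import Any, Optional


def parse_model_json(decoded: str) -> Optional[Any]:
    """Best-effort parse of generated text (hypothetical helper).

    Returns the parsed object, or None if no valid JSON can be recovered.
    """
    text = decoded.strip()

    # First attempt: the whole string is already valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Second attempt: keep only the span from the first "{" to the last "}"
    # to discard any text before or after the object.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            pass

    # Nothing recoverable, for example output truncated mid-object.
    return None
```

For example, `parse_model_json('noise {"filing_status": "single"} noise')` returns the inner dictionary, while a truncated string such as `'{"filing_status": "sin'` returns `None`, which a caller could treat as a signal to retry with a larger `max_new_tokens` value.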

Synthetic Example Data for IRS Form 1040 (2023), Page 1

![image](https://huggingface.co/hsarfraz/irs-tax-form-1040-2023-doc-parser/blob/main/fake_synthetic_form_1040_example.png)
