Google/paligemma2-3b-pt-896 model fine-tuned for US IRS Form 1040 (2023) data parsing and extraction
This repository provides only the PEFT LoRA adapter weights. The LoRA layers have been fine-tuned to parse and extract data from the first page of the US IRS Form 1040 (tax year 2023). The model performs OCR on the page image and returns the extracted data in JSON format from a zero-shot prompt. A minimal usage example follows.
from PIL import Image
import torch
import json
from transformers import PaliGemmaForConditionalGeneration, AutoProcessor
from peft import PeftModel
model_id = 'google/paligemma2-3b-pt-896'
peft_model_id = 'hsarfraz/google-paligemma-irs-form-1040-2023-parser-pg1'
device = "cuda:0" if torch.cuda.is_available() else "cpu"
# load base model
processor = AutoProcessor.from_pretrained(model_id, padding_side="right", add_eos_token=True)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, device_map={"": device}, torch_dtype=torch.bfloat16)
# load fine-tuned peft weights
fine_tuned_model = PeftModel.from_pretrained(model, peft_model_id)
fine_tuned_model.to(device)
# prompt for OCR
prompt = "<image>extract data in JSON format"
# path to local image file
image_file = '<replace with path to input image>'
image = Image.open(image_file)
# get tokens
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
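# record the prompt token count so the prompt can later be stripped from the generated sequence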
prefix_length = inputs["input_ids"].shape[-1]
#switch to inference mode
with torch.inference_mode():
    generation = fine_tuned_model.generate(**inputs, max_new_tokens=1152)
    generation = generation[0][prefix_length:]
decoded = processor.decode(generation, skip_special_tokens=True)
# parse output as json
try:
    output_json = json.dumps(json.loads(decoded), indent=4)
except Exception as error:
    print('Error: %s' % error)
    output_json = decoded
# display parsed json
print(output_json)
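If the adapter will be used for repeated inference, the LoRA weights can optionally be merged into the base model so that generation runs without the PEFT wrapper. The snippet below is a minimal sketch using PEFT's merge_and_unload; the output directory placeholder is illustrative only.
# optionally merge the LoRA adapter into the base weights for standalone use
merged_model = fine_tuned_model.merge_and_unload()
# save the merged model and processor locally (path is a placeholder)
merged_model.save_pretrained('<replace with output directory>')
processor.save_pretrained('<replace with output directory>')
Once saved, the merged model can be reloaded with PaliGemmaForConditionalGeneration.from_pretrained without requiring peft.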
Fake (Synthetic) Data for IRS Form 1040 (2023), Page 1
Model tree for hsarfraz/google-paligemma-irs-form-1040-2023-parser-pg1
Base model: google/paligemma2-3b-pt-896