MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation
Abstract
Large Language Models (LLMs), known for their versatility with textual data, are increasingly being explored for their potential to enhance medical image segmentation, a crucial task for accurate diagnostic imaging. This study investigates enhancing Vision Transformers (ViTs) for medical image segmentation by integrating pre-trained LLM transformer blocks. Our approach, which incorporates a frozen LLM transformer block into the encoder of a ViT-based model, leads to substantial improvements in segmentation performance across various medical imaging modalities. We propose a Hybrid Attention Mechanism that combines global and local feature learning with a Multi-Scale Fusion Block for aggregating features across different scales. The enhanced model shows significant performance gains, including an average Dice score increase from 0.74 to 0.79, along with improvements in accuracy, precision, and the Jaccard Index. These results demonstrate the effectiveness of LLM-based transformers in refining medical image segmentation, highlighting their potential to significantly boost model accuracy and robustness. The source code for our implementation is available at: https://bit.ly/3zf2CVs
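To make the integration pattern concrete, below is a minimal sketch, not the authors' code: `nn.TransformerEncoderLayer` stands in for the pre-trained LLM block (which in the paper is loaded from an LLM checkpoint and kept frozen), and the depth, width, and head count are purely illustrative.

```python
import torch
import torch.nn as nn

class ViTWithFrozenLLMBlock(nn.Module):
    """Toy ViT-style encoder with a frozen transformer block appended."""
    def __init__(self, dim=768, depth=4, heads=12):
        super().__init__()
        # Trainable ViT encoder blocks operating on patch tokens.
        self.vit_blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            for _ in range(depth)
        )
        # Stand-in for the pre-trained LLM transformer block.
        self.llm_block = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        for p in self.llm_block.parameters():
            p.requires_grad = False  # frozen: only the ViT blocks are trained

    def forward(self, tokens):           # tokens: (batch, num_patches, dim)
        for blk in self.vit_blocks:
            tokens = blk(tokens)
        return self.llm_block(tokens)    # frozen block refines visual features

x = torch.randn(2, 196, 768)             # 14x14 patches of a 224x224 image
print(ViTWithFrozenLLMBlock()(x).shape)  # torch.Size([2, 196, 768])
```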
Community
The paper presents a novel method of enhancing Vision Transformer (ViT)-based medical image segmentation models by integrating pre-trained frozen transformer blocks from Large Language Models (LLMs), significantly improving segmentation performance across various medical imaging modalities.
- Frozen LLM Transformer Integration: Introduces a pre-trained, frozen transformer block from LLMs into the encoder of a ViT model, resulting in substantial performance improvements in medical image segmentation.
- Hybrid Attention and Multi-Scale Fusion: Proposes a Hybrid Attention Mechanism combining global and local feature learning, alongside a Multi-Scale Fusion Block to aggregate features across scales, enhancing segmentation precision (see the sketch after this list).
- Extensive Evaluation: Demonstrates effectiveness across 10 medical imaging modalities, achieving higher accuracy, precision, and Dice scores, with thorough ablation studies confirming the advantages of the LLM-based approach.
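The two proposed modules can be sketched in the same spirit. The structure below is assumed from the bullet descriptions, not taken from the paper: the hybrid attention pairs global self-attention with a depthwise-convolutional local branch, and the multi-scale fusion pools token features at several scales before projecting them back together; all names and hyperparameters here are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridAttention(nn.Module):
    """Global self-attention plus a local (depthwise conv) mixing branch."""
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local_mix = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x):                 # x: (batch, tokens, dim)
        g, _ = self.global_attn(x, x, x)  # long-range context
        l = self.local_mix(x.transpose(1, 2)).transpose(1, 2)  # local context
        return g + l

class MultiScaleFusion(nn.Module):
    """Pool tokens at several scales, upsample, and fuse by projection."""
    def __init__(self, dim=768, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.proj = nn.Linear(dim * len(scales), dim)

    def forward(self, x):                 # x: (batch, tokens, dim)
        feats = []
        for s in self.scales:
            f = x.transpose(1, 2)         # (batch, dim, tokens)
            if s > 1:
                f = F.avg_pool1d(f, s)    # coarser scale
                f = F.interpolate(f, size=x.shape[1])  # back to full length
            feats.append(f.transpose(1, 2))
        return self.proj(torch.cat(feats, dim=-1))

x = torch.randn(2, 196, 768)
print(MultiScaleFusion()(HybridAttention()(x)).shape)  # torch.Size([2, 196, 768])
```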
Hi @amanchadha congrats on this work!
Are you planning to share the pre-trained model on the hub? See here for a guide: https://huggingface.co./docs/hub/models-uploading.
Also, would be great to link it to this paper, by including https://huggingface.co./papers/2410.02458 in the model card.
Let us know if you need any help!
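For example, a minimal upload sketch with `huggingface_hub` (the repo id and folder path are placeholders):

```python
from huggingface_hub import HfApi

api = HfApi()
api.create_repo("your-username/medvisionllama", exist_ok=True)
api.upload_folder(
    folder_path="./checkpoints/medvisionllama",  # weights plus a README.md model card
    repo_id="your-username/medvisionllama",
)
```

Adding the paper URL to that README.md is what links the model to this page.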
Cheers,
Niels
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- SAM-UNet: Enhancing Zero-Shot Segmentation of SAM for Universal Medical Images (2024)
- LSMS: Language-guided Scale-aware MedSegmentor for Medical Image Referring Segmentation (2024)
- ASSNet: Adaptive Semantic Segmentation Network for Microtumors and Multi-Organ Segmentation (2024)
- EM-Net: Efficient Channel and Frequency Learning with Mamba for 3D Medical Image Segmentation (2024)
- MambaClinix: Hierarchical Gated Convolution and Mamba-Based U-Net for Enhanced 3D Medical Image Segmentation (2024)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space.
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend