arxiv:2411.02657

Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge

Published on Nov 4 · Submitted by ksoman on Nov 6

Abstract

Rare diseases present unique challenges in healthcare, often suffering from delayed diagnosis and fragmented information landscapes. The scarcity of reliable knowledge about these conditions poses a distinct challenge for Large Language Models (LLMs) in supporting clinical management and delivering precise patient information, underscoring the need for focused training on these 'zebra' cases. We present Zebra-Llama, a specialized context-aware language model with high-precision Retrieval-Augmented Generation (RAG) capability, focusing on Ehlers-Danlos Syndrome (EDS) as our case study. EDS, affecting 1 in 5,000 individuals, exemplifies the complexities of rare diseases with its diverse symptoms, multiple subtypes, and evolving diagnostic criteria. Fine-tuned with a novel context-aware methodology on questions derived from medical literature, patient experiences, and clinical resources, paired with expertly curated responses, Zebra-Llama shows improved handling of EDS-related queries. On a test set of real-world questions collected from EDS patients and clinicians, medical experts evaluated responses from Zebra-Llama and its base model (Llama 3.1-8B-Instruct) and rated Zebra-Llama higher in thoroughness (77.5% vs. 70.1%), accuracy (83.0% vs. 78.8%), clarity (74.7% vs. 72.0%), and citation reliability (70.6% vs. 52.3%). Released as an open-source resource, Zebra-Llama not only provides more accessible and reliable EDS information but also establishes a framework for developing specialized AI solutions for other rare conditions. This work represents a step towards democratizing expert-level knowledge in rare disease management, potentially transforming how healthcare providers and patients navigate the complex landscape of rare diseases.
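
To put the four reported comparisons on a common footing, the short sketch below computes the absolute gain (in percentage points) and the gain relative to the base model's score for each metric. The numbers are taken directly from the abstract; the names `reported` and `gains` are illustrative only and do not come from the paper or its code release.

```python
# Scores reported in the abstract, in percent:
# (Zebra-Llama, base model Llama 3.1-8B-Instruct).
reported = {
    "thoroughness": (77.5, 70.1),
    "accuracy": (83.0, 78.8),
    "clarity": (74.7, 72.0),
    "citation reliability": (70.6, 52.3),
}


def gains(scores):
    """Return {metric: (absolute gain in points, relative gain in percent)}."""
    out = {}
    for metric, (tuned, base) in scores.items():
        absolute = round(tuned - base, 1)                    # percentage points
        relative = round(100.0 * (tuned - base) / base, 1)   # percent of base score
        out[metric] = (absolute, relative)
    return out


summary = gains(reported)
for metric, (absolute, relative) in summary.items():
    print(f"{metric}: +{absolute} points ({relative}% relative to base)")
```

Read this way, citation reliability shows by far the largest gain (18.3 points), while clarity shows the smallest (2.7 points).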

Community

TL;DR:
We present Zebra-Llama, an open-source specialized LLM with enhanced context-aware RAG capabilities for the rare disease Ehlers-Danlos Syndrome (EDS), demonstrating significant improvements in accuracy, thoroughness, and citation reliability on real-world patient and clinician queries.

Key Points:

  • Novel context-aware fine-tuning methodology optimized for rare disease knowledge management
  • Evaluated by medical experts on real-world EDS questions with superior performance metrics
  • Open-source model with high-precision RAG capabilities and robust citation accuracy
  • Potential framework for developing specialized AI solutions for other rare diseases
