Medical-Llama3-8B-GGUF

This is a fine-tuned version of the Llama3 8B model, designed to answer medical questions. The model was trained on the AI Medical Chatbot dataset, available at ruslanmv/ai-medical-chatbot. It is distributed in GGUF, the binary model file format used by llama.cpp, with 4-bit and 5-bit quantized weights for efficient inference.

Model: ruslanmv/Medical-Llama3-8B-GGUF

  • Developed by: ruslanmv
  • License: apache-2.0
  • Finetuned from model: meta-llama/Meta-Llama-3-8B

Installation

Prerequisites:

  • A system with CUDA support is highly recommended for optimal performance.
  • Python 3.10 or later
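The two prerequisites can be checked before installing anything. The snippet below is a minimal sketch, not part of the original instructions: the helper name `check_prerequisites` is hypothetical, and looking for `nvidia-smi` on the PATH is only a rough proxy for CUDA support, assumed here for illustration.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a small report on the prerequisites listed above."""
    return {
        # True when the running interpreter meets the minimum version
        "python_ok": sys.version_info >= min_python,
        # True when the NVIDIA driver utility is on PATH (rough proxy for a CUDA GPU)
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    print(check_prerequisites())
```

If `nvidia_smi_found` is false, the model can still run on the CPU, only more slowly.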
  1. Install required Python libraries:
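# GPU build of llama-cpp-python
# (recent llama-cpp-python releases use -DGGML_CUDA=on; -DLLAMA_CUBLAS=on applies to older releases)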
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
%%capture
!pip install huggingface-hub hf-transfer
  2. Download the quantized model:
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
!huggingface-cli download \
 ruslanmv/Medical-Llama3-8B-GGUF \
  medical-llama3-8b.Q5_K_M.gguf \
 --local-dir . \
 --local-dir-use-symlinks False

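# Path to the downloaded file. /content is the default working directory on
# Google Colab; adjust it if you run the download elsewhere.
MODEL_PATH = "/content/medical-llama3-8b.Q5_K_M.gguf"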
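Outside a notebook, the same download can be done from Python. The following is a sketch under stated assumptions: `resolve_model_path` is a hypothetical helper, and it relies on `hf_hub_download` from the `huggingface_hub` package installed in step 1. It returns the local file if it is already present and downloads it otherwise, which avoids hard-coding `/content`.

```python
import os


def resolve_model_path(local_dir=".",
                       filename="medical-llama3-8b.Q5_K_M.gguf",
                       repo_id="ruslanmv/Medical-Llama3-8B-GGUF"):
    """Return an absolute path to the GGUF file, downloading it only if missing."""
    candidate = os.path.abspath(os.path.join(local_dir, filename))
    if os.path.isfile(candidate):
        return candidate
    # Imported lazily so the local-file check works without huggingface-hub installed
    from huggingface_hub import hf_hub_download
    return os.path.abspath(
        hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
    )
```

With this helper, `MODEL_PATH = resolve_model_path()` could stand in for the hard-coded path.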

Example of use

Here's an example of how to use the quantized Medical-Llama3-8B-GGUF model (the Q5_K_M file downloaded above) to generate an answer to a medical question:

from llama_cpp import Llama

# Llama-2-style instruction and system-prompt delimiters used by this example
B_INST, E_INST = "<s>[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
DEFAULT_SYSTEM_PROMPT = """\
You are an AI Medical Chatbot Assistant, I'm equipped with a wealth of medical knowledge derived from extensive datasets. I aim to provide comprehensive and informative responses to your inquiries. However, please note that while I strive for accuracy, my responses should not replace professional medical advice and short answers.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."""
SYSTEM_PROMPT = B_SYS + DEFAULT_SYSTEM_PROMPT + E_SYS


def create_prompt(user_query):
    """Wrap the system prompt and the user's question in the instruction delimiters."""
    instruction = f"User asks: {user_query}\n"
    prompt = B_INST + SYSTEM_PROMPT + instruction + E_INST
    return prompt.strip()


user_query = "I'm a 35-year-old male experiencing symptoms like fatigue, increased sensitivity to cold, and dry, itchy skin. Could these be indicative of hypothyroidism?"
prompt = create_prompt(user_query)
print(prompt)

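# n_gpu_layers=-1 offloads all model layers to the GPU
llm = Llama(model_path=MODEL_PATH, n_gpu_layers=-1)
result = llm(
    prompt=prompt,
    max_tokens=100,  # caps the response length, so long answers are cut off
    echo=False,      # return only the completion, not the prompt
)
print(result['choices'][0]['text'])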

Example output:

Hi, thank you for your query.
Hypothyroidism is characterized by fatigue, sensitivity to cold, weight gain, depression, hair loss and mental dullness. I would suggest that you get a complete blood count with thyroid profile including TSH (thyroid stimulating hormone), free thyroxine level, and anti-thyroglobulin antibodies. These tests will help in establishing the diagnosis of hypothyroidism.
If there is no family history of autoimmune disorders, then it might be due
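The sample answer stops mid-sentence because generation reached the `max_tokens=100` limit. The sketch below shows one way to detect that. It assumes the completion dictionary returned by llama-cpp-python carries an OpenAI-style `finish_reason` field (`"length"` when the token limit was reached, `"stop"` otherwise); the helper name `completion_text_and_status` is hypothetical.

```python
def completion_text_and_status(result):
    """Extract the generated text and whether generation hit the token limit."""
    choice = result["choices"][0]
    text = choice["text"]
    # Assumed convention: "length" means max_tokens was reached
    truncated = choice.get("finish_reason") == "length"
    return text, truncated
```

Calling `text, truncated = completion_text_and_status(result)` after the generation step lets you rerun with a larger `max_tokens` when `truncated` is true.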

License

This model is licensed under the Apache License 2.0. You can find the full license in the LICENSE file.

Model details

  • Format: GGUF
  • Model size: 8.03B params
  • Architecture: llama
  • Available quantizations: 4-bit, 5-bit
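To get a feel for what the quantization levels mean for memory, the weight size can be estimated from the 8.03B parameter count. This is a back-of-the-envelope sketch, not a measurement: it uses the nominal bit width only, whereas real GGUF files are somewhat larger because quantization schemes such as Q5_K_M also store scaling data and keep some tensors at higher precision. The function name is hypothetical.

```python
def nominal_weight_size_gb(n_params, bits_per_weight):
    """Nominal size of the weights in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 8.03e9  # parameter count listed for this model

# Compare the two quantized variants with unquantized 16-bit weights
for bits in (4, 5, 16):
    print(f"{bits}-bit: ~{nominal_weight_size_gb(N_PARAMS, bits):.2f} GB")
```

Treat the results as lower bounds when deciding whether a given file fits in GPU memory.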
