Medmerge-tulu-70b

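Medmerge-tulu-70b is a merge of the following models, using NousResearch/Llama-2-70b-hf as the base model:

- wanglab/ClinicalCamel-70B
- epfl-llm/meditron-70b
- allenai/tulu-2-dpo-70b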

Open LLM Leaderboard

| Model Name | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| tulu-2-dpo-70b | 72.1 | 88.99 | 69.84 | 65.78 | 83.27 | 62.62 |
| Medmerge-tulu-70b | 67.81 | 87.46 | 70.1 | 47.89 | 83.43 | 56.56 |
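
The table above has no summary column. As a rough aid for comparing the two rows, the sketch below computes an unweighted mean over the six benchmarks. Using a plain mean as the summary is an assumption made here, not something the card states, and the variable names are illustrative.

```python
# Scores copied from the leaderboard table above, in column order:
# ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K.
scores = {
    "tulu-2-dpo-70b": [72.1, 88.99, 69.84, 65.78, 83.27, 62.62],
    "Medmerge-tulu-70b": [67.81, 87.46, 70.1, 47.89, 83.43, 56.56],
}

# Unweighted mean per model (an assumed summary, see the note above).
averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
```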

Performance

Clinical Camel demonstrates competitive performance on medical benchmarks.

Table: Five-Shot Performance of Medmerge-tulu-70b, ClinicalCamel-70B, GPT3.5, GPT4, and Med-PaLM 2 on Various Medical Datasets

| Dataset | Medmerge-tulu-70b | ClinicalCamel-70B | GPT3.5 | GPT4 | Med-PaLM 2 |
|---|---|---|---|---|---|
| MMLU Anatomy | 66.6 | 65.2 | 60.7 | 80.0 | 77.8 |
| MMLU Clinical Knowledge | 72.0 | 72.8 | 68.7 | 86.4 | 88.3 |
| MMLU College Biology | 84.7 | 81.2 | 72.9 | 93.8 | 94.4 |
| MMLU College Medicine | 64.2 | 68.2 | 63.6 | 76.3 | 80.9 |
| MMLU Medical Genetics | 76.0 | 69.0 | 68.0 | 92.0 | 90.0 |
| MMLU Professional Medicine | 75.7 | 75.0 | 69.8 | 93.8 | 95.2 |
| MedMCQA | - | 54.2 | 51.0 | 72.4 | 71.3 |
| MedQA (USMLE) | - | 60.7 | 53.6 | 81.4 | 79.7 |
| PubMedQA | - | 77.9 | 60.2 | 74.4 | 79.2 |
| USMLE Sample Exam | - | 64.3 | 58.5 | 86.6 | - |

🧩 Configuration

models:
  - model: NousResearch/Llama-2-70b-hf
    # no parameters necessary for base model
  - model: wanglab/ClinicalCamel-70B
    parameters:
      weight: 0.08
      density: 0.45
  - model: epfl-llm/meditron-70b
    parameters:
      weight: 0.08
      density: 0.45
  - model: allenai/tulu-2-dpo-70b
    parameters:
      weight: 0.08
      density: 0.45
merge_method: dare_ties
base_model: NousResearch/Llama-2-70b-hf
parameters:
  int8_mask: true
dtype: bfloat16
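
To give a feel for what `merge_method: dare_ties` does with the `weight` and `density` values above, here is a minimal NumPy sketch on toy vectors. It is an illustration under simplifying assumptions, not mergekit's implementation: parameters are flat arrays, there is no weight normalization, and the function name `dare_ties_merge` is hypothetical.

```python
import numpy as np


def dare_ties_merge(base, finetuned, weights, density, seed=0):
    """Toy DARE-TIES merge of flat parameter vectors (illustrative only)."""
    rng = np.random.default_rng(seed)
    deltas = []
    for ft, w in zip(finetuned, weights):
        delta = ft - base  # task vector relative to the base model
        # DARE step: keep each entry with probability `density`,
        # then rescale survivors by 1/density to preserve the expected value.
        keep = rng.random(delta.shape) < density
        delta = np.where(keep, delta / density, 0.0)
        deltas.append(w * delta)
    deltas = np.stack(deltas)

    # TIES-style sign election: per parameter, take the sign of the summed
    # weighted deltas and drop contributions that disagree with it.
    elected = np.sign(deltas.sum(axis=0))
    agree = (np.sign(deltas) == elected) & (elected != 0)
    merged_delta = np.where(agree, deltas, 0.0).sum(axis=0)
    return base + merged_delta


# Toy demo with the weight and density from the configuration above.
# The random vectors are placeholders, not real model weights.
base = np.zeros(8)
rng = np.random.default_rng(1)
finetuned = [base + rng.normal(size=8) for _ in range(3)]
merged = dare_ties_merge(base, finetuned, weights=[0.08] * 3, density=0.45)
print(merged)
```

With `density: 0.45`, roughly 45% of each model's delta entries survive the random drop. The small `weight: 0.08` keeps the merged parameters close to the base model.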

💻 Usage

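# Notebook cell: install dependencies (drop the leading "!" in a regular shell)
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "Technoculture/Medmerge-tulu-70b"
messages = [{"role": "user", "content": "I am feeling sleepy these days"}]

# Render the chat messages into a single prompt string using the tokenizer's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# device_map="auto" lets accelerate spread the 70B model across available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens; generated_text contains the prompt followed by the completion
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])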