---
license: apache-2.0
tags:
- moe
- merge
- epfl-llm/meditron-7b
- medalpaca/medalpaca-7b
- chaoyi-wu/PMC_LLAMA_7B_10_epoch
- allenai/tulu-2-dpo-7b
model-index:
- name: Medtulu-4x7B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 28.75
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Technoculture/Medtulu-4x7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 25.74
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Technoculture/Medtulu-4x7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 24.41
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Technoculture/Medtulu-4x7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 47.91
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Technoculture/Medtulu-4x7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 50.43
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Technoculture/Medtulu-4x7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 0.0
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Technoculture/Medtulu-4x7B
      name: Open LLM Leaderboard
---

# Mediquad-tulu-20B

Mediquad-tulu-20B is a Mixture of Experts (MoE) made with the following models:

* [epfl-llm/meditron-7b](https://huggingface.co/epfl-llm/meditron-7b)
* [medalpaca/medalpaca-7b](https://huggingface.co/medalpaca/medalpaca-7b)
* [chaoyi-wu/PMC_LLAMA_7B_10_epoch](https://huggingface.co/chaoyi-wu/PMC_LLAMA_7B_10_epoch)
* [allenai/tulu-2-dpo-7b](https://huggingface.co/allenai/tulu-2-dpo-7b)

## Evaluations

| Benchmark | Mediquad-tulu-20B | meditron-7b | Orca-2-7b | meditron-70b |
| --- | --- | --- | --- | --- |
| MedMCQA | | | | |
| ClosedPubMedQA | | | | |
| PubMedQA | | | | |
| MedQA | | | | |
| MedQA4 | | | | |
| MedicationQA | | | | |
| MMLU Medical | | | | |
| TruthfulQA | | | | |
| GSM8K | | | | |
| ARC | | | | |
| HellaSwag | | | | |
| Winogrande | | | | |

## 🧩 Configuration

```yaml
base_model: allenai/tulu-2-dpo-7b
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: epfl-llm/meditron-7b
    positive_prompts:
      - "What are the latest guidelines for managing type 2 diabetes?"
      - "Best practices for post-operative care in cardiac surgery are"
    negative_prompts:
      - "What are the environmental impacts of deforestation?"
      - "The recent advancements in artificial intelligence have led to developments in"
  - source_model: medalpaca/medalpaca-7b
    positive_prompts:
      - "When discussing diabetes management, the key factors to consider are"
      - "The differential diagnosis for a headache with visual aura could include"
    negative_prompts:
      - "Recommend a good recipe for a vegetarian lasagna."
      - "The fundamental concepts in economics include ideas like supply and demand, which explain"
  - source_model: chaoyi-wu/PMC_LLAMA_7B_10_epoch
    positive_prompts:
      - "How would you explain the importance of hypertension management to a patient?"
      - "Describe the recovery process after knee replacement surgery in layman's terms."
    negative_prompts:
      - "Recommend a good recipe for a vegetarian lasagna."
      - "The recent advancements in artificial intelligence have led to developments in"
      - "The fundamental concepts in economics include ideas like supply and demand, which explain"
  - source_model: allenai/tulu-2-dpo-7b
    positive_prompts:
      - "Here is a funny joke for you -"
      - "When considering the ethical implications of artificial intelligence, one must take into account"
      - "In strategic planning, a company must analyze its strengths and weaknesses, which involves"
      - "Understanding consumer behavior in marketing requires considering factors like"
      - "The debate on climate change solutions hinges on arguments that"
    negative_prompts:
      - "In discussing dietary adjustments for managing hypertension, it's crucial to emphasize"
      - "For early detection of melanoma, dermatologists recommend that patients regularly check their skin for"
      - "Explaining the importance of vaccination, a healthcare professional should highlight"
```

## 💻 Usage

```python
# Notebook cell: the leading "!" runs pip as a shell command.
!pip install -qU transformers bitsandbytes accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "Technoculture/Mediquad-tulu-20B"

tokenizer = AutoTokenizer.from_pretrained(model)

# Load the model in 4-bit precision to reduce memory use.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

# Format the conversation with the model's chat template, then sample a reply.
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Technoculture__Medtulu-4x7B)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |29.54|
|AI2 Reasoning Challenge (25-Shot)|28.75|
|HellaSwag (10-Shot)              |25.74|
|MMLU (5-Shot)                    |24.41|
|TruthfulQA (0-shot)              |47.91|
|Winogrande (5-shot)              |50.43|
|GSM8k (5-shot)                   | 0.00|
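
The `Avg.` row is consistent with an unweighted mean of the six benchmark scores listed with it. A minimal sketch to check that arithmetic, assuming equal weighting; the helper name `leaderboard_average` is hypothetical and not part of any leaderboard tooling:

```python
# Scores copied from the leaderboard results listed above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 28.75,
    "HellaSwag (10-Shot)": 25.74,
    "MMLU (5-Shot)": 24.41,
    "TruthfulQA (0-shot)": 47.91,
    "Winogrande (5-shot)": 50.43,
    "GSM8k (5-shot)": 0.00,
}


def leaderboard_average(values):
    """Hypothetical helper: unweighted arithmetic mean, rounded to two decimals."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. {average:.2f}")
```

Because the mean is unweighted, the GSM8k score of 0.00 pulls the average down as much as any other benchmark would.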