Llama 3.1 8B Experimental 1206

Overall Strengths

  1. Logical and Boolean Reasoning – Excels at tasks requiring clear, rule-based logic and manipulation of true/false statements.
  2. Focused Domain Knowledge – Strong on certain specialized tasks (sports rules, ruin names, hyperbaton) that blend world knowledge with language comprehension.
  3. Good Instruction Compliance – High prompt-level and instance-level accuracy (both strict and loose) indicates that it follows user instructions effectively, even on complex or nuanced prompts.
  4. Reasonable Multi-step Reasoning – While not the best in every logic category, it still shows solid performance (60%+) on tasks like disambiguation and causal reasoning.
  5. Extended Context Window (128k) – The 128k-token context window lets the model handle lengthy inputs and maintain coherence across long passages or multi-turn conversations. This is especially valuable for long-document question answering, summarization, and complex scenario analysis, where context retention is crucial (see the loading sketch after this list).
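
Given that context window, a long document can be passed in a single prompt. Below is a minimal loading sketch using Hugging Face transformers; the repo id is taken from the model tree at the end of this card, and the prompt and generation settings are illustrative placeholders, not the author's recommended configuration.

```python
# Minimal sketch: load the model and run a long-context chat prompt.
# The repo id is assumed from the model tree below; the dtype matches
# the BF16 tensor type listed in the model details.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Placeholder: any input that fits within the 128k-token window.
long_document = "..."
messages = [
    {"role": "user", "content": f"Summarize the key points:\n\n{long_document}"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```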

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|------:|
| Avg.                | 25.67 |
| IFEval (0-shot)     | 69.67 |
| BBH (3-shot)        | 30.06 |
| MATH Lvl 5 (4-shot) | 11.10 |
| GPQA (0-shot)       |  6.60 |
| MuSR (0-shot)       |  8.50 |
| MMLU-PRO (5-shot)   | 28.10 |
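
These scores come from the Open LLM Leaderboard's lm-evaluation-harness runs. For a local spot check, something like the following sketch could work, assuming lm-eval (v0.4+) with its leaderboard task group installed; the task names and few-shot counts are assumptions to verify against your installed version, not the leaderboard's pinned configuration.

```python
# Sketch: re-running the leaderboard tasks with lm-evaluation-harness.
# Task names follow the harness's "leaderboard_*" group and are
# assumptions; check `lm-eval --tasks list` for your version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct,"
        "dtype=bfloat16"
    ),
    tasks=[
        "leaderboard_ifeval",     # IFEval, 0-shot
        "leaderboard_bbh",        # BBH, 3-shot
        "leaderboard_math_hard",  # MATH Lvl 5, 4-shot
        "leaderboard_gpqa",       # GPQA, 0-shot
        "leaderboard_musr",       # MuSR, 0-shot
        "leaderboard_mmlu_pro",   # MMLU-PRO, 5-shot
    ],
    batch_size="auto",
)
print(results["results"])
```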

Model details

- Format: Safetensors
- Model size: 8.03B params
- Tensor type: BF16

Model tree for sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct

- Finetuned: this model (one of 136 finetunes of the base model)
- Merges: 1 model
- Quantizations: 1 model
