Llama 3.1 8B Experimental 1206

Logical and Boolean Reasoning – Excels in tasks requiring clear, rule-based logic and manipulation of true/false statements.
Focused Domain Knowledge – Strong at certain specialized tasks (sports rules, ruin names, hyperbaton) that blend world knowledge with language comprehension.
Good Instruction Compliance – High prompt-level and instance-level accuracy (both strict and loose) indicate that it follows user instructions effectively, even in more complex or nuanced prompts.
Reasonable Multi-step Reasoning – While not the best in every logic category, it still shows solid performance (60%+) on tasks like disambiguation and causal reasoning.
Extended Context Window (138k) – The large 138k token context allows the model to handle lengthy inputs and maintain coherence across extensive passages or multi-turn conversations. This is especially valuable for tasks like long-document question answering, summarization, or complex scenario analysis where context retention is crucial.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here