---
language:
- en
- fr
- es
- pt
license: other
library_name: transformers
tags:
- falcon3
license_name: falcon-llm-license
license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
model-index:
- name: Falcon3-10B-Base
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 36.48
      name: strict accuracy
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=tiiuae/Falcon3-10B-Base
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 41.38
      name: normalized accuracy
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=tiiuae/Falcon3-10B-Base
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 24.77
      name: exact match
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=tiiuae/Falcon3-10B-Base
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 12.75
      name: acc_norm
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=tiiuae/Falcon3-10B-Base
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 14.17
      name: acc_norm
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=tiiuae/Falcon3-10B-Base
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 36.0
      name: accuracy
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=tiiuae/Falcon3-10B-Base
      name: Open LLM Leaderboard
---
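The metadata above declares `transformers` as the library, so the model should load through the standard causal-LM API. A minimal sketch, assuming defaults (the dtype, device placement, and prompt below are illustrative choices, not settings recommended by the model authors):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-10B-Base"

# Load the tokenizer and model; bfloat16 and device_map="auto" are
# illustrative choices for fitting a 10B model on available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# This is a base (non-instruct) model, so prompt it for completion
# rather than chat.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```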
Benchmark results for Falcon3-10B-Base against comparably sized open base models (all values are scores in percent; higher is better; shot counts in parentheses):

| Category | Benchmark | Gemma2-9B | Yi1.5-9B | Mistral-Nemo-Base-2407 (12B) | Falcon3-10B-Base |
|---|---|---|---|---|---|
| General | MMLU (5-shot) | 70.8 | 69.6 | 68.8 | 73.1 |
| General | MMLU-PRO (5-shot) | 41.4 | 39.3 | 34.7 | 42.5 |
| General | IFEval | 21.3 | 29.1 | 16.1 | 36.4 |
| Math | GSM8K (5-shot) | 69.1 | 63.8 | 55.3 | 81.4 |
| Math | MATH Lvl-5 (4-shot) | 10.5 | 9.2 | 4.9 | 22.9 |
| Reasoning | ARC Challenge (25-shot) | 67.5 | 61.7 | 64.4 | 66.8 |
| Reasoning | GPQA (0-shot) | 33.4 | 36.6 | 28.8 | 34.1 |
| Reasoning | MuSR (0-shot) | 45.3 | 43.3 | 39.2 | 44.2 |
| Reasoning | BBH (3-shot) | 54.3 | 51.3 | 50.2 | 59.7 |
| Commonsense Understanding | PIQA (0-shot) | 83.0 | 80.5 | 82.1 | 79.4 |
| Commonsense Understanding | SciQ (0-shot) | 97.1 | 95.2 | 95.2 | 93.5 |
| Commonsense Understanding | Winogrande (0-shot) | 74.2 | 72.7 | 73.2 | 73.6 |
| Commonsense Understanding | OpenbookQA (0-shot) | 47.2 | 45.2 | 47.2 | 45.0 |
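The card does not state which framework produced the numbers above. As one way to sanity-check a single row, here is a sketch using EleutherAI's `lm-evaluation-harness` (`pip install lm-eval`); the `gsm8k` task name and its default prompting are assumptions and may not match the exact setup behind the reported 81.4:

```python
import lm_eval

# Sketch only: the evaluation framework, task variants, and prompting
# details behind the table's numbers are not specified in this card.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tiiuae/Falcon3-10B-Base,dtype=bfloat16",
    tasks=["gsm8k"],  # corresponds to the 5-shot GSM8K row above
    num_fewshot=5,
)
print(results["results"]["gsm8k"])
```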