Note: This is an experimental model.
ReasoningCoreβ3B-RE1-V2
ReasoningCoreβ3B is a multilingual, reasoningβenhanced large language model developed by EpitemeAI. Pretrained on vast amounts of publicly available data and instructionβtuned to excel at nuanced reasoning, dialogue management, retrieval, and summarization tasks, it often outperforms many current open source and proprietary conversational models on a range of industry benchmarks. Fine tuned with reasoning dataset.
We used GRPO technique:
To provide a comprehensive overview of Group Relative Policy Optimization (GRPO), a post-training technique for Large Language Models (LLMs), and its application in the DeepSeek-R1 model.
- Post-training with GRPO involves using a reinforcement learning (RL) technique to optimize the Large Language Model (LLM) after it has been initially trained.
- GRPO specifically focuses on scaling test-time compute for extended reasoning tasks, making it suitable for tackling complex problems like mathematical problem-solving.
- Unlike earlier methods that utilized search-heuristic approaches, GRPO relies exclusively on RL for post-training, thereby enhancing the model's ability to handle nuanced tasks.
- The GRPO technique is available through the TRL library, and the Hugging Face Science team is working to reproduce the full DeepSeek-R1 training process, which can be explored in their
Model Information
- Model Developer: EpitemeAI
- Model Architecture:
ReasoningCoreβ3B is an autoβregressive language model built on an optimized transformer architecture. It incorporates specialized reasoning pathways and has been fineβtuned using Group Robust Preference Optimization(GRPO), and both supervised learning and reinforcement learning with human feedback (RLHF) to align with human expectations for clarity, accuracy, and safety in complex tasks.
Training Data | Params | Input Modalities | Output Modalities | Context Length | GQA | Shared Embeddings | Token Count | Knowledge Cutoff | |
---|---|---|---|---|---|---|---|---|---|
ReasoningCoreβ3B (text only) | A new mix of publicly available online data. | 3B | Multilingual Text | Multilingual Text and code | 128k | Yes | Yes | Up to 9T tokens | December 2023 |
- Supported Languages:
Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. While the pretraining included a broader range of languages, additional languages can be fineβtuned in compliance with the community license and acceptable use policies. - Model Release Date: Sept 25, 2024
- Status: Static model trained on an offline dataset. Future iterations may further enhance its reasoning capabilities and safety features.
- License: Use is governed by the Llama 3.2 Community License (a custom, commercial license agreement).
- Feedback: For questions or comments, please refer to the GitHub repository README or follow the linked instructions.
Intended Use
Use Cases
- Conversational AI: Assistantβlike interactions.
- Knowledge Retrieval & Summarization: Dynamic extraction and condensation of information.
- Mobile AIβPowered Writing Assistants: Query reformulation and natural language generation.
- General Natural Language Generation: Any application that benefits from advanced reasoning abilities.
Out of Scope
- Deployments that violate applicable laws or trade compliance regulations.
- Use cases that conflict with the Acceptable Use Policy or licensing terms.
- Deployments in languages not explicitly supported (unless additional safety and performance validations are performed).
How to Use
ReasoningCoreβ3B can be integrated using popular machine learning frameworks. Two primary methods are provided:
Use system prompt
SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""
Use with Transformers
Ensure you have transformers version 4.43.0 or later installed:
pip install --upgrade transformers
import torch
from transformers import pipeline
model_id = "EpistemeAI/ReasoningCore-3B-R01"
pipe = pipeline(
"text-generation",
model=model_id,
torch_dtype=torch.bfloat16,
device_map="auto"
)
print(pipe("The secret to effective reasoning is"))
For Mathematical problems
Please use "Please reason step by step, and put your final answer within \boxed{}" in system prompt
Responsibility & Safety
Responsible Deployment
Approach:
- ReasoningCoreβ3B is a foundational technology that includes builtβin safety guardrails. Developers are encouraged to integrate additional safeguards tailored to their specific applications.
SystemβLevel Safety:
- The model is designed to be deployed as part of a broader system that implements safety measures (e.g., Prompt Guard, Code Shield) to ensure outputs remain safe even under adversarial conditions.
Safety FineβTuning & Data Strategy
Objectives:
- Provide a reliable tool for building secure and helpful reasoning systems.
- Mitigate adversarial misuse through advanced data selection and response optimization techniques.
Methodology:
- Incorporate adversarial prompts during training to refine model refusals and response tone.
- Combine humanβcurated data with synthetic data.
- Utilize iterative fineβtuning using supervised learning, rejection sampling, and preference optimization.
Evaluations and Red Teaming
Scaled Evaluations:
- Dedicated adversarial datasets were used to rigorously test the modelβs robustness. Developers should perform contextβspecific evaluations.
Red Teaming:
- Experts in cybersecurity, adversarial machine learning, and responsible AI conducted recurring red team exercises to identify vulnerabilities and improve both performance and safety.
Critical Risk Mitigations
CBRNE:
The model has been evaluated to ensure it does not enhance capabilities for harmful activities involving chemical, biological, radiological, nuclear, or explosive materials.Child Safety:
Expert assessments were conducted to evaluate and mitigate potential child safety risks.Cyber Attacks:
Measures were taken to ensure the model cannot autonomously facilitate cyberβoffensive operations.
Ethical Considerations and Limitations
Core Values:
- ReasoningCoreβ3B is built on the values of openness, inclusivity, and helpfulness. It is designed to respect user autonomy and foster free thought and expression while mitigating potential harm.
Testing and Limitations:
- Despite extensive testing across diverse scenarios, the model may occasionally produce inaccurate, biased, or objectionable outputs. Developers must perform additional safety testing and integrate further safeguards as needed.
Resources for Safe Deployment, with Meta Safety Deployment:
Conclusion
ReasoningCoreβ3B represents a significant advancement in multilingual, reasoningβenhanced language models. Optimized for tasks requiring deep reasoning, contextual understanding, and safe, helpful interactions, it offers a powerful tool for both commercial and research applications. We invite developers and researchers to explore its capabilities and contribute to building secure, innovative AI systems.
For further details, questions, or feedback, please email [email protected]
Uploaded model
- Developed by: EpistemeAI
- License: llama 3.2 communmity license
- Finetuned from model : EpistemeAI/ReasoningCore-3B-0
This llama model was trained 2x faster with Unsloth and Huggingface's TRL library.
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 23.57 |
IFEval (0-Shot) | 73.93 |
BBH (3-Shot) | 22.47 |
MATH Lvl 5 (4-Shot) | 15.63 |
GPQA (0-shot) | 3.13 |
MuSR (0-shot) | 2.02 |
MMLU-PRO (5-shot) | 24.23 |
- Downloads last month
- 246
Model tree for EpistemeAI/ReasoningCore-3B-RE1-V2
Base model
meta-llama/Llama-3.2-3B-InstructSpaces using EpistemeAI/ReasoningCore-3B-RE1-V2 2
Evaluation results
- strict accuracy on IFEval (0-Shot)Open LLM Leaderboard73.930
- normalized accuracy on BBH (3-Shot)Open LLM Leaderboard22.470
- exact match on MATH Lvl 5 (4-Shot)Open LLM Leaderboard15.630
- acc_norm on GPQA (0-shot)Open LLM Leaderboard3.130
- acc_norm on MuSR (0-shot)Open LLM Leaderboard2.020
- accuracy on MMLU-PRO (5-shot)test set Open LLM Leaderboard24.230