---
language: en
model-index:
- name: vwxyzjn/rm_zephyr_new
  results:
  - task:
      type: preference_evaluation
    dataset:
      name: reward-bench
      type: allenai/reward-bench
    metrics:
    - type: accuracy
      value: 0.5343383584589615
  - task:
      type: preference_evaluation
    dataset:
      name: Chat
      type: Chat
    metrics:
    - type: accuracy
      value: 0.8128491620111732
  - task:
      type: preference_evaluation
    dataset:
      name: Chat Hard
      type: Chat_Hard
    metrics:
    - type: accuracy
      value: 0.5263157894736842
  - task:
      type: preference_evaluation
    dataset:
      name: Safety
      type: Safety
    metrics:
    - type: accuracy
      value: 0.4851351351351351
  - task:
      type: preference_evaluation
    dataset:
      name: Reasoning
      type: Reasoning
    metrics:
    - type: accuracy
      value: 0.3930266819446718
  - task:
      type: preference_evaluation
    dataset:
      name: alpacaeval-easy
      type: alpacaeval-easy
    metrics:
    - type: accuracy
      value: 0.88
  - task:
      type: preference_evaluation
    dataset:
      name: alpacaeval-hard
      type: alpacaeval-hard
    metrics:
    - type: accuracy
      value: 0.8947368421052632
  - task:
      type: preference_evaluation
    dataset:
      name: alpacaeval-length
      type: alpacaeval-length
    metrics:
    - type: accuracy
      value: 0.6842105263157895
  - task:
      type: preference_evaluation
    dataset:
      name: donotanswer
      type: donotanswer
    metrics:
    - type: accuracy
      value: 0.34558823529411764
  - task:
      type: preference_evaluation
    dataset:
      name: hep-cpp
      type: hep-cpp
    metrics:
    - type: accuracy
      value: 0.6646341463414634
  - task:
      type: preference_evaluation
    dataset:
      name: hep-go
      type: hep-go
    metrics:
    - type: accuracy
      value: 0.6951219512195121
  - task:
      type: preference_evaluation
    dataset:
      name: hep-java
      type: hep-java
    metrics:
    - type: accuracy
      value: 0.6707317073170732
  - task:
      type: preference_evaluation
    dataset:
      name: hep-js
      type: hep-js
    metrics:
    - type: accuracy
      value: 0.676829268292683
  - task:
      type: preference_evaluation
    dataset:
      name: hep-python
      type: hep-python
    metrics:
    - type: accuracy
      value: 0.6829268292682927
  - task:
      type: preference_evaluation
    dataset:
      name: hep-rust
      type: hep-rust
    metrics:
    - type: accuracy
      value: 0.5609756097560976
  - task:
      type: preference_evaluation
    dataset:
      name: llmbar-adver-GPTInst
      type: llmbar-adver-GPTInst
    metrics:
    - type: accuracy
      value: 0.31521739130434784
  - task:
      type: preference_evaluation
    dataset:
      name: llmbar-adver-GPTOut
      type: llmbar-adver-GPTOut
    metrics:
    - type: accuracy
      value: 0.5531914893617021
  - task:
      type: preference_evaluation
    dataset:
      name: llmbar-adver-manual
      type: llmbar-adver-manual
    metrics:
    - type: accuracy
      value: 0.43478260869565216
  - task:
      type: preference_evaluation
    dataset:
      name: llmbar-adver-neighbor
      type: llmbar-adver-neighbor
    metrics:
    - type: accuracy
      value: 0.6044776119402985
  - task:
      type: preference_evaluation
    dataset:
      name: llmbar-natural
      type: llmbar-natural
    metrics:
    - type: accuracy
      value: 0.64
  - task:
      type: preference_evaluation
    dataset:
      name: math-prm
      type: math-prm
    metrics:
    - type: accuracy
      value: 0.12751677852348994
  - task:
      type: preference_evaluation
    dataset:
      name: mt-bench-easy
      type: mt-bench-easy
    metrics:
    - type: accuracy
      value: 0.7857142857142857
  - task:
      type: preference_evaluation
    dataset:
      name: mt-bench-hard
      type: mt-bench-hard
    metrics:
    - type: accuracy
      value: 0.5405405405405406
  - task:
      type: preference_evaluation
    dataset:
      name: mt-bench-med
      type: mt-bench-med
    metrics:
    - type: accuracy
      value: 0.775
  - task:
      type: preference_evaluation
    dataset:
      name: refusals-dangerous
      type: refusals-dangerous
    metrics:
    - type: accuracy
      value: 0.18
  - task:
      type: preference_evaluation
    dataset:
      name: refusals-offensive
      type: refusals-offensive
    metrics:
    - type: accuracy
      value: 0.58
  - task:
      type: preference_evaluation
    dataset:
      name: xstest-should-refuse
      type: xstest-should-refuse
    metrics:
    - type: accuracy
      value: 0.461038961038961
  - task:
      type: preference_evaluation
    dataset:
      name: xstest-should-respond
      type: xstest-should-respond
    metrics:
    - type: accuracy
      value: 0.66
---

# Model Card for vwxyzjn/rm_zephyr_new

## Model Details

### Model Description

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** en
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

[More Information Needed]

### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

[More Information Needed]

### Training Procedure

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

[More Information Needed]

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]