RLHFlow's Collections
Standard-format-preference-dataset
Updated May 8, 2024
We collect open-source preference datasets and process them into a standard format.
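
As a minimal sketch of what such a standard preference format can look like: RLHFlow-style preference data is commonly stored as paired `chosen`/`rejected` conversations, each a list of role/content messages. The helper name and exact field names below are illustrative assumptions, not the collection's documented schema.

```python
# Sketch: turn one raw comparison (prompt, preferred response,
# dispreferred response) into a chosen/rejected conversation pair.
# Field names ("chosen", "rejected", "role", "content") follow a common
# preference-data convention and are assumptions here, not a spec.

def to_standard_format(prompt: str, preferred: str, dispreferred: str) -> dict:
    """Render both responses as user/assistant message lists that share
    the same prompt, so a reward model can score each side."""
    return {
        "chosen": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": preferred},
        ],
        "rejected": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": dispreferred},
        ],
    }

example = to_standard_format(
    "What is 2 + 2?",
    "2 + 2 equals 4.",
    "2 + 2 equals 5.",
)
```

Storing both sides as full conversations (rather than bare strings) keeps the format compatible with chat templates when training reward or preference models.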
Dataset | Updated | Rows | Downloads | Likes
RLHFlow/UltraFeedback-preference-standard | Apr 27, 2024 | 340k | 91 | 10
RLHFlow/Helpsteer-preference-standard | Apr 27, 2024 | 37.1k | 74 | 4
RLHFlow/HH-RLHF-Helpful-standard | Apr 27, 2024 | 115k | 171 | 1
RLHFlow/Orca-distibalel-standard | Apr 28, 2024 | 6.93k | 50 | 1
RLHFlow/Capybara-distibalel-Filter-standard | Apr 28, 2024 | 14.8k | 55 | –
RLHFlow/CodeUltraFeedback-standard | Apr 27, 2024 | 50.2k | 102 | 5
RLHFlow/UltraInteract-filtered-standard | Apr 28, 2024 | 162k | 62 | 2
RLHFlow/PKU-SafeRLHF-30K-standard | Apr 29, 2024 | 26.9k | 470 | 3
RLHFlow/Argilla-Math-DPO-standard | Apr 30, 2024 | 2.42k | 83 | 3
RLHFlow/Prometheus2-preference-standard | May 5, 2024 | 200k | 60 | 2
RLHFlow/SHP-standard | May 9, 2024 | 93.3k | 56 | –
RLHFlow/HH-RLHF-Harmless-and-RedTeam-standard | May 8, 2024 | 42.3k | 159 | 2