REBEL: Reinforcement Learning via Regressing Relative Reward - a Cornell-AGI Collection

Updated Sep 2, 2024

REBEL: Reinforcement Learning via Regressing Relative Rewards
Paper • 2404.16767 • Published Apr 25, 2024 • 2

Cornell-AGI/REBEL-Llama-3-Armo-iter_1
Updated Sep 2, 2024 • 10 • 1

Cornell-AGI/REBEL-Llama-3-Armo-iter_2
Updated Sep 2, 2024 • 12 • 1

Cornell-AGI/REBEL-Llama-3-Armo-iter_3
Updated Sep 2, 2024 • 11 • 2

Cornell-AGI/Ultrafeedback-Llama-3-Armo-iter_1
Viewer • Updated Sep 2, 2024 • 56.1k • 67

Cornell-AGI/Ultrafeedback-Llama-3-Armo-iter_2
Viewer • Updated Sep 2, 2024 • 55.1k • 78

Cornell-AGI/Ultrafeedback-Llama-3-Armo-iter_3
Viewer • Updated Sep 2, 2024 • 44.6k • 86 • 1

Cornell-AGI/REBEL-Llama-3
Text Generation • Updated Sep 1, 2024 • 44 • 1

Cornell-AGI/REBEL-Llama-3-epoch_2
Text Generation • Updated Sep 1, 2024 • 34 • 3

Cornell-AGI/REBEL-OpenChat-3.5
Text Generation • Updated Sep 1, 2024 • 25 • 1