rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-exploration-aflworld-iter0-checkpoint-50 Updated 11 days ago • 5
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter1 Text Generation • Updated 17 days ago • 121
The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms Paper • 2303.00694 • Published Mar 1, 2023