---
library_name: stable-baselines3
tags:
- BipedalWalker-v3
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: SAC
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: BipedalWalker-v3
      type: BipedalWalker-v3
    metrics:
    - type: mean_reward
      value: '-31.49 +/- 60.03'
      name: mean_reward
      verified: false
---
# **SAC** Agent playing **BipedalWalker-v3**

This is a trained model of a **SAC** agent playing **BipedalWalker-v3** using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
## Usage (with Stable-baselines3)

TODO: Add your code

```python
from huggingface_sb3 import load_from_hub
from stable_baselines3 import SAC

# repo_id and filename are placeholders; point them at the actual Hub repo and zip file
checkpoint = load_from_hub(repo_id="<user>/<repo>", filename="<model>.zip")
model = SAC.load(checkpoint)
```
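Once loaded, the reported mean_reward (-31.49 +/- 60.03) can be checked along these lines; `n_eval_episodes=10` is an assumption, since the card does not state the evaluation protocol:

```python
import gymnasium as gym
from stable_baselines3.common.evaluation import evaluate_policy

eval_env = gym.make("BipedalWalker-v3")
# n_eval_episodes is an assumption about the evaluation protocol
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```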
Well, he does OK but still gets stuck on the rocks. Here are my hyperparameters, not that they did me much good 😂:
```python
import gymnasium as gym
from stable_baselines3 import SAC

def linear_schedule(initial_value, final_value=0.00001):
    def func(progress_remaining):
        """Progress will decrease from 1 (beginning) to 0 (end)"""
        return final_value + (initial_value - final_value) * progress_remaining
    return func

env = gym.make("BipedalWalker-v3")
initial_learning_rate = 7.3e-4

model = SAC(
    policy='MlpPolicy',
    env=env,
    learning_rate=linear_schedule(initial_learning_rate),  # anneal LR from 7.3e-4 down to 1e-5
    buffer_size=1000000,
    batch_size=256,
    ent_coef=0.005,            # fixed entropy coefficient (not auto-tuned)
    gamma=0.99,
    tau=0.01,                  # soft target-network update rate
    train_freq=1,
    gradient_steps=1,
    learning_starts=10000,     # random actions collected before learning begins
    policy_kwargs=dict(net_arch=[400, 300]),
    verbose=1
)
```
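For reference, training and saving with this configuration would look something like the sketch below; the timestep budget is an assumption for illustration, not this run's actual value:

```python
# total_timesteps is an illustrative assumption, not the card's actual budget
model.learn(total_timesteps=1_000_000, log_interval=10)
model.save("sac-BipedalWalker-v3")
```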
These are pretty well tuned, but SAC leads to too much exploration, and the agent is unable to exploit the actions required to complete the course. I suspect TD3 will be more successful, so I plan to turn back to that.
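If I do go back to TD3, a comparable setup might look something like this, carrying over the same hyperparameters; the action-noise sigma is an assumption, since TD3 explores through explicit noise rather than a stochastic policy:

```python
import gymnasium as gym
import numpy as np
from stable_baselines3 import TD3
from stable_baselines3.common.noise import NormalActionNoise

env = gym.make("BipedalWalker-v3")
n_actions = env.action_space.shape[-1]  # 4 for BipedalWalker

model = TD3(
    policy='MlpPolicy',
    env=env,
    learning_rate=7.3e-4,      # or reuse linear_schedule from above
    buffer_size=1000000,
    batch_size=256,
    gamma=0.99,
    tau=0.01,
    train_freq=1,
    gradient_steps=1,
    learning_starts=10000,
    # sigma=0.1 is a guess; TD3's exploration is controlled by this noise
    action_noise=NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions)),
    policy_kwargs=dict(net_arch=[400, 300]),
    verbose=1
)
```

The appeal here is that TD3's deterministic policy plus bounded Gaussian noise gives direct control over how much the agent explores, instead of relying on SAC's entropy term.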