---
library_name: stable-baselines3
tags:
  - BipedalWalker-v3
  - deep-reinforcement-learning
  - reinforcement-learning
  - stable-baselines3
model-index:
  - name: SAC
    results:
      - task:
          type: reinforcement-learning
          name: reinforcement-learning
        dataset:
          name: BipedalWalker-v3
          type: BipedalWalker-v3
        metrics:
          - type: mean_reward
            value: '-31.49 +/- 60.03'
            name: mean_reward
            verified: false
---

# SAC Agent playing BipedalWalker-v3

This is a trained model of a SAC agent playing BipedalWalker-v3 using the stable-baselines3 library.

## Usage (with Stable-baselines3)

A minimal loading sketch; the exact `repo_id` and `filename` depend on the checkpoint uploaded to this repo, so treat the strings below as placeholders:

```python
from huggingface_sb3 import load_from_hub
from stable_baselines3 import SAC

# Download the trained checkpoint from the Hub (repo_id/filename are placeholders)
checkpoint = load_from_hub(repo_id="<repo-id>", filename="<model-file>.zip")
model = SAC.load(checkpoint)
```
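Once loaded, you can sanity-check the reported mean reward yourself. This sketch assumes a recent stable-baselines3 that uses Gymnasium, and relies on the library's built-in `evaluate_policy` helper:

```python
import gymnasium as gym
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("BipedalWalker-v3")

# Average episodic return over 10 evaluation episodes
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```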

Well, he does OK but still gets stuck on the rocks. Here are my hyperparameters, not that they did me much good 😂:

```python
import gymnasium as gym
from stable_baselines3 import SAC

def linear_schedule(initial_value, final_value=0.00001):
    """Linearly anneal the learning rate from initial_value down to final_value."""
    def func(progress_remaining):
        # progress_remaining decreases from 1 (beginning) to 0 (end)
        return final_value + (initial_value - final_value) * progress_remaining
    return func

env = gym.make("BipedalWalker-v3")  # environment assumed from the model card

initial_learning_rate = 7.3e-4

model = SAC(
    policy='MlpPolicy',
    env=env,
    learning_rate=linear_schedule(initial_learning_rate),
    buffer_size=1000000,
    batch_size=256,
    ent_coef=0.005,  # fixed entropy coefficient (no 'auto' tuning)
    gamma=0.99,
    tau=0.01,
    train_freq=1,
    gradient_steps=1,
    learning_starts=10000,
    policy_kwargs=dict(net_arch=[400, 300]),
    verbose=1
)
```
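Training then proceeds in the usual SB3 way; the timestep budget and save path below are illustrative, not the values I actually used:

```python
# Train and save the agent (total_timesteps is an example value)
model.learn(total_timesteps=1_000_000)
model.save("sac_bipedalwalker")
```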

These are pretty well tuned, but SAC leads to too much exploration and the agent is unable to exploit the actions required to complete the course. I suspect TD3 will be more successful, so I plan to turn back to that.
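For reference, a minimal TD3 starting point on the same environment might look like the sketch below. The action-noise sigma and the reused hyperparameters are illustrative defaults, not tuned values:

```python
import numpy as np
import gymnasium as gym
from stable_baselines3 import TD3
from stable_baselines3.common.noise import NormalActionNoise

env = gym.make("BipedalWalker-v3")
n_actions = env.action_space.shape[-1]

# Gaussian exploration noise per action dimension (sigma is illustrative)
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))

model = TD3(
    policy="MlpPolicy",
    env=env,
    action_noise=action_noise,
    learning_rate=7.3e-4,                     # reusing the SAC starting rate above
    buffer_size=1000000,
    batch_size=256,
    policy_kwargs=dict(net_arch=[400, 300]),  # same network as the SAC run
    verbose=1,
)
```

Unlike SAC, TD3 has no entropy bonus, so exploration is controlled entirely by the injected action noise, which may help the agent commit to the precise footwork the rocks demand.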