
    DLR-RM/stable-baselines3: Stable-Baselines3 v2.3.0: New default hyperparameters for DDPG, TD3 and DQN

    <p>SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo Stable-Baselines Jax (SBX): https://github.com/araffin/sbx</p>
    <p>To upgrade:</p>
    <pre><code>pip install stable_baselines3 sb3_contrib --upgrade
</code></pre>
    <p>or simply (RL Zoo depends on SB3 and SB3 Contrib):</p>
    <pre><code>pip install rl_zoo3 --upgrade
</code></pre>
    <h2>Breaking Changes:</h2>
    <ul>
    <li>The default hyperparameters of <code>TD3</code> and <code>DDPG</code> have been changed to be more consistent with <code>SAC</code></li>
    </ul>
    <pre><code># SB3 < 2.3.0 default hyperparameters
# model = TD3("MlpPolicy", env, train_freq=(1, "episode"), gradient_steps=-1, batch_size=100)
# SB3 >= 2.3.0:
model = TD3("MlpPolicy", env, train_freq=1, gradient_steps=1, batch_size=256)
</code></pre>
    <blockquote>
    <p>[!NOTE]
    Two inconsistencies remain: the default network architecture for <code>TD3/DDPG</code> is <code>[400, 300]</code> instead of <code>[256, 256]</code> as for SAC (for backward-compatibility reasons, see the <a href="https://wandb.ai/openrlbenchmark/sbx/reports/SBX-TD3-Influence-of-policy-net--Vmlldzo2NDg1Mzk3">report on the influence of the network size</a>), and the default learning rate is 1e-3 instead of 3e-4 as for SAC (for performance reasons, see the <a href="https://wandb.ai/openrlbenchmark/sbx/reports/SBX-TD3-RL-Zoo-v2-3-0a0-vs-SB3-TD3-RL-Zoo-2-2-1---Vmlldzo2MjUyNTQx">W&B report on the influence of the learning rate</a>)</p>
    </blockquote>
    <ul>
    <li>The default <code>learning_starts</code> parameter of <code>DQN</code> has been changed to be consistent with the other off-policy algorithms</li>
    </ul>
    <pre><code># SB3 < 2.3.0 default hyperparameters, 50_000 corresponded to the Atari default hyperparameters
# model = DQN("MlpPolicy", env, learning_starts=50_000)
# SB3 >= 2.3.0:
model = DQN("MlpPolicy", env, learning_starts=100)
</code></pre>
    <ul>
    <li>For safety, <code>torch.load()</code> is now called with <code>weights_only=True</code> when loading torch tensors; policy <code>load()</code> still uses <code>weights_only=False</code>, as gymnasium imports are required for it to work</li>
    <li>When using <code>huggingface_sb3</code>, you will now need to set <code>TRUST_REMOTE_CODE=True</code> when downloading models from the hub, as <code>pickle.load</code> is not safe.</li>
    </ul>
    <h2>New Features:</h2>
    <ul>
    <li>Log success rate <code>rollout/success_rate</code> when available for on-policy algorithms (@corentinlger)</li>
    </ul>
    <h2>Bug Fixes:</h2>
    <ul>
    <li>Fixed the <code>monitor_wrapper</code> argument that was not passed to the parent class, and the <code>dones</code> argument that wasn't passed to <code>_update_into_buffer</code> (@corentinlger)</li>
    </ul>
    <h2><a href="https://github.com/Stable-Baselines-Team/stable-baselines3-contrib">SB3-Contrib</a></h2>
    <ul>
    <li>Added <code>rollout_buffer_class</code> and <code>rollout_buffer_kwargs</code> arguments to <code>MaskablePPO</code></li>
    <li>Fixed <code>train_freq</code> type annotation for TQC and QRDQN (@Armandpl)</li>
    <li>Fixed <code>sb3_contrib/common/maskable/*.py</code> type annotations</li>
    <li>Fixed <code>sb3_contrib/ppo_mask/ppo_mask.py</code> type annotations</li>
    <li>Fixed <code>sb3_contrib/common/vec_env/async_eval.py</code> type annotations</li>
    <li>Added some additional notes about <code>MaskablePPO</code> (evaluation and multi-process) (@icheered)</li>
    </ul>
    <h2><a href="https://github.com/DLR-RM/rl-baselines3-zoo">RL Zoo</a></h2>
    <ul>
    <li>Updated default hyperparameters for TD3/DDPG to be more consistent with SAC</li>
    <li>Upgraded MuJoCo env hyperparameters to v4 (pre-trained agents need to be updated)</li>
    <li>Added test dependencies to <code>setup.py</code> (@power-edge)</li>
    <li>Simplified dependencies of <code>requirements.txt</code> (removed duplicates from <code>setup.py</code>)</li>
    </ul>
    <h2><a href="https://github.com/araffin/sbx">SBX (SB3 + Jax)</a></h2>
    <ul>
    <li>Added support for <code>MultiDiscrete</code> and <code>MultiBinary</code> action spaces to PPO</li>
    <li>Added support for large values of <code>gradient_steps</code> to SAC, TD3, and TQC</li>
    <li>Fixed <code>train()</code> signature and updated type hints</li>
    <li>Fixed replay buffer device at load time</li>
    <li>Added flatten layer</li>
    <li>Added <code>CrossQ</code></li>
    </ul>
    <h2>Others:</h2>
    <ul>
    <li>Updated black from v23 to v24</li>
    <li>Updated ruff to >= v0.3.1</li>
    <li>Updated env checker for (multi)discrete spaces with non-zero start</li>
    </ul>
    <h2>Documentation:</h2>
    <ul>
    <li>Added a paragraph on modifying vectorized environment parameters via setters (@fracapuano)</li>
    <li>Updated callback code example</li>
    <li>Updated export-to-ONNX documentation; it is now much simpler to export SB3 models with newer ONNX opsets!</li>
    <li>Added video link to the "Practical Tips for Reliable Reinforcement Learning" video</li>
    <li>Added <code>render_mode="human"</code> in the README example (@marekm4)</li>
    <li>Fixed docstring signature for <code>sum_independent_dims</code> (@stagoverflow)</li>
    <li>Updated docstring description for <code>log_interval</code> in the base class (@rushitnshah)</li>
    </ul>
    <p><strong>Full Changelog</strong>: https://github.com/DLR-RM/stable-baselines3/compare/v2.2.1...v2.3.0</p>
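The two safety bullets above (calling `torch.load()` with `weights_only=True`, and requiring `TRUST_REMOTE_CODE=True` in `huggingface_sb3`) both address the same underlying risk: `pickle`-based deserialization can execute arbitrary code while loading. A minimal stdlib-only sketch, not SB3 code (the `Payload` class is hypothetical), of how a pickle payload runs a callable at load time:

```python
import pickle

# Hypothetical malicious object: __reduce__ tells pickle which callable to
# invoke (with which arguments) when the bytes are loaded back.
class Payload:
    def __reduce__(self):
        # Unpickling will call eval("6 * 7") instead of rebuilding the object.
        return (eval, ("6 * 7",))

data = pickle.dumps(Payload())
result = pickle.loads(data)  # arbitrary code runs here, during loading
print(result)  # 42
```

A real attack would substitute something like `os.system` for `eval`, which is why restricting `torch.load` to plain tensors and making hub downloads an explicit opt-in are safer defaults.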

    DLR-RM/stable-baselines3: Stable-Baselines3 v2.3.2: Hotfix for PyTorch 1.13

    <h2>Bug fixes</h2>
    <ul>
    <li>Reverted <code>torch.load()</code> to be called with <code>weights_only=False</code>, as it caused loading issues with older versions of PyTorch. https://github.com/DLR-RM/stable-baselines3/pull/1913</li>
    <li>Cast <code>learning_rate</code> to float lambda for pickle safety when doing <code>model.load</code>, by @markscsmith in https://github.com/DLR-RM/stable-baselines3/pull/1901</li>
    </ul>
    <h2>Documentation</h2>
    <ul>
    <li>Fixed typo in changelog by @araffin in https://github.com/DLR-RM/stable-baselines3/pull/1882</li>
    <li>Fixed broken link in ppo.rst by @chaitanyabisht in https://github.com/DLR-RM/stable-baselines3/pull/1884</li>
    <li>Added ER-MRL to community projects by @corentinlger in https://github.com/DLR-RM/stable-baselines3/pull/1904</li>
    <li>Fixed slow numpy->torch conversion for tensorboard videos by @NickLucche in https://github.com/DLR-RM/stable-baselines3/pull/1910</li>
    </ul>
    <h2>New Contributors</h2>
    <ul>
    <li>@chaitanyabisht made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1884</li>
    <li>@markscsmith made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1901</li>
    <li>@NickLucche made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1910</li>
    </ul>
    <p><strong>Full Changelog</strong>: https://github.com/DLR-RM/stable-baselines3/compare/v2.3.0...v2.3.2</p>
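The `learning_rate` fix above rests on a `pickle` quirk worth knowing: a plain float round-trips through pickling, but a lambda (the form constant learning-rate schedules are typically wrapped in) cannot be pickled at all. A small stdlib-only sketch of the difference, with illustrative variable names that are not SB3's own:

```python
import pickle

# A constant learning rate stored as a plain float survives a pickle round-trip.
learning_rate = 3e-4
assert pickle.loads(pickle.dumps(learning_rate)) == 3e-4

# The same constant wrapped in a lambda schedule does not: pickle serializes
# functions by reference, and lambdas cannot be looked up by name.
lr_schedule = lambda progress_remaining: 3e-4
try:
    pickle.dumps(lr_schedule)
    picklable = True
except (pickle.PicklingError, AttributeError):
    picklable = False
print(picklable)  # False
```

Keeping the stored value as a float (and only wrapping it in a schedule at runtime) sidesteps this entirely when saving and loading models.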