3 research outputs found
epiTracker: A Framework for Highly Reliable Particle Tracking for the Quantitative Analysis of Fish Movements in Tanks
DLR-RM/stable-baselines3: Stable-Baselines3 v2.3.0: New default hyperparameters for DDPG, TD3 and DQN
<p>SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx</p>
<p>To upgrade:</p>
<pre><code>pip install stable_baselines3 sb3_contrib --upgrade
</code></pre>
<p>or simply (the RL Zoo depends on SB3 and SB3 Contrib):</p>
<pre><code>pip install rl_zoo3 --upgrade
</code></pre>
<h2>Breaking Changes:</h2>
<ul>
<li>The default hyperparameters of <code>TD3</code> and <code>DDPG</code> have been changed to be more consistent with <code>SAC</code></li>
</ul>
<pre><code># SB3 < 2.3.0 default hyperparameters
# model = TD3("MlpPolicy", env, train_freq=(1, "episode"), gradient_steps=-1, batch_size=100)
# SB3 >= 2.3.0:
model = TD3("MlpPolicy", env, train_freq=1, gradient_steps=1, batch_size=256)
</code></pre>
<blockquote>
<p>[!NOTE]
Two inconsistencies remain: the default network architecture for <code>TD3</code>/<code>DDPG</code> is <code>[400, 300]</code> instead of <code>[256, 256]</code> as for SAC (kept for backward compatibility, see the <a href="https://wandb.ai/openrlbenchmark/sbx/reports/SBX-TD3-Influence-of-policy-net--Vmlldzo2NDg1Mzk3">report on the influence of the network size</a>), and the default learning rate is 1e-3 instead of 3e-4 as for SAC (kept for performance reasons, see the <a href="https://wandb.ai/openrlbenchmark/sbx/reports/SBX-TD3-RL-Zoo-v2-3-0a0-vs-SB3-TD3-RL-Zoo-2-2-1---Vmlldzo2MjUyNTQx">W&B report on the influence of the learning rate</a>). A sketch of overriding these defaults is shown below.</p>
</blockquote>
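<p>If you prefer <code>TD3</code>/<code>DDPG</code> to mirror the SAC-style defaults described in the note, both settings can be overridden at construction time. This is a minimal sketch, not part of the release itself; the <code>Pendulum-v1</code> environment is only a placeholder:</p>
<pre><code># Illustrative only: align TD3 with the SAC-style defaults mentioned in the note above
from stable_baselines3 import TD3

model = TD3(
    "MlpPolicy",
    "Pendulum-v1",                            # placeholder env id
    learning_rate=3e-4,                       # SAC default, instead of TD3's 1e-3
    policy_kwargs=dict(net_arch=[256, 256]),  # SAC-sized network, instead of [400, 300]
)
</code></pre>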
<ul>
<li>The default <code>learning_starts</code> parameter of <code>DQN</code> has been changed to be consistent with the other off-policy algorithms</li>
</ul>
<pre><code># SB3 < 2.3.0 default hyperparameters, 50_000 corresponded to the Atari default hyperparameters
# model = DQN("MlpPolicy", env, learning_starts=50_000)
# SB3 >= 2.3.0:
model = DQN("MlpPolicy", env, learning_starts=100)
</code></pre>
<ul>
<li>For safety, <code>torch.load()</code> is now called with <code>weights_only=True</code> when loading torch tensors;
policy <code>load()</code> still uses <code>weights_only=False</code>, as Gymnasium imports are required for it to work</li>
<li>When using <code>huggingface_sb3</code>, you will now need to set <code>TRUST_REMOTE_CODE=True</code> when downloading models from the hub, as <code>pickle.load</code> is not safe (see the sketch after this list)</li>
</ul>
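<p>A rough usage sketch for the <code>huggingface_sb3</code> change. The release note only states that <code>TRUST_REMOTE_CODE=True</code> must be set; treating it as an environment variable, as well as the repo id and filename below, are assumptions for illustration:</p>
<pre><code># Hypothetical sketch: repo_id/filename are placeholders, and reading
# TRUST_REMOTE_CODE from the environment is an assumption based on the note above.
import os

os.environ["TRUST_REMOTE_CODE"] = "True"  # opt in to pickle-based loading from the hub

from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO

checkpoint = load_from_hub(
    repo_id="sb3/ppo-CartPole-v1",   # placeholder repo id
    filename="ppo-CartPole-v1.zip",  # placeholder filename
)
model = PPO.load(checkpoint)
</code></pre>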
<h2>New Features:</h2>
<ul>
<li>Log the success rate <code>rollout/success_rate</code> when available for on-policy algorithms (@corentinlger) (see the sketch after this list)</li>
</ul>
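<p>As a rough illustration of how this surfaces in practice, assuming the usual SB3 convention that the environment reports success via an <code>is_success</code> entry in the final <code>info</code> dict of an episode. The toy environment below is made up purely for illustration:</p>
<pre><code># Minimal sketch (assumption: success is reported via info["is_success"] at episode end)
import numpy as np
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor


class ToyGoalEnv(gym.Env):
    """Made-up environment: the episode 'succeeds' if the last action was 1."""

    observation_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
    action_space = gym.spaces.Discrete(2)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.steps = 0
        return np.zeros(1, dtype=np.float32), {}

    def step(self, action):
        self.steps += 1
        terminated = self.steps >= 10
        # Report success only on the terminal step
        info = {"is_success": bool(action == 1)} if terminated else {}
        return np.zeros(1, dtype=np.float32), float(action), terminated, False, info


model = PPO("MlpPolicy", Monitor(ToyGoalEnv()), verbose=1)
model.learn(total_timesteps=2_048)  # rollout/success_rate should appear in the logs
</code></pre>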
<h2>Bug Fixes:</h2>
<ul>
<li>Fixed the <code>monitor_wrapper</code> argument that was not passed to the parent class, and the <code>dones</code> argument that was not passed to <code>_update_info_buffer</code> (@corentinlger)</li>
</ul>
<h2><a href="https://github.com/Stable-Baselines-Team/stable-baselines3-contrib">SB3-Contrib</a></h2>
<ul>
<li>Added <code>rollout_buffer_class</code> and <code>rollout_buffer_kwargs</code> arguments to MaskablePPO (see the sketch after this list)</li>
<li>Fixed <code>train_freq</code> type annotation for tqc and qrdqn (@Armandpl)</li>
<li>Fixed <code>sb3_contrib/common/maskable/*.py</code> type annotations</li>
<li>Fixed <code>sb3_contrib/ppo_mask/ppo_mask.py</code> type annotations</li>
<li>Fixed <code>sb3_contrib/common/vec_env/async_eval.py</code> type annotations</li>
<li>Add some additional notes about <code>MaskablePPO</code> (evaluation and multi-process) (@icheered)</li>
</ul>
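<p>One possible way to use the new arguments is to pass a custom buffer class together with its extra constructor kwargs. The <code>LoggingMaskableRolloutBuffer</code> subclass and its <code>verbose_reset</code> kwarg below are hypothetical; only <code>rollout_buffer_class</code>/<code>rollout_buffer_kwargs</code>, <code>MaskableRolloutBuffer</code>, and <code>InvalidActionEnvDiscrete</code> come from SB3-Contrib itself:</p>
<pre><code># Illustrative sketch: the custom buffer and its kwarg are made up
from sb3_contrib import MaskablePPO
from sb3_contrib.common.envs import InvalidActionEnvDiscrete
from sb3_contrib.common.maskable.buffers import MaskableRolloutBuffer


class LoggingMaskableRolloutBuffer(MaskableRolloutBuffer):
    """Hypothetical buffer that accepts one extra constructor kwarg."""

    def __init__(self, *args, verbose_reset: bool = False, **kwargs):
        # Set before super().__init__(), which calls self.reset() internally
        self.verbose_reset = verbose_reset
        super().__init__(*args, **kwargs)

    def reset(self) -> None:
        if self.verbose_reset:
            print("Resetting the rollout buffer")
        super().reset()


env = InvalidActionEnvDiscrete(dim=80, n_invalid_actions=60)
model = MaskablePPO(
    "MlpPolicy",
    env,
    rollout_buffer_class=LoggingMaskableRolloutBuffer,
    rollout_buffer_kwargs=dict(verbose_reset=True),
    verbose=1,
)
model.learn(total_timesteps=1_000)
</code></pre>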
<h2><a href="https://github.com/DLR-RM/rl-baselines3-zoo">RL Zoo</a></h2>
<ul>
<li>Updated default hyperparameters for TD3/DDPG to be more consistent with SAC</li>
<li>Upgraded MuJoCo env hyperparameters to v4 (pre-trained agents need to be updated)</li>
<li>Added test dependencies to <code>setup.py</code> (@power-edge)</li>
<li>Simplified dependencies in <code>requirements.txt</code> (removed duplicates from <code>setup.py</code>)</li>
</ul>
<h2><a href="https://github.com/araffin/sbx">SBX (SB3 + Jax)</a></h2>
<ul>
<li>Added support for <code>MultiDiscrete</code> and <code>MultiBinary</code> action spaces to PPO</li>
<li>Added support for large values of <code>gradient_steps</code> to SAC, TD3, and TQC</li>
<li>Fixed the <code>train()</code> signature and updated type hints</li>
<li>Fixed the replay buffer device at load time</li>
<li>Added a flatten layer</li>
<li>Added <code>CrossQ</code> (see the sketch after this list)</li>
</ul>
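<p>Since SBX algorithms follow the SB3 API, trying the new <code>CrossQ</code> implementation should be a matter of swapping the class. A rough sketch; the environment and training budget are placeholders, not tuned values:</p>
<pre><code># Rough sketch: CrossQ from SBX with the standard SB3-style API (env id is a placeholder)
from sbx import CrossQ

model = CrossQ("MlpPolicy", "Pendulum-v1", verbose=1)
model.learn(total_timesteps=10_000)
</code></pre>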
<h2>Others:</h2>
<ul>
<li>Updated black from v23 to v24</li>
<li>Updated ruff to >= v0.3.1</li>
<li>Updated the env checker for (multi)discrete spaces with a non-zero start (see the sketch after this list)</li>
</ul>
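<p>For context, a "non-zero start" space is something like <code>gym.spaces.Discrete(3, start=1)</code>. The toy environment below is only an illustration of running the checker against such a space; it is not taken from the SB3 test suite, and whether the checker passes or warns depends on the space being checked:</p>
<pre><code># Toy example (not from SB3): an env whose action space starts at 1 instead of 0
import numpy as np
import gymnasium as gym
from stable_baselines3.common.env_checker import check_env


class NonZeroStartEnv(gym.Env):
    observation_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
    action_space = gym.spaces.Discrete(3, start=1)  # valid actions are 1, 2, 3

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(2, dtype=np.float32), {}

    def step(self, action):
        return np.zeros(2, dtype=np.float32), 0.0, True, False, {}


check_env(NonZeroStartEnv())  # the updated checker handles the non-zero start
</code></pre>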
<h2>Documentation:</h2>
<ul>
<li>Added a paragraph on modifying vectorized environment parameters via setters (@fracapuano) (see the sketch after this list)</li>
<li>Updated callback code example</li>
<li>Updated the export-to-ONNX documentation; it is now much simpler to export SB3 models with a newer ONNX opset!</li>
<li>Added a link to the "Practical Tips for Reliable Reinforcement Learning" video</li>
<li>Added <code>render_mode="human"</code> in the README example (@marekm4)</li>
<li>Fixed the docstring signature for <code>sum_independent_dims</code> (@stagoverflow)</li>
<li>Updated the docstring description for <code>log_interval</code> in the base class (@rushitnshah)</li>
</ul>
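<p>For reference, the setters in question are part of the existing <code>VecEnv</code> API (<code>get_attr</code>, <code>set_attr</code>, <code>env_method</code>). A minimal sketch; the <code>my_param</code> attribute name below is made up:</p>
<pre><code># Illustrative only: "my_param" is a made-up attribute name
from stable_baselines3.common.env_util import make_vec_env

vec_env = make_vec_env("CartPole-v1", n_envs=4)

# Read an attribute from every sub-environment
print(vec_env.get_attr("spec"))

# Set an attribute on every sub-environment
vec_env.set_attr("my_param", 0.5)

# Call an arbitrary method on each sub-environment
vec_env.env_method("reset", seed=0)
</code></pre>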
<p><strong>Full Changelog</strong>: https://github.com/DLR-RM/stable-baselines3/compare/v2.2.1...v2.3.0</p>
DLR-RM/stable-baselines3: Stable-Baselines3 v2.3.2: Hotfix for PyTorch 1.13
<h2>Bug fixes</h2>
<ul>
<li>Reverted <code>torch.load()</code> to be called with <code>weights_only=False</code>, as it caused loading issues with older versions of PyTorch. https://github.com/DLR-RM/stable-baselines3/pull/1913</li>
<li>Cast <code>learning_rate</code> to a float lambda for pickle safety when doing <code>model.load</code>, by @markscsmith in https://github.com/DLR-RM/stable-baselines3/pull/1901</li>
</ul>
<h2>Documentation</h2>
<ul>
<li>Fixed typo in changelog by @araffin in https://github.com/DLR-RM/stable-baselines3/pull/1882</li>
<li>Fixed broken link in ppo.rst by @chaitanyabisht in https://github.com/DLR-RM/stable-baselines3/pull/1884</li>
<li>Added ER-MRL to community projects by @corentinlger in https://github.com/DLR-RM/stable-baselines3/pull/1904</li>
<li>Fixed slow numpy->torch conversion for TensorBoard videos by @NickLucche in https://github.com/DLR-RM/stable-baselines3/pull/1910</li>
</ul>
<h2>New Contributors</h2>
<ul>
<li>@chaitanyabisht made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1884</li>
<li>@markscsmith made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1901</li>
<li>@NickLucche made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1910</li>
</ul>
<p><strong>Full Changelog</strong>: https://github.com/DLR-RM/stable-baselines3/compare/v2.3.0...v2.3.2</p>
