
    DLR-RM/stable-baselines3: Stable-Baselines3 v2.3.0: New default hyperparameters for DDPG, TD3 and DQN

    <p>SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo Stable-Baselines Jax (SBX): https://github.com/araffin/sbx</p>
    <p>To upgrade:</p>
    <pre><code>pip install stable_baselines3 sb3_contrib --upgrade
</code></pre>
    <p>or simply (RL Zoo depends on SB3 and SB3 Contrib):</p>
    <pre><code>pip install rl_zoo3 --upgrade
</code></pre>
    <h2>Breaking Changes:</h2>
    <ul>
    <li>The default hyperparameters of <code>TD3</code> and <code>DDPG</code> have been changed to be more consistent with <code>SAC</code></li>
    </ul>
    <pre><code># SB3 < 2.3.0 default hyperparameters
# model = TD3("MlpPolicy", env, train_freq=(1, "episode"), gradient_steps=-1, batch_size=100)
# SB3 >= 2.3.0:
model = TD3("MlpPolicy", env, train_freq=1, gradient_steps=1, batch_size=256)
</code></pre>
    <blockquote>
    <p>[!NOTE]
    Two inconsistencies remain: the default network architecture for <code>TD3/DDPG</code> is <code>[400, 300]</code> instead of <code>[256, 256]</code> as for SAC (for backward-compatibility reasons, see the <a href="https://wandb.ai/openrlbenchmark/sbx/reports/SBX-TD3-Influence-of-policy-net--Vmlldzo2NDg1Mzk3">report on the influence of the network size</a>), and the default learning rate is 1e-3 instead of 3e-4 as for SAC (for performance reasons, see the <a href="https://wandb.ai/openrlbenchmark/sbx/reports/SBX-TD3-RL-Zoo-v2-3-0a0-vs-SB3-TD3-RL-Zoo-2-2-1---Vmlldzo2MjUyNTQx">W&B report on the influence of the learning rate</a>)</p>
    </blockquote>
    <ul>
    <li>The default <code>learning_starts</code> parameter of <code>DQN</code> has been changed to be consistent with the other off-policy algorithms</li>
    </ul>
    <pre><code># SB3 < 2.3.0 default hyperparameters, 50_000 corresponded to the Atari default hyperparameters
# model = DQN("MlpPolicy", env, learning_starts=50_000)
# SB3 >= 2.3.0:
model = DQN("MlpPolicy", env, learning_starts=100)
</code></pre>
    <ul>
    <li>For safety, <code>torch.load()</code> is now called with <code>weights_only=True</code> when loading torch tensors; policy <code>load()</code> still uses <code>weights_only=False</code>, as gymnasium imports are required for it to work</li>
    <li>When using <code>huggingface_sb3</code>, you will now need to set <code>TRUST_REMOTE_CODE=True</code> when downloading models from the hub, as <code>pickle.load</code> is not safe.</li>
    </ul>
    <h2>New Features:</h2>
    <ul>
    <li>Log success rate <code>rollout/success_rate</code> when available for on-policy algorithms (@corentinlger)</li>
    </ul>
    <h2>Bug Fixes:</h2>
    <ul>
    <li>Fixed the <code>monitor_wrapper</code> argument that was not passed to the parent class, and the <code>dones</code> argument that wasn't passed to <code>_update_into_buffer</code> (@corentinlger)</li>
    </ul>
    <h2><a href="https://github.com/Stable-Baselines-Team/stable-baselines3-contrib">SB3-Contrib</a></h2>
    <ul>
    <li>Added <code>rollout_buffer_class</code> and <code>rollout_buffer_kwargs</code> arguments to <code>MaskablePPO</code></li>
    <li>Fixed <code>train_freq</code> type annotation for TQC and QRDQN (@Armandpl)</li>
    <li>Fixed <code>sb3_contrib/common/maskable/*.py</code> type annotations</li>
    <li>Fixed <code>sb3_contrib/ppo_mask/ppo_mask.py</code> type annotations</li>
    <li>Fixed <code>sb3_contrib/common/vec_env/async_eval.py</code> type annotations</li>
    <li>Added some additional notes about <code>MaskablePPO</code> (evaluation and multi-process) (@icheered)</li>
    </ul>
    <h2><a href="https://github.com/DLR-RM/rl-baselines3-zoo">RL Zoo</a></h2>
    <ul>
    <li>Updated default hyperparameters for TD3/DDPG to be more consistent with SAC</li>
    <li>Upgraded MuJoCo env hyperparameters to v4 (pre-trained agents need to be updated)</li>
    <li>Added test dependencies to <code>setup.py</code> (@power-edge)</li>
    <li>Simplified dependencies of <code>requirements.txt</code> (removed duplicates from <code>setup.py</code>)</li>
    </ul>
    <h2><a href="https://github.com/araffin/sbx">SBX (SB3 + Jax)</a></h2>
    <ul>
    <li>Added support for <code>MultiDiscrete</code> and <code>MultiBinary</code> action spaces to PPO</li>
    <li>Added support for large values of <code>gradient_steps</code> to SAC, TD3, and TQC</li>
    <li>Fixed <code>train()</code> signature and updated type hints</li>
    <li>Fixed replay buffer device at load time</li>
    <li>Added flatten layer</li>
    <li>Added <code>CrossQ</code></li>
    </ul>
    <h2>Others:</h2>
    <ul>
    <li>Updated black from v23 to v24</li>
    <li>Updated ruff to >= v0.3.1</li>
    <li>Updated env checker for (multi)discrete spaces with non-zero start</li>
    </ul>
    <h2>Documentation:</h2>
    <ul>
    <li>Added a paragraph on modifying vectorized environment parameters via setters (@fracapuano)</li>
    <li>Updated callback code example</li>
    <li>Updated export-to-ONNX documentation; it is now much simpler to export SB3 models with newer ONNX opsets!</li>
    <li>Added video link to the "Practical Tips for Reliable Reinforcement Learning" video</li>
    <li>Added <code>render_mode="human"</code> in the README example (@marekm4)</li>
    <li>Fixed docstring signature for <code>sum_independent_dims</code> (@stagoverflow)</li>
    <li>Updated docstring description for <code>log_interval</code> in the base class (@rushitnshah)</li>
    </ul>
    <p><strong>Full Changelog</strong>: https://github.com/DLR-RM/stable-baselines3/compare/v2.2.1...v2.3.0</p>
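The two safety bullets above (calling `torch.load()` with `weights_only=True`, and requiring `TRUST_REMOTE_CODE=True` in `huggingface_sb3`) both address the same underlying risk: `pickle`-based deserialization can execute arbitrary code while loading. A minimal stdlib-only sketch, not SB3 code (the `Payload` class is hypothetical), of how a pickle payload runs a callable at load time:

```python
import pickle

# Hypothetical malicious object: __reduce__ tells pickle which callable to
# invoke (with which arguments) when the bytes are loaded back.
class Payload:
    def __reduce__(self):
        # Unpickling will call eval("6 * 7") instead of rebuilding the object.
        return (eval, ("6 * 7",))

data = pickle.dumps(Payload())
result = pickle.loads(data)  # arbitrary code runs here, during loading
print(result)  # 42
```

A real attack would substitute something like `os.system` for `eval`, which is why restricting `torch.load` to plain tensors and making hub downloads an explicit opt-in are safer defaults.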

    DLR-RM/stable-baselines3: Stable-Baselines3 v2.3.2: Hotfix for PyTorch 1.13

    <h2>Bug fixes</h2>
    <ul>
    <li>Reverted <code>torch.load()</code> to be called with <code>weights_only=False</code>, as it caused loading issues with older versions of PyTorch. https://github.com/DLR-RM/stable-baselines3/pull/1913</li>
    <li>Cast <code>learning_rate</code> to float lambda for pickle safety when doing <code>model.load</code>, by @markscsmith in https://github.com/DLR-RM/stable-baselines3/pull/1901</li>
    </ul>
    <h2>Documentation</h2>
    <ul>
    <li>Fixed typo in changelog by @araffin in https://github.com/DLR-RM/stable-baselines3/pull/1882</li>
    <li>Fixed broken link in ppo.rst by @chaitanyabisht in https://github.com/DLR-RM/stable-baselines3/pull/1884</li>
    <li>Added ER-MRL to community projects by @corentinlger in https://github.com/DLR-RM/stable-baselines3/pull/1904</li>
    <li>Fixed slow numpy->torch conversion for tensorboard videos by @NickLucche in https://github.com/DLR-RM/stable-baselines3/pull/1910</li>
    </ul>
    <h2>New Contributors</h2>
    <ul>
    <li>@chaitanyabisht made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1884</li>
    <li>@markscsmith made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1901</li>
    <li>@NickLucche made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1910</li>
    </ul>
    <p><strong>Full Changelog</strong>: https://github.com/DLR-RM/stable-baselines3/compare/v2.3.0...v2.3.2</p>
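The `learning_rate` fix above rests on a `pickle` quirk worth knowing: a plain float round-trips through pickling, but a lambda (the form constant learning-rate schedules are typically wrapped in) cannot be pickled at all. A small stdlib-only sketch of the difference, with illustrative variable names that are not SB3's own:

```python
import pickle

# A constant learning rate stored as a plain float survives a pickle round-trip.
learning_rate = 3e-4
assert pickle.loads(pickle.dumps(learning_rate)) == 3e-4

# The same constant wrapped in a lambda schedule does not: pickle serializes
# functions by reference, and lambdas cannot be looked up by name.
lr_schedule = lambda progress_remaining: 3e-4
try:
    pickle.dumps(lr_schedule)
    picklable = True
except (pickle.PicklingError, AttributeError):
    picklable = False
print(picklable)  # False
```

Keeping the stored value as a float (and only wrapping it in a schedule at runtime) sidesteps this entirely when saving and loading models.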