Tags · araffin/sbx

v0.21.0

KL Adaptive LR for PPO and LR schedule for SAC/TQC (#72)

* Only check for terminated episodes

* Start adding ortho init

* Add SimbaPolicy for PPO

* Try adding ortho init to SAC

* Enable lr schedule for PPO

* Allow to pass lr, prepare for adaptive lr

* Implement adaptive lr

* Add small test

* Refactor adaptive lr

* Add adaptive lr for SAC

* Fix qf_learning_rate

* Revert "Fix qf_learning_rate"

This reverts commit ab33983.

* Revert "Add adaptive lr for SAC"

This reverts commit 5832702.

* Revert kl div for SAC changes

* Revert dist.mode() in two lines

* Cleanup code

* Add support for Gaussian actor for SAC

* Enable Gaussian actor for TQC

* Log std too

* Avoid NaN in kl div approx

* Allow to use layer_norm in actor

* Reformat

* Allow max grad norm for TQC and fix optimizer class

* Comment out max grad norm

* Update to schedule classes

* Add lr schedule support for TQC

* Revert experimental changes and add support for lr schedule for SAC

* Add test for adaptive kl div, remove squash output param
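
The adaptive learning rate mentioned above can be illustrated with a common KL-based rule: shrink the learning rate when the measured KL divergence overshoots a target, and grow it when the KL falls well below the target. The sketch below is a minimal, framework-free illustration. The class name `KLAdaptiveLRSketch`, its fields, and the default margin, factor, and bounds are assumptions for this example and are not taken from the sbx source.

```python
from dataclasses import dataclass


@dataclass
class KLAdaptiveLRSketch:
    """Hypothetical helper: adapts a learning rate from a measured KL divergence."""

    target_kl: float
    lr: float
    min_lr: float = 1e-5   # assumed lower bound
    max_lr: float = 1e-2   # assumed upper bound
    margin: float = 2.0    # assumed tolerance band around target_kl
    factor: float = 1.5    # assumed multiplicative step

    def update(self, kl: float) -> float:
        if kl > self.target_kl * self.margin:
            # Policy moved too far: take smaller steps.
            self.lr = max(self.lr / self.factor, self.min_lr)
        elif kl < self.target_kl / self.margin:
            # Policy barely moved: take larger steps.
            self.lr = min(self.lr * self.factor, self.max_lr)
        # Inside the band the learning rate is left unchanged.
        return self.lr


schedule = KLAdaptiveLRSketch(target_kl=0.01, lr=3e-4)
for measured_kl in (0.05, 0.001, 0.01):
    schedule.update(measured_kl)
```

In a PPO loop, `update` would be called with the approximate KL computed after each policy update, and the returned value would be passed to the optimizer.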

May 19, 2025
849e908

v0.20.0

Update PPO to support `net_arch`, and additional fixes (#65)

* Add support for flexible arch in PPO

* Fix ent_coeff logging for TQC

* Fix name order

* Fix ent_coeff logging for SAC

* Hotfix for PPO, do not squash output at test time

* Fix typo

* Fix typo in common policy

* Try Gaussian dist for TQC

* Revert "Try Gaussian dist for TQC"

This reverts commit 6eeaf23.

* Fix CrossQ ent_coef logging

* Log PPO std when possible

* Fix for CrossQ

Feb 14, 2025
8238fcc

v0.19.0

Add SimBa Policy: Simplicity Bias for Scaling Up Parameters in DRL (#59)

* Start testing simba

* Quick try with CrossQ

* Add actor for CrossQ

* Add simba net for TQC

* Remove unused param

* Add parameter resets for TQC

* Fix reset

* Add missing param

* Update documentation

* Add parameter resets

* Reformat pyproject.toml

* Refactor: share actor between SAC and TQC

* Add run tests for simba

* Upgrade to python 3.9 (#64)

* Fix mypy error, update version

Jan 14, 2025
9cad1d0

v0.18.0

Optimize the log of the entropy coeff instead of the entropy coeff (#56)

* optimize the log of the entropy coeff instead of the entropy coeff

* Update log ent coef for SAC and derivatives

* Reformat yaml

* Use uv for faster downloads

* Remove TODO

* Remove redundant call

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
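
To make the difference between the two objectives concrete, the sketch below compares their gradients with respect to the log-parameter. It assumes a temperature loss of the form `x * (entropy - target_entropy)`, where `x` is either the log of the entropy coefficient or the coefficient itself. The function names and this exact loss form are assumptions for illustration, not the sbx implementation.

```python
import math


def grad_log_objective(entropy: float, target_entropy: float) -> float:
    # d/d(log_alpha) of  log_alpha * (entropy - target_entropy)
    return entropy - target_entropy


def grad_coef_objective(log_alpha: float, entropy: float, target_entropy: float) -> float:
    # d/d(log_alpha) of  exp(log_alpha) * (entropy - target_entropy)
    return math.exp(log_alpha) * (entropy - target_entropy)


def sgd_step(log_alpha: float, grad: float, lr: float) -> float:
    # Plain gradient descent on the log-parameter; alpha = exp(log_alpha) stays positive.
    return log_alpha - lr * grad


log_alpha = math.log(0.01)
g_log = grad_log_objective(entropy=1.0, target_entropy=0.0)
g_coef = grad_coef_objective(log_alpha, entropy=1.0, target_entropy=0.0)
new_log_alpha = sgd_step(log_alpha, g_log, lr=0.1)
```

With the log objective the gradient does not depend on the current coefficient, so the update does not slow down as the coefficient gets small. With the coefficient objective the gradient is scaled by `exp(log_alpha)`.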

Nov 1, 2024
1c79684

v0.17.0

Add CNN support for DQN (#49)

* Add CNN support for DQN

* Update version and deps

* Fix CNN, channel last, padding and reshape

Jul 11, 2024
19c85a1

v0.15.0

Hotfix - Return the new updated key in function _train (#46)

* return the new updated key in _train

* Add regression test and update version

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>

Apr 12, 2024
42caa65

v0.13.0

Add CrossQ (#28)

* Added support for large values for gradient_steps to SAC, TD3, and TQC by replacing the unrolled loop with jax.lax.fori_loop

* Add comments

* Hotfix for train signature

* Fixed start index for dynamic_slice_in_dim

* Rename policy delay

* Fix type annotation

* Add CrossQ POC

* Remove old annotations

* Add actor BN

* Concatenate obs/next obs, first working example

* Deactivate batchnorm for actor

* Fix off-by-one and improve type annotation

* Fix typo

* Update type annotation

* Update off-by-one

* Implemented CrossQ

* Added CrossQ to README

* clean up and comments

* refactored and added comments

* Update doc

* Cleanup CrossQ and BatchRenorm

* Update tests

* Fix for new tfp version

* Clean-up: Removed unused variables and fixed typo

* Cleaner variable names for BatchReNorm

Co-authored-by: Jan Schneider <33448112+jan1854@users.noreply.github.com>

* Allow to change the number of warmup steps

* Update SB3 dependency

* Deprecate DroQ class

* [ci skip] Update comments

---------

Co-authored-by: Jan Schneider <Jan.Schneider1997@gmail.com>
Co-authored-by: Daniel Palenicek <daniel.palenicek@tu-darmstadt.de>
Co-authored-by: Jan Schneider <Jan.Schneider@tuebingen.mpg.de>
Co-authored-by: Jan Schneider <33448112+jan1854@users.noreply.github.com>

Apr 3, 2024
c8db73f

v0.12.0

Support for MultiDiscrete and MultiBinary action spaces in PPO (#30)

* Added support for MultiDiscrete action space to PPO

* Added support for MultiBinary action spaces as discrete action spaces with two choices

* Added tests for PPO with MultiDiscrete and MultiBinary action spaces

* Moved the padding comment

* Fixed type errors

* Replaced | by Union in type hint to support python < 3.10

* Update ruff

* Rename variables

* Add more comments and pre-compute variables

* Check that actions are not outside action space

* [ci skip] Update version

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
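
Treating a MultiBinary space as a MultiDiscrete space with two choices per dimension can be sketched in plain Python as follows. The helper names, the flat logits layout, and the use of `-inf` as the padding value are assumptions made for this illustration and are not taken from the sbx source.

```python
def multibinary_to_nvec(n_binary: int) -> list[int]:
    # Each binary dimension becomes a discrete dimension with two choices (0 or 1).
    return [2] * n_binary


def split_and_pad_logits(flat_logits: list[float], nvec: list[int]) -> list[list[float]]:
    # One row of logits per action dimension, padded to the largest number of choices
    # so that all rows have the same length. Padded entries can never be selected.
    max_choices = max(nvec)
    rows, start = [], 0
    for n_choices in nvec:
        row = flat_logits[start:start + n_choices]
        rows.append(row + [float("-inf")] * (max_choices - n_choices))
        start += n_choices
    return rows


def greedy_actions(rows: list[list[float]]) -> list[int]:
    # Pick the highest-logit choice for every action dimension.
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


def check_actions(actions: list[int], nvec: list[int]) -> bool:
    # Every action must be a valid index for its dimension.
    return all(0 <= a < n for a, n in zip(actions, nvec))


nvec = [3] + multibinary_to_nvec(2)  # one 3-way dimension plus two binary ones
rows = split_and_pad_logits([0.1, 2.0, -1.0, 0.5, 0.2, -0.3, 0.9], nvec)
actions = greedy_actions(rows)
```

Padding to a common width lets a single batched categorical distribution cover dimensions with different numbers of choices, and the final check mirrors the "actions are not outside action space" item above.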

Feb 28, 2024
db6120b

v0.11.0

Added support for large values for gradient_steps to SAC, TD3, and TQC (#21)

* Added support for large values for gradient_steps to SAC, TD3, and TQC by replacing the unrolled loop with jax.lax.fori_loop

* Add comments

* Hotfix for train signature

* Fixed start index for dynamic_slice_in_dim

* Rename policy delay

* Fix type annotation

* Remove old annotations

* Fix off-by-one and improve type annotation

* Fix typo

* [ci skip] Update README

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
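
The index arithmetic behind slicing one large sampled batch into per-step minibatches can be sketched without JAX. The plain Python loop below only stands in for `jax.lax.fori_loop`, and list slicing stands in for `dynamic_slice_in_dim`. It assumes that `batch_size * gradient_steps` transitions are sampled up front, and the function names are hypothetical.

```python
from typing import Callable


def minibatch(data: list, step: int, batch_size: int) -> list:
    # Start index of the slice for this gradient step: step * batch_size.
    start = step * batch_size
    return data[start:start + batch_size]


def run_gradient_steps(
    data: list,
    gradient_steps: int,
    batch_size: int,
    update_fn: Callable,
    carry,
):
    # The whole batch must cover every gradient step exactly once.
    assert len(data) == gradient_steps * batch_size
    # In JAX this loop body would be passed to a loop primitive instead of being
    # unrolled at trace time; here it is an ordinary Python loop.
    for step in range(gradient_steps):
        carry = update_fn(carry, minibatch(data, step, batch_size))
    return carry


data = list(range(12))
total = run_gradient_steps(
    data,
    gradient_steps=3,
    batch_size=4,
    update_fn=lambda acc, batch: acc + sum(batch),
    carry=0,
)
```

An off-by-one in the start index or the step range would either skip the first minibatch or read past the end of the sampled data, which is the kind of error the fixes listed above refer to.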

Feb 9, 2024
e564074

v0.10.0

Fix train signature and update type hints (#24)

* Hotfix for train signature

* Fix deprecated type hints

* Fix mypy

* Update optax dep for python 3.8

Jan 16, 2024
37ed771
