Tags: araffin/sbx

v0.21.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
KL Adaptive LR for PPO and LR schedule for SAC/TQC (#72)

* Only check for terminated episodes

* Start adding ortho init

* Add SimbaPolicy for PPO

* Try adding ortho init to SAC

* Enable lr schedule for PPO

* Allow to pass lr, prepare for adaptive lr

* Implement adaptive lr

* Add small test

* Refactor adaptive lr

* Add adaptive lr for SAC

* Fix qf_learning_rate

* Revert "Fix qf_learning_rate"

This reverts commit ab33983.

* Revert "Add adaptive lr for SAC"

This reverts commit 5832702.

* Revert kl div for SAC changes

* Revert dist.mode() in two lines

* Cleanup code

* Add support for Gaussian actor for SAC

* Enable Gaussian actor for TQC

* Log std too

* Avoid NaN in kl div approx

* Allow to use layer_norm in actor

* Reformat

* Allow max grad norm for TQC and fix optimizer class

* Comment out max grad norm

* Update to schedule classes

* Add lr schedule support for TQC

* Revert experimental changes and add support for lr schedule for SAC

* Add test for adaptive kl div, remove squash output param
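
The KL-adaptive learning rate named in this release title is often implemented as a simple rule: lower the learning rate when the approximate KL divergence between the old and new policy overshoots a target, and raise it when the divergence stays well below that target. The following is a minimal sketch of that rule, not the sbx implementation. The function name, the 2x / 0.5x thresholds, the factor of 1.5 and the bounds are all assumptions chosen for illustration.

```python
def adapt_learning_rate(
    lr: float,
    approx_kl: float,
    target_kl: float,
    factor: float = 1.5,  # assumed adjustment factor
    min_lr: float = 1e-5,  # assumed lower bound
    max_lr: float = 1e-2,  # assumed upper bound
) -> float:
    """Return a new learning rate based on the measured KL divergence."""
    if approx_kl > 2.0 * target_kl:
        # Policy changed too much: take smaller steps.
        lr = lr / factor
    elif approx_kl < 0.5 * target_kl:
        # Policy barely changed: take larger steps.
        lr = lr * factor
    # Keep the learning rate inside the allowed range.
    return min(max(lr, min_lr), max_lr)


# Hypothetical usage after one PPO update with a target KL of 0.01.
new_lr = adapt_learning_rate(lr=1e-3, approx_kl=0.05, target_kl=0.01)
```

Clipping to a fixed range keeps a run of unusually large or small KL estimates from driving the learning rate to zero or to an unstable value.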

v0.20.0

Update PPO to support `net_arch`, and additional fixes (#65)

* Add support for flexible arch in PPO

* Fix ent_coeff logging for TQC

* Fix name order

* Fix ent_coeff logging for SAC

* Hotfix for PPO, do not squash output at test time

* Fix typo

* Fix typo in common policy

* Try Gaussian dist for TQC

* Revert "Try Gaussian dist for TQC"

This reverts commit 6eeaf23.

* Fix CrossQ ent_coef logging

* Log PPO std when possible

* Fix for CrossQ

v0.19.0

Add SimBa Policy: Simplicity Bias for Scaling Up Parameters in DRL (#59)

* Start testing simba

* Quick try with CrossQ

* Add actor for CrossQ

* Add simba net for TQC

* Remove unused param

* Add parameter resets for TQC

* Fix reset

* Add missing param

* Update documentation

* Add parameter resets

* Reformat pyproject.toml

* Refactor: share actor between SAC and TQC

* Add run tests for simba

* Upgrade to python 3.9 (#64)

* Fix mypy error, update version

v0.18.0

Optimize the log of the entropy coeff instead of the entropy coeff (#56)

* optimize the log of the entropy coeff instead of the entropy coeff

* Update log ent coef for SAC and derivatives

* Reformat yaml

* Use uv for faster downloads

* Remove TODO

* Remove redundant call

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
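
Optimizing the log of the entropy coefficient means that the trainable parameter is `log_ent_coef` and the coefficient itself is recovered as `exp(log_ent_coef)`, so it stays positive whatever the gradient steps do. The sketch below illustrates the idea with plain gradient descent on one common form of the loss, `log_ent_coef * (entropy - target_entropy)`. The function names and the learning rate are hypothetical, and the exact loss used in sbx may differ.

```python
import math


def update_log_ent_coef(
    log_ent_coef: float,
    entropy: float,
    target_entropy: float,
    lr: float = 0.1,  # assumed step size
) -> float:
    """One gradient-descent step on log_ent_coef * (entropy - target_entropy)."""
    # Derivative of the loss with respect to log_ent_coef.
    grad = entropy - target_entropy
    return log_ent_coef - lr * grad


def ent_coef(log_ent_coef: float) -> float:
    """Recover the (always positive) entropy coefficient."""
    return math.exp(log_ent_coef)


# Policy entropy below the target: the coefficient should increase.
log_alpha = update_log_ent_coef(0.0, entropy=-2.0, target_entropy=-1.0)
alpha = ent_coef(log_alpha)
```

When the measured entropy is below the target, the step raises `log_ent_coef` and therefore the coefficient. When it is above, the coefficient shrinks toward zero without ever changing sign.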

v0.17.0

Add CNN support for DQN (#49)

* Add CNN support for DQN

* Update version and deps

* Fix CNN, channel last, padding and reshape

v0.15.0

Hotfix - Return the new updated key in function _train (#46)

* return the new updated key in _train

* Add regression test and update version

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>

v0.13.0

Add CrossQ (#28)

* Added support for large values for gradient_steps to SAC, TD3, and TQC by replacing the unrolled loop with jax.lax.fori_loop

* Add comments

* Hotfix for train signature

* Fixed start index for dynamic_slice_in_dim

* Rename policy delay

* Fix type annotation

* Add CrossQ POC

* Remove old annotations

* Add actor BN

* Concatenate obs/next obs, first working example

* Deactivate batchnorm for actor

* Fix off-by-one and improve type annotation

* Fix typo

* Update type annotation

* Update off-by one

* Implemented CrossQ

* Added CrossQ to README

* clean up and comments

* refactored and added comments

* Update doc

* Cleanup CrossQ and BatchRenorm

* Update tests

* Fix for new tfp version

* Clean-up: Removed unused variables and fixed typo

* Cleaner variable names for BatchReNorm

Co-authored-by: Jan Schneider <33448112+jan1854@users.noreply.github.com>

* Allow to change the number of warmup steps

* Update SB3 dependency

* Deprecate DroQ class

* [ci skip] Update comments

---------

Co-authored-by: Jan Schneider <Jan.Schneider1997@gmail.com>
Co-authored-by: Daniel Palenicek <daniel.palenicek@tu-darmstadt.de>
Co-authored-by: Jan Schneider <Jan.Schneider@tuebingen.mpg.de>
Co-authored-by: Jan Schneider <33448112+jan1854@users.noreply.github.com>

v0.12.0

Support for MultiDiscrete and MultiBinary action spaces in PPO (#30)

* Added support for MultiDiscrete action space to PPO

* Added support for MultiBinary action spaces as discrete action spaces with two choices

* Added tests for PPO with MultiDiscrete and MultiBinary action spaces

* Moved the padding comment

* Fixed type errors

* Replaced | by Union in type hint to support python < 3.10

* Update ruff

* Rename variables

* Add more comments and pre-compute variables

* Check that actions are not outside action space

* [ci skip] Update version

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>

v0.11.0

Added support for large values for gradient_steps to SAC, TD3, and TQC (#21)

* Added support for large values for gradient_steps to SAC, TD3, and TQC by replacing the unrolled loop with jax.lax.fori_loop

* Add comments

* Hotfix for train signature

* Fixed start index for dynamic_slice_in_dim

* Rename policy delay

* Fix type annotation

* Remove old annotations

* Fix off-by-one and improve type annotation

* Fix typo

* [ci skip] Update README

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
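
A Python `for` loop inside a jitted function is unrolled at trace time, so compilation cost grows with the number of gradient steps. `jax.lax.fori_loop` compiles the loop body once instead. The sketch below shows the pattern with `jax.lax.dynamic_slice_in_dim` picking the minibatch for each step. The toy scalar parameter, the loss and the function name `train` are assumptions for illustration and do not reproduce the sbx training code.

```python
import functools

import jax
import jax.numpy as jnp


@functools.partial(jax.jit, static_argnames=("gradient_steps", "batch_size"))
def train(param, data, gradient_steps, batch_size, lr=0.1):
    """Run `gradient_steps` updates, one minibatch of `data` per step."""

    def body(i, param):
        # Slice the i-th minibatch; the start index is a traced value.
        batch = jax.lax.dynamic_slice_in_dim(data, i * batch_size, batch_size)
        # Gradient of 0.5 * mean((param - batch) ** 2) with respect to param.
        grad = jnp.mean(param - batch)
        return param - lr * grad

    # The body is traced once, regardless of how large gradient_steps is.
    return jax.lax.fori_loop(0, gradient_steps, body, param)


# Toy usage: 4 gradient steps over 8 samples, 2 samples per step.
data = jnp.ones(8)
final_param = train(jnp.zeros(()), data, gradient_steps=4, batch_size=2, lr=0.5)
```

The slice size must be static, which is why `batch_size` is marked as a static argument, while the start index `i * batch_size` can change from one iteration to the next.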

v0.10.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Fix train signature and update type hints (#24)

* Hotfix for train signature

* Fix deprecated type hints

* Fix mypy

* Update optax dep for python 3.8