Tags: araffin/sbx
KL Adaptive LR for PPO and LR schedule for SAC/TQC (#72)
* Only check for terminated episodes
* Start adding ortho init
* Add SimbaPolicy for PPO
* Try adding ortho init to SAC
* Enable lr schedule for PPO
* Allow to pass lr, prepare for adaptive lr
* Implement adaptive lr
* Add small test
* Refactor adaptive lr
* Add adaptive lr for SAC
* Fix qf_learning_rate
* Revert "Fix qf_learning_rate" This reverts commit ab33983.
* Revert "Add adaptive lr for SAC" This reverts commit 5832702.
* Revert kl div for SAC changes
* Revert dist.mode() in two lines
* Cleanup code
* Add support for Gaussian actor for SAC
* Enable Gaussian actor for TQC
* Log std too
* Avoid NaN in kl div approx
* Allow to use layer_norm in actor
* Reformat
* Allow max grad norm for TQC and fix optimizer class
* Comment out max grad norm
* Update to schedule classes
* Add lr schedule support for TQC
* Revert experimental changes and add support for lr schedule for SAC
* Add test for adaptive kl div, remove squash output param
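
The "adaptive lr" entries above refer to adjusting the learning rate from the measured KL divergence between the old and new policy. The following is a minimal sketch of one common rule for doing this, not necessarily the rule this PR implements: the function name `adapt_learning_rate`, the `factor` and `margin` parameters, and the clipping bounds are all assumptions chosen for illustration.

```python
def adapt_learning_rate(
    lr: float,
    kl: float,
    target_kl: float,
    factor: float = 1.5,   # hypothetical multiplicative step
    margin: float = 2.0,   # hypothetical tolerance band around target_kl
    min_lr: float = 1e-5,  # hypothetical lower bound
    max_lr: float = 1e-2,  # hypothetical upper bound
) -> float:
    """Return a new learning rate given the latest approximate KL divergence."""
    if kl > target_kl * margin:
        # The policy moved too far in one update: slow down.
        return max(lr / factor, min_lr)
    if kl < target_kl / margin:
        # The policy barely moved: speed up.
        return min(lr * factor, max_lr)
    # KL is inside the tolerance band: keep the current value.
    return lr


lr_after_large_kl = adapt_learning_rate(3e-4, kl=0.05, target_kl=0.01)
lr_after_small_kl = adapt_learning_rate(3e-4, kl=0.001, target_kl=0.01)
lr_in_band = adapt_learning_rate(3e-4, kl=0.01, target_kl=0.01)
```

In this sketch the learning rate only changes when the KL divergence leaves the band `[target_kl / margin, target_kl * margin]`, which avoids reacting to small fluctuations.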
Update PPO to support `net_arch`, and additional fixes (#65)
* Add support for flexible arch in PPO
* Fix ent_coeff logging for TQC
* Fix name order
* Fix ent_coeff logging for SAC
* Hotfix for PPO, do not squash output at test time
* Fix typo
* Fix typo in common policy
* Try Gaussian dist for TQC
* Revert "Try Gaussian dist for TQC" This reverts commit 6eeaf23.
* Fix CrossQ ent_coef logging
* Log PPO std when possible
* Fix for CrossQ
Add SimBa Policy: Simplicity Bias for Scaling Up Parameters in DRL (#59)
* Start testing simba
* Quick try with CrossQ
* Add actor for CrossQ
* Add simba net for TQC
* Remove unused param
* Add parameter resets for TQC
* Fix reset
* Add missing param
* Update documentation
* Add parameter resets
* Reformat pyproject.toml
* Refactor: share actor between SAC and TQC
* Add run tests for simba
* Upgrade to python 3.9 (#64)
* Fix mypy error, update version
Optimize the log of the entropy coeff instead of the entropy coeff (#56)
* optimize the log of the entropy coeff instead of the entropy coeff
* Update log ent coef for SAC and derivates
* Reformat yaml
* Use uv for faster downloads
* Remove TODO
* Remove redundant call

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
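
To illustrate why one might optimize the log of the entropy coefficient, here is a small dependency-free sketch. It assumes a loss that is linear in the optimized variable, `x * (entropy - target_entropy)`, which is a common choice for automatic temperature tuning but is not confirmed by the commit message. The names `update_log_ent_coef` and `update_ent_coef_directly` are hypothetical.

```python
import math


def update_log_ent_coef(log_ent_coef: float, entropy: float, target_entropy: float, lr: float) -> float:
    # Assumed loss: log_ent_coef * (entropy - target_entropy),
    # so the gradient with respect to log_ent_coef is (entropy - target_entropy).
    return log_ent_coef - lr * (entropy - target_entropy)


def update_ent_coef_directly(ent_coef: float, entropy: float, target_entropy: float, lr: float) -> float:
    # Same assumed loss, but with the coefficient itself as the optimized variable.
    return ent_coef - lr * (entropy - target_entropy)


log_ent_coef = math.log(1.0)
direct_ent_coef = 1.0
for _ in range(100):
    # The policy entropy stays above the target, so the coefficient should shrink.
    log_ent_coef = update_log_ent_coef(log_ent_coef, entropy=-0.5, target_entropy=-1.0, lr=0.1)
    direct_ent_coef = update_ent_coef_directly(direct_ent_coef, entropy=-0.5, target_entropy=-1.0, lr=0.1)

# Exponentiating the log keeps the coefficient strictly positive,
# whereas the direct update has drifted below zero.
ent_coef = math.exp(log_ent_coef)
```

Under these assumptions, the log parametrization guarantees a positive coefficient without clipping, while the direct parametrization can cross zero after enough updates in the same direction.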
Add CrossQ (#28)
* Added support for large values for gradient_steps to SAC, TD3, and TQC by replacing the unrolled loop with jax.lax.fori_loop
* Add comments
* Hotfix for train signature
* Fixed start index for dynamic_slice_in_dim
* Rename policy delay
* Fix type annotation
* Add CrossQ POC
* Remove old annotations
* Add actor BN
* Concatenate obs/next obs, first working example
* Deactivate batchnorm for actor
* Fix off-by-one and improve type annotation
* Fix typo
* Update type annotation
* Update off-by one
* Implemented CrossQ
* Added CrossQ to README
* clean up and comments
* refactored and added comments
* Update doc
* Cleanup CrossQ and BatchRenorm
* Update tests
* Fix for new tfp version
* Clean-up: Removed unused variables and fixed typo
* Cleaner variable names for BatchReNorm Co-authored-by: Jan Schneider <33448112+jan1854@users.noreply.github.com>
* Allow to change the number of warmup steps
* Update SB3 dependency
* Deprecate DroQ class
* [ci skip] Update comments

Co-authored-by: Jan Schneider <Jan.Schneider1997@gmail.com>
Co-authored-by: Daniel Palenicek <daniel.palenicek@tu-darmstadt.de>
Co-authored-by: Jan Schneider <Jan.Schneider@tuebingen.mpg.de>
Co-authored-by: Jan Schneider <33448112+jan1854@users.noreply.github.com>
Support for MultiDiscrete and MultiBinary action spaces in PPO (#30)
* Added support for MultiDiscrete action space to PPO
* Added support for MultiBinary action spaces as discrete action spaces with two choices
* Added tests for PPO with MultiDiscrete and MultiBinary action spaces
* Moved the padding comment
* Fixed type errors
* Replaced | by Union in type hint to support python < 3.10
* Update ruff
* Rename variables
* Add more comments and pre-compute variables
* Check that actions are not outside action space
* [ci skip] Update version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
Added support for large values for gradient_steps to SAC, TD3, and TQC (#21)
* Added support for large values for gradient_steps to SAC, TD3, and TQC by replacing the unrolled loop with jax.lax.fori_loop
* Add comments
* Hotfix for train signature
* Fixed start index for dynamic_slice_in_dim
* Rename policy delay
* Fix type annotation
* Remove old annotations
* Fix off-by-one and improve type annotation
* Fix typo
* [ci skip] Update README

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
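
The change above replaces a Python loop, which JAX unrolls at trace time, with `jax.lax.fori_loop`. The sketch below shows the general pattern on a toy problem. It is not the training loop from this PR: `one_gradient_step`, `train`, and the quadratic objective are hypothetical stand-ins.

```python
import jax
import jax.numpy as jnp


def one_gradient_step(i, params):
    # Toy objective sum(params ** 2); its gradient is 2 * params.
    grad = 2.0 * params
    return params - 0.1 * grad


@jax.jit
def train(params, gradient_steps):
    # A Python `for` loop here would be unrolled into `gradient_steps` copies
    # of the step when traced, so compile time would grow with the step count.
    # fori_loop traces the body once and compiles a single loop instead.
    return jax.lax.fori_loop(0, gradient_steps, one_gradient_step, params)


final_params = train(jnp.ones(3), 1000)
```

Because `gradient_steps` is a traced argument here, the same compiled function can be reused for different step counts without retracing.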