Tianshou v2 #1259
Merged
Conversation
(undoing previous change before Algorithm was an nn.Module)
… on for discrete case
Move atari_network and atari_wrapper into the library under tianshou.env.atari (this is more convenient and cleans up the example structure)
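With that move, the Atari helpers become importable from the library itself rather than copied between example scripts. A minimal usage sketch follows; the module path comes from the commit message above, while the imported names and signatures are assumed to match the former `examples/atari` modules and are not verified against v2:

```python
# Hypothetical import sketch: module path per the commit above; the
# symbols DQN and make_atari_env are assumed to carry over unchanged
# from the old examples/atari/atari_network.py and atari_wrapper.py.
from tianshou.env.atari.atari_network import DQN
from tianshou.env.atari.atari_wrapper import make_atari_env

# Build vectorized Atari envs the way the examples previously did
# (parameter names assumed, not taken from the v2 source).
env, train_envs, test_envs = make_atari_env(
    task="PongNoFrameskip-v4", seed=0, training_num=4, test_num=4
)
```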
but from the new abstract base class AbstractActorCriticWithAdvantage (which A2C also inherits from)
This takes effect for examples using sensai.util.logging
Introduce appropriate base classes:
* ActorCriticOffPolicyAlgorithm
* ActorDualCriticsOffPolicyAlgorithm

eliminating the inheritance issues that caused violations of the Liskov substitution principle (see the sketch following this list):
* DDPG inherits from ActorCriticOffPolicyAlgorithm
* ActorDualCriticsOffPolicyAlgorithm extends ActorCriticOffPolicyAlgorithm
* SAC and TD3 now inherit from ActorDualCriticsOffPolicyAlgorithm instead of DDPG
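A schematic of the resulting hierarchy. This only illustrates the inheritance relations named above; class bodies, constructors, and mixins of the actual library code are elided:

```python
from abc import ABC

class ActorCriticOffPolicyAlgorithm(ABC):
    """Base for off-policy algorithms with an actor and a single critic."""

class DDPG(ActorCriticOffPolicyAlgorithm):
    """DDPG is now a leaf class: it no longer serves as a base for SAC/TD3."""

class ActorDualCriticsOffPolicyAlgorithm(ActorCriticOffPolicyAlgorithm, ABC):
    """Extends the single-critic base with a second critic (twin Q-networks)."""

class SAC(ActorDualCriticsOffPolicyAlgorithm):
    """Inherits the dual-critics base instead of DDPG, fixing the LSP violation."""

class TD3(ActorDualCriticsOffPolicyAlgorithm):
    """Likewise inherits from ActorDualCriticsOffPolicyAlgorithm."""
```

Previously, SAC and TD3 inherited from DDPG and had to override or ignore single-critic behavior they could not honor, which is exactly the substitutability problem the new intermediate base class removes.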
…ndling

This commit introduces a better abstraction for alpha parameter handling in SAC implementations through a dedicated class hierarchy:
- Add abstract Alpha base class with value property and update method
- Add FixedAlpha for constant entropy coefficients
- Add AutoAlpha for automatic entropy tuning

The refactoring simplifies the API by:
- Replacing the complex tuple-based auto-alpha representation with proper classes
- Providing a consistent interface for both fixed and auto-tuned parameters
- Encapsulating alpha-related logic in dedicated classes
- Improving code readability and maintainability

Both implementations (continuous and discrete SAC) now share the same alpha abstraction, making the codebase more consistent while preserving the original functionality.
… spaces and allow the coefficient to be modified, adding an informative docstring (the previous implementation was reasonable only for continuous action spaces). Adjust parametrisation to match the procedural example in atari_sac_hl.
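A condensed sketch of what such an Alpha hierarchy can look like. Attribute names, signatures, and the temperature loss below are illustrative assumptions, not the actual contents of tianshou/highlevel/params/alpha.py:

```python
from abc import ABC, abstractmethod
import torch

class Alpha(ABC):
    """Entropy coefficient for SAC-style algorithms (sketch)."""

    @property
    @abstractmethod
    def value(self) -> float:
        """The current entropy coefficient."""

    @abstractmethod
    def update(self, entropy: torch.Tensor) -> float:
        """Update the coefficient from the batch entropy; returns the loss."""

class FixedAlpha(Alpha):
    """Constant coefficient; replaces passing a bare float."""

    def __init__(self, alpha: float) -> None:
        self._alpha = alpha

    @property
    def value(self) -> float:
        return self._alpha

    def update(self, entropy: torch.Tensor) -> float:
        return 0.0  # nothing to learn for a fixed coefficient

class AutoAlpha(Alpha):
    """Automatic tuning; replaces the old (target_entropy, log_alpha, optim) tuple."""

    def __init__(self, target_entropy: float, lr: float = 3e-4) -> None:
        self._target_entropy = target_entropy
        self._log_alpha = torch.zeros(1, requires_grad=True)
        self._optim = torch.optim.Adam([self._log_alpha], lr=lr)

    @property
    def value(self) -> float:
        return self._log_alpha.exp().item()

    def update(self, entropy: torch.Tensor) -> float:
        # Standard SAC temperature objective, written in terms of entropy
        # (equivalent to the usual form in terms of log-probabilities).
        loss = -(self._log_alpha * (self._target_entropy - entropy).detach()).mean()
        self._optim.zero_grad()
        loss.backward()
        self._optim.step()
        return loss.item()
```

The point of the abstraction is that the algorithm only ever calls `alpha.value` and `alpha.update(...)`; whether the coefficient is fixed or learned is encapsulated behind the same interface for both continuous and discrete SAC.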
(inherit from ActorCriticOffPolicyAlgorithm instead)
…cess assertion to the test. Improve docstrings of related DQN classes.
* OffpolicyTrainer -> OffPolicyTrainer
* OnpolicyTrainer -> OnPolicyTrainer
* The trainer logic and configuration are now properly separated between the three cases of on-policy, off-policy and offline learning: the base class is no longer a "God" class which does it all; logic and functionality have been moved to the respective subclasses (`OnPolicyTrainer`, `OffPolicyTrainer` and `OfflineTrainer`, with `OnlineTrainer` being introduced as a base class for the two former specialisations). See the sketch after this list.
* The trainer configuration objects introduced earlier are now fully specific to the respective case, and central documentation is provided for each parameter (with greatly improved detail).
* The iterator semantics have been dropped: method `__next__` has been replaced by `execute_epoch`.
* The interface has been streamlined with improved naming of functions/parameters, limiting the public interface to purely the methods and attributes a user can reasonably use directly.
* Issues resolved:
  * Parameter `reset_prior_to_run` of `run` was never respected; changed the parametrisation accordingly.
  * Inconsistent configuration now raises exceptions instead of making assumptions about the intention.

For further details, see the changes committed to CHANGELOG.md.

In the context of the v2 refactoring, this commit renders OfflineTrainer functional again.
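As referenced in the list above, a rough sketch of the described hierarchy. The class names are taken from the commit text; the constructor, the epoch loop, and all method bodies are placeholders rather than the actual library code:

```python
from abc import ABC, abstractmethod

class Trainer(ABC):
    """Shared skeleton; case-specific logic lives in the subclasses (sketch)."""

    def __init__(self, max_epochs: int) -> None:
        self.max_epochs = max_epochs

    def run(self) -> None:
        for _ in range(self.max_epochs):
            self.execute_epoch()  # replaces the former __next__ iterator protocol

    @abstractmethod
    def execute_epoch(self) -> None:
        """Run one training epoch; implemented per learning paradigm."""

class OnlineTrainer(Trainer, ABC):
    """Base for the two trainers that collect experience during training."""

class OnPolicyTrainer(OnlineTrainer):
    def execute_epoch(self) -> None: ...  # collect, then update on fresh data

class OffPolicyTrainer(OnlineTrainer):
    def execute_epoch(self) -> None: ...  # collect into a replay buffer, sample updates

class OfflineTrainer(Trainer):
    def execute_epoch(self) -> None: ...  # update from a fixed dataset, no collection
```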
(identified some issues; see TODOs)
Conflicts:
	tianshou/highlevel/params/alpha.py
	tianshou/trainer/trainer.py
Conflicts:
	examples/atari/atari_dqn_hl.py
	examples/atari/atari_iqn_hl.py
	examples/atari/atari_ppo_hl.py
	examples/atari/atari_sac_hl.py
	examples/mujoco/mujoco_a2c_hl.py
	examples/mujoco/mujoco_ddpg_hl.py
	examples/mujoco/mujoco_npg_hl.py
	examples/mujoco/mujoco_ppo_hl.py
	examples/mujoco/mujoco_ppo_hl_multi.py
	examples/mujoco/mujoco_redq_hl.py
	examples/mujoco/mujoco_reinforce_hl.py
	examples/mujoco/mujoco_sac_hl.py
	examples/mujoco/mujoco_td3_hl.py
	examples/mujoco/mujoco_trpo_hl.py
	pyproject.toml
	tianshou/env/atari/atari_wrapper.py
	tianshou/highlevel/experiment.py
Conflicts:
	tianshou/data/collector.py
	tianshou/trainer/base.py
See the change log.
Resolves #1091
Resolves #810
Resolves #959
Resolves #898
Resolves #919
Resolves #913
Resolves #948
Resolves #949
Resolves #1204