
Tianshou v2 #1259


Merged · 257 commits merged into master on Jul 15, 2025

Conversation

@opcode81 (Collaborator) commented on May 15, 2025

See change log.

Resolves #1091
Resolves #810
Resolves #959
Resolves #898
Resolves #919
Resolves #913
Resolves #948
Resolves #949
Resolves #1204

opcode81 added 30 commits March 11, 2025 17:50
(undoing previous change before Algorithm was an nn.Module)
Move atari_network and atari_wrapper into the library under tianshou.env.atari
(this is more convenient and cleans up the example structure)
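For illustration, the relocation changes user imports roughly as follows; the target path is from the commit message, while `wrap_deepmind` is a symbol from the original example module and is assumed to keep its name after the move:

```python
# v1: the Atari helpers lived in the examples tree
# from examples.atari.atari_wrapper import wrap_deepmind

# v2: they are part of the library itself (path per the commit message;
# the symbol name is an assumption carried over from the examples)
from tianshou.env.atari.atari_wrapper import wrap_deepmind

env = wrap_deepmind("PongNoFrameskip-v4")  # hypothetical usage
```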
but from the new abstract base class AbstractActorCriticWithAdvantage
(which A2C also inherits from)
This takes effect for examples using sensai.util.logging
Introduce appropriate base classes
  * ActorCriticOffPolicyAlgorithm
  * ActorDualCriticsOffPolicyAlgorithm
eliminating the inheritance issues that caused violations of the
Liskov substitution principle:
  * DDPG inherits from ActorCriticOffPolicyAlgorithm
  * ActorDualCriticsOffPolicyAlgorithm extends ActorCriticOffPolicyAlgorithm
  * SAC and TD3 now inherit from ActorDualCriticsOffPolicyAlgorithm
    instead of DDPG
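
A minimal sketch of the resulting hierarchy (class names as given above; the docstrings and empty bodies are illustrative, not the actual implementation):

```python
# Sketch of the v2 off-policy algorithm hierarchy described above.
# Class names follow the commit message; bodies are placeholders.

class ActorCriticOffPolicyAlgorithm:
    """Base for off-policy algorithms with a single actor and critic."""

class ActorDualCriticsOffPolicyAlgorithm(ActorCriticOffPolicyAlgorithm):
    """Adds a second critic, as used by twin-critic methods."""

class DDPG(ActorCriticOffPolicyAlgorithm):
    """Single-critic method; now a leaf rather than a base class."""

class SAC(ActorDualCriticsOffPolicyAlgorithm):
    """No longer inherits from DDPG, fixing the LSP violation."""

class TD3(ActorDualCriticsOffPolicyAlgorithm):
    """Likewise a sibling of DDPG rather than a subclass."""
```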
…ndling

This commit introduces a better abstraction for alpha parameter
handling in SAC implementations through a dedicated class hierarchy:
- Add abstract Alpha base class with value property and update method
- Add FixedAlpha for constant entropy coefficients
- Add AutoAlpha for automatic entropy tuning

The refactoring simplifies the API by:
- Replacing the complex tuple-based auto-alpha representation with proper classes
- Providing a consistent interface for both fixed and auto-tuned parameters
- Encapsulating alpha-related logic in dedicated classes
- Improving code readability and maintainability

Both implementations (continuous and discrete SAC) now share the same alpha abstraction,
making the codebase more consistent while preserving the original functionality.
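
A minimal sketch of that hierarchy, assuming only the interface stated above (a `value` property and an `update` method); the constructor parameters and the simple update rule are assumptions, standing in for the usual gradient step on log(alpha) toward a target entropy:

```python
import math
from abc import ABC, abstractmethod

class Alpha(ABC):
    """Entropy coefficient abstraction (per the commit message)."""

    @property
    @abstractmethod
    def value(self) -> float:
        """Current value of the entropy coefficient."""

    @abstractmethod
    def update(self, entropy: float) -> None:
        """Adjust alpha given the current policy entropy (no-op if fixed)."""

class FixedAlpha(Alpha):
    """Constant entropy coefficient."""

    def __init__(self, alpha: float) -> None:
        self._alpha = alpha

    @property
    def value(self) -> float:
        return self._alpha

    def update(self, entropy: float) -> None:
        pass  # fixed coefficient: nothing to update

class AutoAlpha(Alpha):
    """Automatically tuned coefficient; the update rule below is a
    stand-in for the real optimiser-based step on log(alpha)."""

    def __init__(self, target_entropy: float, lr: float = 3e-4) -> None:
        self._target_entropy = target_entropy
        self._lr = lr
        self._log_alpha = 0.0

    @property
    def value(self) -> float:
        return math.exp(self._log_alpha)

    def update(self, entropy: float) -> None:
        # Raise alpha when entropy drops below the target, lower it otherwise.
        self._log_alpha += self._lr * (self._target_entropy - entropy)
```

Compared with v1, where automatic tuning was requested by passing a `(target_entropy, log_alpha, optimizer)` tuple as `alpha`, both the fixed and the auto-tuned case now go through the same object interface.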
… spaces

and allow the coefficient to be modified, adding an informative docstring
(previous implementation was reasonable only for continuous action spaces)

Adjust parametrisation to match procedural example in atari_sac_hl
(inherit from ActorCriticOffPolicyAlgorithm instead)
…cess assertion to the test

Improve docstrings of related DQN classes
  * OffpolicyTrainer -> OffPolicyTrainer
  * OnpolicyTrainer -> OnPolicyTrainer
  * The trainer logic and configuration are now properly separated between the three cases of
    on-policy, off-policy and offline learning: The base class is no longer a "God" class
    which does it all; logic and functionality has been moved to the respective subclasses
    (`OnPolicyTrainer`, `OffPolicyTrainer` and `OfflineTrainer`, with `OnlineTrainer`
    being introduced as a base class for the two former specialisations).
  * The trainer configuration objects introduced earlier are now fully specific to the
    respective case, and central documentation is provided for each parameter
    (with greatly improved detail)
  * The iterator semantics have been dropped: Method `__next__` has been replaced by
    `execute_epoch` (see the sketch below).
  * The interface has been streamlined with improved naming of functions/parameters and
    limiting the public interface to purely the methods and attributes a user can reasonably
    use directly.
  * Issues resolved:
      * Parameter `reset_prior_to_run` of `run` was never respected; changed parametrisation
        accordingly
      * Inconsistent configuration now raises exceptions instead of making assumptions
        about the intention

For further details, see changes committed to CHANGELOG.md.
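
A rough sketch of the separation (the class names and `execute_epoch` are from the notes above; the constructor, attributes, and run loop are assumptions for illustration):

```python
class Trainer:
    """Common base: shared bookkeeping only, no regime-specific logic."""

    def __init__(self, max_epochs: int) -> None:
        self.max_epochs = max_epochs

    def execute_epoch(self) -> None:
        """Runs one epoch; replaces the former iterator protocol (__next__)."""
        raise NotImplementedError

    def run(self) -> None:
        for _ in range(self.max_epochs):
            self.execute_epoch()

class OfflineTrainer(Trainer):
    """Learns from a fixed buffer; no environment interaction."""

class OnlineTrainer(Trainer):
    """Base for the two regimes that collect data during training."""

class OnPolicyTrainer(OnlineTrainer):
    """Collects with the current policy and updates on fresh data only."""

class OffPolicyTrainer(OnlineTrainer):
    """Collects into a replay buffer and updates from sampled batches."""
```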

In the context of v2 refactoring, this commit renders OfflineTrainer functional again
(identified some issues; see TODOs)
opcode81 added 3 commits May 19, 2025 20:51
Conflicts:
	examples/atari/atari_dqn_hl.py
	examples/atari/atari_iqn_hl.py
	examples/atari/atari_ppo_hl.py
	examples/atari/atari_sac_hl.py
	examples/mujoco/mujoco_a2c_hl.py
	examples/mujoco/mujoco_ddpg_hl.py
	examples/mujoco/mujoco_npg_hl.py
	examples/mujoco/mujoco_ppo_hl.py
	examples/mujoco/mujoco_ppo_hl_multi.py
	examples/mujoco/mujoco_redq_hl.py
	examples/mujoco/mujoco_reinforce_hl.py
	examples/mujoco/mujoco_sac_hl.py
	examples/mujoco/mujoco_td3_hl.py
	examples/mujoco/mujoco_trpo_hl.py
	pyproject.toml
	tianshou/env/atari/atari_wrapper.py
	tianshou/highlevel/experiment.py
@opcode81 marked this pull request as ready for review on July 3, 2025 13:12
@MischaPanch merged commit 58e1632 into master on Jul 15, 2025
4 checks passed
@MischaPanch deleted the dev-v2 branch on July 15, 2025 08:36