
Tianshou v2 #1259


Merged · 257 commits merged into master on Jul 15, 2025

Conversation

@opcode81 (Collaborator) commented on May 15, 2025

See change log.

Resolves #1091
Resolves #810
Resolves #959
Resolves #898
Resolves #919
Resolves #913
Resolves #948
Resolves #949
Resolves #1204

opcode81 added 30 commits March 11, 2025 17:50
(undoing previous change before Algorithm was an nn.Module)
Move atari_network and atari_wrapper into the library under tianshou.env.atari
(this is more convenient and cleans up the example structure)
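For illustration, the relocation changes user imports roughly as follows; the target path is from the commit message, while `wrap_deepmind` is a symbol from the original example module and is assumed to keep its name after the move:

```python
# v1: the Atari helpers lived in the examples tree
# from examples.atari.atari_wrapper import wrap_deepmind

# v2: they are part of the library itself (path per the commit message;
# the symbol name is an assumption carried over from the examples)
from tianshou.env.atari.atari_wrapper import wrap_deepmind

env = wrap_deepmind("PongNoFrameskip-v4")  # hypothetical usage
```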
but from the new abstract base class AbstractActorCriticWithAdvantage
(which A2C also inherits from)
This takes effect for examples using sensai.util.logging
Introduce appropriate base classes
  * ActorCriticOffPolicyAlgorithm
  * ActorDualCriticsOffPolicyAlgorithm
eliminating the inheritance issues that caused violations of the
Liskov substitution principle:
  * DDPG inherits from ActorCriticOffPolicyAlgorithm
  * ActorDualCriticsOffPolicyAlgorithm extends ActorCriticOffPolicyAlgorithm
  * SAC and TD3 now inherit from ActorDualCriticsOffPolicyAlgorithm
    instead of DDPG
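
A minimal sketch of the resulting hierarchy (class names as given above; the docstrings and empty bodies are illustrative, not the actual implementation):

```python
# Sketch of the v2 off-policy algorithm hierarchy described above.
# Class names follow the commit message; bodies are placeholders.

class ActorCriticOffPolicyAlgorithm:
    """Base for off-policy algorithms with a single actor and critic."""

class ActorDualCriticsOffPolicyAlgorithm(ActorCriticOffPolicyAlgorithm):
    """Adds a second critic, as used by twin-critic methods."""

class DDPG(ActorCriticOffPolicyAlgorithm):
    """Single-critic method; now a leaf rather than a base class."""

class SAC(ActorDualCriticsOffPolicyAlgorithm):
    """No longer inherits from DDPG, fixing the LSP violation."""

class TD3(ActorDualCriticsOffPolicyAlgorithm):
    """Likewise a sibling of DDPG rather than a subclass."""
```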
…ndling

This commit introduces a better abstraction for alpha parameter
handling in SAC implementations through a dedicated class hierarchy:
- Add abstract Alpha base class with value property and update method
- Add FixedAlpha for constant entropy coefficients
- Add AutoAlpha for automatic entropy tuning

The refactoring simplifies the API by:
- Replacing the complex tuple-based auto-alpha representation with proper classes
- Providing a consistent interface for both fixed and auto-tuned parameters
- Encapsulating alpha-related logic in dedicated classes
- Improving code readability and maintainability

Both implementations (continuous and discrete SAC) now share the same alpha abstraction,
making the codebase more consistent while preserving the original functionality.
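
A minimal sketch of that hierarchy, assuming only the interface stated above (a `value` property and an `update` method); the constructor parameters and the simple update rule are assumptions, standing in for the usual gradient step on log(alpha) toward a target entropy:

```python
import math
from abc import ABC, abstractmethod

class Alpha(ABC):
    """Entropy coefficient abstraction (per the commit message)."""

    @property
    @abstractmethod
    def value(self) -> float:
        """Current value of the entropy coefficient."""

    @abstractmethod
    def update(self, entropy: float) -> None:
        """Adjust alpha given the current policy entropy (no-op if fixed)."""

class FixedAlpha(Alpha):
    """Constant entropy coefficient."""

    def __init__(self, alpha: float) -> None:
        self._alpha = alpha

    @property
    def value(self) -> float:
        return self._alpha

    def update(self, entropy: float) -> None:
        pass  # fixed coefficient: nothing to update

class AutoAlpha(Alpha):
    """Automatically tuned coefficient; the update rule below is a
    stand-in for the real optimiser-based step on log(alpha)."""

    def __init__(self, target_entropy: float, lr: float = 3e-4) -> None:
        self._target_entropy = target_entropy
        self._lr = lr
        self._log_alpha = 0.0

    @property
    def value(self) -> float:
        return math.exp(self._log_alpha)

    def update(self, entropy: float) -> None:
        # Raise alpha when entropy drops below the target, lower it otherwise.
        self._log_alpha += self._lr * (self._target_entropy - entropy)
```

Compared with v1, where automatic tuning was requested by passing a `(target_entropy, log_alpha, optimizer)` tuple as `alpha`, both the fixed and the auto-tuned case now go through the same object interface.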
… spaces

and allow the coefficient to be modified, adding an informative docstring
(previous implementation was reasonable only for continuous action spaces)

Adjust parametrisation to match procedural example in atari_sac_hl
(inherit from ActorCriticOffPolicyAlgorithm instead)
…cess assertion to the test

Improve docstrings of related DQN classes
  * OffpolicyTrainer -> OffPolicyTrainer
  * OnpolicyTrainer -> OnPolicyTrainer
  * The trainer logic and configuration are now properly separated between the three cases of
    on-policy, off-policy and offline learning: The base class is no longer a "God" class
    which does it all; logic and functionality has been moved to the respective subclasses
    (`OnPolicyTrainer`, `OffPolicyTrainer` and `OfflineTrainer`, with `OnlineTrainer`
    being introduced as a base class for the two former specialisations).
  * The trainer configuration objects introduced earlier are now fully specific to the
    respective case, and central documentation is provided for each parameter
    (with greatly improved detail)
  * The iterator semantics have been dropped: Method `__next__` has been replaced by
    `execute_epoch` (see the sketch below).
  * The interface has been streamlined with improved naming of functions/parameters and
    limiting the public interface to purely the methods and attributes a user can reasonably
    use directly.
  * Issues resolved:
      * Parameter `reset_prior_to_run` of `run` was never respected; changed parametrisation
        accordingly
      * Inconsistent configuration now raises exceptions instead of making assumptions
        about the intention

For further details, see changes committed to CHANGELOG.md.
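
A rough sketch of the separation (the class names and `execute_epoch` are from the notes above; the constructor, attributes, and run loop are assumptions for illustration):

```python
class Trainer:
    """Common base: shared bookkeeping only, no regime-specific logic."""

    def __init__(self, max_epochs: int) -> None:
        self.max_epochs = max_epochs

    def execute_epoch(self) -> None:
        """Runs one epoch; replaces the former iterator protocol (__next__)."""
        raise NotImplementedError

    def run(self) -> None:
        for _ in range(self.max_epochs):
            self.execute_epoch()

class OfflineTrainer(Trainer):
    """Learns from a fixed buffer; no environment interaction."""

class OnlineTrainer(Trainer):
    """Base for the two regimes that collect data during training."""

class OnPolicyTrainer(OnlineTrainer):
    """Collects with the current policy and updates on fresh data only."""

class OffPolicyTrainer(OnlineTrainer):
    """Collects into a replay buffer and updates from sampled batches."""
```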

In the context of v2 refactoring, this commit renders OfflineTrainer functional again
(identified some issues; see TODOs)
opcode81 added 3 commits May 19, 2025 20:51
Conflicts:
	examples/atari/atari_dqn_hl.py
	examples/atari/atari_iqn_hl.py
	examples/atari/atari_ppo_hl.py
	examples/atari/atari_sac_hl.py
	examples/mujoco/mujoco_a2c_hl.py
	examples/mujoco/mujoco_ddpg_hl.py
	examples/mujoco/mujoco_npg_hl.py
	examples/mujoco/mujoco_ppo_hl.py
	examples/mujoco/mujoco_ppo_hl_multi.py
	examples/mujoco/mujoco_redq_hl.py
	examples/mujoco/mujoco_reinforce_hl.py
	examples/mujoco/mujoco_sac_hl.py
	examples/mujoco/mujoco_td3_hl.py
	examples/mujoco/mujoco_trpo_hl.py
	pyproject.toml
	tianshou/env/atari/atari_wrapper.py
	tianshou/highlevel/experiment.py
@opcode81 marked this pull request as ready for review on July 3, 2025 13:12
@MischaPanch merged commit 58e1632 into master on Jul 15, 2025
4 checks passed
@MischaPanch deleted the dev-v2 branch on July 15, 2025 08:36