-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
This issue grew out of the discussion in
Summary
Currently, all policies take actor/critic/critic2 optimizers that have to be set up correctly by users. This has two downsides
- No guarantee that optimizer was in fact instantiated with the correct parameters
- Optimizers cannot be reliably copied, something that is needed in case we want to have passing
critic2
as optional (default behavior is then copyingcritic
, which covers the vast majority of use cases)
Whether allowing users not to pass critic2/critic2_optim
is a good idea is up to discussion (explicitness vs. ease of use), but the first downside is relatively uncontroversial.
There was also a general discussion about keeping interfaces in policies as simple as possible, at the cost of usability. This builds on the idea that the high-level interfaces take care of usability issues.
Personally, I believe the existence of higher-level interfaces does not mean that ease-of-use in Policy based interfaces should not be improved. Algorithm developers (like academic researchers) will likely continue using the Policy classes as main entry point, and easy-to-use interfaces there will simplify their work (and make it more reliable, thus easier to merge in tianshou when done). This is why I am in favor of allowing passing critic2
as None by default, while clearly documenting that the algorithm is making use of two critics.
Important
Addressing this means breaking changes in all Policies, so it will affect all tests and examples
Metadata
Metadata
Assignees
Labels
Type
Projects
Status