这是indexloc提供的服务,不要输入任何密码
Skip to content

Don't pass optimizers to policies #959

@MischaPanch

Description

@MischaPanch

This issue grew out of the discussion in

#950 (comment)_

Summary

Currently, all policies take actor/critic/critic2 optimizers that have to be set up correctly by users. This has two downsides

  1. No guarantee that optimizer was in fact instantiated with the correct parameters
  2. Optimizers cannot be reliably copied, something that is needed in case we want to have passing critic2 as optional (default behavior is then copying critic, which covers the vast majority of use cases)

Whether allowing users not to pass critic2/critic2_optim is a good idea is up to discussion (explicitness vs. ease of use), but the first downside is relatively uncontroversial.

There was also a general discussion about keeping interfaces in policies as simple as possible, at the cost of usability. This builds on the idea that the high-level interfaces take care of usability issues.

Personally, I believe the existence of higher-level interfaces does not mean that ease-of-use in Policy based interfaces should not be improved. Algorithm developers (like academic researchers) will likely continue using the Policy classes as main entry point, and easy-to-use interfaces there will simplify their work (and make it more reliable, thus easier to merge in tianshou when done). This is why I am in favor of allowing passing critic2 as None by default, while clearly documenting that the algorithm is making use of two critics.

Important

Addressing this means breaking changes in all Policies, so it will affect all tests and examples

Metadata

Metadata

Assignees

No one assigned

    Labels

    breaking changesChanges in public interfaces. Includes small changes or changes in keysrefactoringNo change to functionalitytentativeUp to discussion, may be dismissed

    Type

    No type

    Projects

    Status

    Done

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions