
v1.2.0

@MischaPanch MischaPanch released this 23 Jun 12:37

This is the last major release before version 2.0.0.

It solves a regression in data collection performance, introduces several fixes, and, importantly, adds support for determinism testing, which is used to ensure that the refactoring in the upcoming 2.0.0 release does not affect any aspect of training or inference.

Changes/Improvements

  • trainer:
    • Custom scoring now supported for selecting the best model. #1202
  • highlevel:
    • DiscreteSACExperimentBuilder: Expose method with_actor_factory_default #1248 #1250
    • ActorFactoryDefault: Fix the parameters for hidden sizes and activation not being
      passed on in the discrete case (affects the with_actor_factory_default method of experiment builders)
    • ExperimentConfig: Do not inherit from other classes, as this breaks automatic handling by
      jsonargparse when the class is used to define interfaces (as in high-level API examples)
    • AutoAlphaFactoryDefault: Differentiate between discrete and continuous action spaces
      and allow the coefficient to be modified, adding an informative docstring
      (the previous implementation was reasonable only for continuous action spaces)
      • Adjust usage in atari_sac_hl example accordingly.
    • NPGAgentFactory, TRPOAgentFactory: Fix the optimizer instantiation, which included the actor parameters
      (as was misleadingly suggested by the docstrings of the respective policy classes; the docstrings were fixed),
      even though the actor parameters are intended to be handled internally via natural gradients
  • data:
    • ReplayBuffer: Fix collection of empty episodes being disallowed
    • Collection was slow due to isinstance checks on Protocols and due to buffer integrity validation.
      This was addressed by no longer performing isinstance checks on Protocols and by disabling the
      integrity validation by default.
  • Tests:
    • We have introduced extensive determinism tests which make it possible to validate that
      training processes deterministically compute the same results across different development branches.
      This is an important step towards ensuring reproducibility and consistency, which will be
      instrumental in supporting Tianshou developers in their work, especially in the context of
      algorithm development and evaluation.
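The idea behind such determinism tests can be sketched as follows. This is a minimal, hypothetical illustration, not Tianshou's actual API: the names `run_training` and `result_digest` are made up for this example. A training run whose randomness is fully controlled by a seed is reduced to a stable digest, which can then be compared across development branches to detect behavioral changes.

```python
# Hedged sketch of a determinism test: all names here are illustrative.
import hashlib
import json
import random


def run_training(seed: int) -> dict:
    # Stand-in for a training run: every source of randomness is drawn from
    # a single seeded RNG, so identical seeds must yield identical results.
    rng = random.Random(seed)
    losses = [round(rng.random(), 6) for _ in range(10)]
    return {"losses": losses, "final_loss": losses[-1]}


def result_digest(result: dict) -> str:
    # Canonical JSON -> stable hash; a changed digest across branches
    # signals that the refactoring altered training behavior.
    payload = json.dumps(result, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


digest_a = result_digest(run_training(seed=42))
digest_b = result_digest(run_training(seed=42))
assert digest_a == digest_b  # same seed -> identical digest on one branch
```

In practice, the digest produced on the reference branch would be stored and compared against the digest produced by the refactored branch, turning "training behaves identically" into a single equality check.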