This is the last major release before version 2.0.0.
It resolves the regression in data collection performance, introduces several fixes and, importantly, adds support for determinism testing, which is used to ensure that the refactoring in the upcoming 2.0.0 release does not affect any aspect of training or inference.
Changes/Improvements
- `trainer`:
    - Custom scoring now supported for selecting the best model. #1202
- `highlevel`:
    - `DiscreteSACExperimentBuilder`: Expose method `with_actor_factory_default` #1248 #1250
    - `ActorFactoryDefault`: Fix parameters for hidden sizes and activation not being
      passed on in the discrete case (affects `with_actor_factory_default`
      method of experiment builders)
    - `ExperimentConfig`: Do not inherit from other classes, as this breaks automatic handling by
      `jsonargparse` when the class is used to define interfaces (as in high-level API examples)
    - `AutoAlphaFactoryDefault`: Differentiate discrete and continuous action spaces
      and allow coefficient to be modified, adding an informative docstring
      (previous implementation was reasonable only for continuous action spaces)
        - Adjust usage in `atari_sac_hl` example accordingly.
    - `NPGAgentFactory`, `TRPOAgentFactory`: Fix optimizer instantiation including the actor parameters
      (which was misleadingly suggested in the docstring in the respective policy classes; docstrings were fixed),
      as the actor parameters are intended to be handled via natural gradients internally
- `data`:
    - `ReplayBuffer`: Fix collection of empty episodes being disallowed
    - Collection was slow due to `isinstance` checks on Protocols and due to Buffer integrity validation. This was solved
      by no longer performing `isinstance` checks on Protocols and by disabling the integrity validation by default.
- Tests:
    - We have introduced extensive determinism tests, which make it possible to validate whether
      training processes deterministically compute the same results across different development branches.
      This is an important step towards ensuring reproducibility and consistency, which will be
      instrumental in supporting Tianshou developers in their work, especially in the context of
      algorithm development and evaluation.
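To illustrate what custom scoring for best-model selection can mean in practice, here is a minimal, library-independent sketch. It does not use the Tianshou trainer API: `ScoreFn`, `select_best` and the evaluation data are hypothetical names and values chosen for this example only. The point is that the choice of scoring function can change which model counts as the best.

```python
from collections.abc import Callable
from statistics import mean, pstdev

# Hypothetical type: maps the episode returns of one evaluation to a scalar score.
ScoreFn = Callable[[list[float]], float]


def mean_return(returns: list[float]) -> float:
    """Conventional score: the average episode return."""
    return mean(returns)


def risk_adjusted_return(returns: list[float]) -> float:
    """Custom score: penalise high variance across episodes."""
    return mean(returns) - pstdev(returns)


def select_best(evaluations: dict[str, list[float]], score_fn: ScoreFn) -> str:
    """Return the name of the evaluation with the highest score."""
    return max(evaluations, key=lambda name: score_fn(evaluations[name]))


# Made-up evaluation results for two checkpoints (illustrative only).
evaluations = {
    "epoch_1": [10.0, 10.0, 10.0],
    "epoch_2": [0.0, 12.0, 21.0],
}

best_by_mean = select_best(evaluations, mean_return)
best_by_risk = select_best(evaluations, risk_adjusted_return)
print(best_by_mean, best_by_risk)
```

With the plain mean, the higher-average but noisier checkpoint wins; with the variance-penalised score, the stable one does.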
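The collection slowdown mentioned under `data` comes from a general Python property: `isinstance` against a `runtime_checkable` Protocol is a structural check that inspects the object's attributes on every call, which is noticeably more expensive than a check against an ordinary class. The sketch below shows the pattern in isolation. `SupportsStep`, `ToyEnv` and both collect functions are hypothetical and are not taken from the Tianshou code base.

```python
import timeit
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsStep(Protocol):
    """Hypothetical protocol used only for this illustration."""

    def step(self, action: int) -> float: ...


class ToyEnv:
    def step(self, action: int) -> float:
        return float(action)


def collect_checked(envs: list[ToyEnv]) -> float:
    """Validate every object against the Protocol inside the hot loop."""
    total = 0.0
    for env in envs:
        if not isinstance(env, SupportsStep):  # structural check on each iteration
            raise TypeError("object does not satisfy SupportsStep")
        total += env.step(1)
    return total


def collect_unchecked(envs: list[ToyEnv]) -> float:
    """Same loop, relying on static typing instead of a runtime check."""
    total = 0.0
    for env in envs:
        total += env.step(1)
    return total


envs = [ToyEnv() for _ in range(1000)]
checked_seconds = timeit.timeit(lambda: collect_checked(envs), number=20)
unchecked_seconds = timeit.timeit(lambda: collect_unchecked(envs), number=20)
print(f"with isinstance: {checked_seconds:.4f}s, without: {unchecked_seconds:.4f}s")
```

The absolute timings depend on the machine and Python version, so none are quoted here; the two functions return the same result and differ only in the per-item check.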
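The idea behind the determinism tests can be sketched as follows: run a seeded process, reduce its logged values to a fingerprint, and compare that fingerprint with one recorded earlier (for example on another branch). This is a simplified illustration under stated assumptions and not the actual Tianshou test harness; `toy_training_run` and `fingerprint` are hypothetical helpers, and the "training" is just a seeded random walk.

```python
import hashlib
import json
import random


def toy_training_run(seed: int, num_steps: int = 100) -> list[float]:
    """Hypothetical stand-in for a training process: a seeded random walk."""
    rng = random.Random(seed)
    value = 0.0
    trace = []
    for _ in range(num_steps):
        value += rng.uniform(-1.0, 1.0)
        trace.append(round(value, 10))
    return trace


def fingerprint(trace: list[float]) -> str:
    """Hash the logged values so a run can be compared with a stored snapshot."""
    payload = json.dumps(trace).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# In a real setting the snapshot would be stored by a reference branch
# and recomputed on the branch under test.
snapshot = fingerprint(toy_training_run(seed=42))
current = fingerprint(toy_training_run(seed=42))
different_seed = fingerprint(toy_training_run(seed=43))

if current != snapshot:
    raise AssertionError("results differ from the recorded snapshot")
print("deterministic:", current == snapshot)
```

Comparing a hash keeps the stored snapshot small; storing the full trace instead would make it easier to see where two runs first diverge.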