[Here](https://github.com/thu-ml/tianshou/blob/18d2f25efff81561f3b47682227bc80d3787889d/tianshou/trainer/onpolicy.py#L103), the onpolicy trainer relies on the value `rew`, which is the mean reward from the collector:

```python
best_reward, best_reward_std = test_result["rew"], test_result["rew_std"]
```

However, this value is only computed by the logger [here](https://github.com/thu-ml/tianshou/blob/ebaca6f8da91e18e0192184c24f5d13e3a5d0092/tianshou/utils/log_tools.py#L131):

```python
collect_result["rew"] = collect_result["rews"].mean()
```

So it seems to me that switching to another logger, one that does not compute this mean, would break the trainer.
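
To make the failure mode concrete, here is a minimal sketch in plain Python/NumPy (not the actual tianshou classes; `custom_logger_log_test_data` is a hypothetical logger hook used only for illustration) of what happens when the logger does not add the aggregated keys:

```python
import numpy as np

# The collector returns per-episode rewards under "rews"; the logger is
# currently the only place that adds the scalar mean under "rew".

def custom_logger_log_test_data(collect_result: dict) -> None:
    """Hypothetical replacement logger that only writes to its own backend
    and does NOT add the aggregated "rew" / "rew_std" keys."""
    pass  # e.g. push collect_result["rews"] to some logging backend

test_result = {"rews": np.array([10.0, 12.0, 8.0]), "lens": np.array([200, 210, 190])}
custom_logger_log_test_data(test_result)

# The trainer-side lookup then fails, because nothing computed the mean:
try:
    best_reward, best_reward_std = test_result["rew"], test_result["rew_std"]
except KeyError as e:
    print(f"KeyError: {e}")  # KeyError: 'rew'

# A more defensive option would be to aggregate in the trainer itself:
best_reward = test_result["rews"].mean()
best_reward_std = test_result["rews"].std()
```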