Closed
Labels
question: Further information is requested
Description
Here, the onpolicy trainer relies on the "rew" value, which is the mean reward from the collector:
best_reward, best_reward_std = test_result["rew"], test_result["rew_std"]
but this value is only computed by the logger here:
collect_result["rew"] = collect_result["rews"].mean()
So it seems to me that if the logger is swapped for another logger that does not compute the mean, the trainer will break.
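To make the concern concrete, here is a minimal sketch of the coupling as I understand it. It is not the library's actual code: the class and function names (`MeanComputingLogger`, `PlainLogger`, `read_best_reward`, `read_best_reward_safe`) are hypothetical, the rewards are a plain list rather than an array, and I am assuming the logger also fills in "rew_std". The last function shows one possible way the trainer could avoid depending on the logger.

```python
from statistics import fmean, pstdev
from typing import Dict, Tuple


class MeanComputingLogger:
    """Hypothetical logger that, as a side effect, writes the mean reward
    into the collect result (the behaviour described in this issue)."""

    def log_test_data(self, collect_result: Dict) -> None:
        rews = collect_result["rews"]
        collect_result["rew"] = fmean(rews)
        # Assumption: the standard deviation is filled in the same way.
        collect_result["rew_std"] = pstdev(rews)


class PlainLogger:
    """Hypothetical replacement logger that records data but mutates nothing."""

    def log_test_data(self, collect_result: Dict) -> None:
        pass


def read_best_reward(test_result: Dict) -> Tuple[float, float]:
    """Mirrors the trainer line quoted above; raises KeyError if the
    logger never wrote the 'rew' key."""
    return test_result["rew"], test_result["rew_std"]


def read_best_reward_safe(test_result: Dict) -> Tuple[float, float]:
    """One possible fix: compute the statistics from the raw rewards
    when the logger has not provided them."""
    if "rew" not in test_result or "rew_std" not in test_result:
        rews = test_result["rews"]
        return fmean(rews), pstdev(rews)
    return test_result["rew"], test_result["rew_std"]


# With the mean-computing logger, the trainer-style lookup works.
ok_result = {"rews": [1.0, 2.0, 3.0]}
MeanComputingLogger().log_test_data(ok_result)
best_reward, best_reward_std = read_best_reward(ok_result)

# With a logger that does not compute the mean, the same lookup fails.
plain_result = {"rews": [1.0, 2.0, 3.0]}
PlainLogger().log_test_data(plain_result)
try:
    read_best_reward(plain_result)
    lookup_failed = False
except KeyError:
    lookup_failed = True

# The defensive version does not care which logger was used.
safe_reward, safe_reward_std = read_best_reward_safe(plain_result)
```

If something like this is accurate, computing the mean where the result is produced (or falling back as in `read_best_reward_safe`) would remove the hidden dependency on the logger.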