Description
- I have marked all applicable categories:
- exception-raising bug
- RL algorithm bug
- documentation request (i.e. "X is missing from the documentation.")
- new feature request
- I have visited the source website
- I have searched through the issue tracker for duplicates
- I have mentioned version numbers, operating system and environment, where applicable:
import tianshou, torch, sys
print(tianshou.__version__, torch.__version__, sys.version, sys.platform)
0.4.0 1.8.0 3.8.8 (default, Feb 24 2021, 21:46:12)
[GCC 7.3.0] linux
Hi there, thanks for this great project.
I would like to know whether the collector can be made aware of the environment's state. For example, when an environment depends on a generator that can run out (e.g., streaming data, one-epoch data, or a game with a countable number of scenarios), the env can no longer be reset after some number of episodes. As a result, we cannot always know in advance how many episodes the collector needs to collect. In such cases, when the collector should stop depends entirely on when the env itself runs out.
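To make the scenario concrete, here is a minimal sketch of such an env and of a loop that stops when the env runs out. It does not use Tianshou's API: `DataExhausted`, `FiniteEnv`, and `collect_until_exhausted` are hypothetical names made up for illustration, and the env follows the old gym-style 4-tuple `step` return as an assumption.

```python
class DataExhausted(Exception):
    """Hypothetical signal: the underlying data source has no more episodes."""


class FiniteEnv:
    """Toy env in which each sample of a finite iterable is a one-step episode."""

    def __init__(self, samples):
        self._it = iter(samples)
        self._current = None

    def reset(self):
        try:
            self._current = next(self._it)
        except StopIteration:
            # The generator ran out, so the env cannot be reset anymore.
            raise DataExhausted from None
        return self._current

    def step(self, action):
        # Reward 1.0 if the action matches the sample; the episode ends at once.
        reward = float(action == self._current)
        return self._current, reward, True, {}


def collect_until_exhausted(env, policy):
    """Collect episodes until the env itself reports that it is out of data."""
    episode_rewards = []
    while True:
        try:
            obs = env.reset()
        except DataExhausted:
            return episode_rewards
        done, total = False, 0.0
        while not done:
            obs, rew, done, _ = env.step(policy(obs))
            total += rew
        episode_rewards.append(total)
```

The point of the sketch is that the number of episodes is never passed in; the loop ends only because the env says so.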
In our use case, we are testing our agent on a fixed dataset, where each sample in the dataset is one episode for the agent, and we split the dataset across the envs of a venv to sample in parallel. The problem is that the collector cannot accurately stop the envs that need to be stopped. Even with `n_episode`, the collector could wrongly stop envs that still have samples left while keeping envs that have none.
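The behavior we are after can be sketched without Tianshou as follows. This is only an illustration under assumptions: `shard` and `collect_all` are hypothetical helpers, each "worker" is a plain iterator over its shard rather than a real env, and `run_episode` stands in for rolling out the agent on one sample.

```python
def shard(dataset, num_workers):
    """Split a dataset into num_workers strided shards (sizes may differ)."""
    return [dataset[i::num_workers] for i in range(num_workers)]


def collect_all(shards, run_episode):
    """Run one episode per sample and retire each worker when its shard is empty."""
    iterators = {i: iter(s) for i, s in enumerate(shards)}
    results = {i: [] for i in iterators}
    active = set(iterators)
    while active:
        # sorted() copies the set, so discarding inside the loop is safe.
        for i in sorted(active):
            try:
                sample = next(iterators[i])
            except StopIteration:
                # Only this worker stops; the others keep sampling.
                active.discard(i)
                continue
            results[i].append(run_episode(sample))
    return results


if __name__ == "__main__":
    shards = shard(list(range(7)), 3)
    print(collect_all(shards, lambda sample: sample * 2))
```

With uneven shards, a worker with a shorter shard is retired earlier, and the total number of episodes equals the dataset size without being specified up front.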
I can propose a design for this feature, but it would touch at least the env, venv, and collector, and would be a fairly large change. I therefore think it is worth discussing first whether people consider this a valid use case.