unify single-env and multi-env in collector #157
Conversation
This is a prerequisite for #134.
To me it is a great improvement. Reducing the number of branches is not easy, but it often makes everything simpler and clearer. This part of the code in particular was unnecessarily complicated and hard to understand (probably an artefact of the past), since a single env is nothing conceptually different from a vector env with a single instance. Good that you took the time to clean up that part of the code!
I realized this when I wanted to support sync simulation, and got support from @Trinkle23897. It is indeed an artefact of the past: he wanted to support partial episodes, but later realized that it is hardly necessary, so I removed this part of the code.
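For context, a minimal sketch of that idea, assuming Gym's CartPole-v0 and a Tianshou vector-env wrapper that takes a list of environment factories (the exact class name may differ between versions; this is not verbatim from the PR):

```python
import gym
from tianshou.env import VectorEnv  # assumed 0.2.x-era name for the vector wrapper

# A single environment is conceptually just a vector env with one instance,
# so the collector can use the same batched code path in both cases.
envs = VectorEnv([lambda: gym.make('CartPole-v0')])  # one factory -> one env
obs = envs.reset()  # batched observation of shape (1, *obs_shape), even for one env
```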
Ready for review now @Trinkle23897 @duburcqa
The base branch was changed.
According to my experiments:
We ran some experiments on Bipedal-walker and Lunar-lander; the final results are slightly different from each other. @imoneoi
The point of this PR is to enforce that there are only full episodes in the replay buffer. If fully online learning really matters to you, I think you can use a single environment and write a collector with simple for-loops:

```python
# Pseudocode: hand-rolled step-level collection with a single environment.
obs = env.reset()
for _ in range(num_iterations):
    data = []
    while len(data) < n:
        obs, rew, done, info = env.step(policy(obs))
        data.append((obs, rew, done, info))
        if done:
            obs = env.reset()
    policy.learn(data)
```

In my opinion, if you need a replay buffer, fully online learning (learning after every few steps) does not matter so much. The current implementation (Tianshou 0.2.5) allows users to perform online learning at the episode level (learning after every few episodes), and I think that is enough. However, feel free to provide feedback and to discuss :)
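(For illustration, a minimal hedged sketch of what episode-level online learning with a collector looks like; the collector and policy construction are omitted, and the method signatures are assumptions based on the 0.2.x-era API rather than verbatim code.)

```python
# Sketch: learn after every few full episodes instead of after every few steps.
for _ in range(num_iterations):
    collector.collect(n_episode=4)           # gather a few complete episodes
    batch = collector.sample(batch_size=64)  # draw transitions from the buffer
    policy.learn(batch)                      # one update step
```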
Unify the implementation with multi-environments (wrap a single environment in a multi-environment with one env) to greatly simplify the code.

This changes the behavior of the single-environment case. Prior to this PR, for a single environment, collector.collect(n_step=n) would step exactly n steps. After this PR, for a single environment, collector.collect(n_step=n) steps m episodes until the number of steps is greater than n. That is to say, collectors now always collect full episodes.
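A small illustration of the semantic change, as a hedged sketch (the collector setup is omitted and the episode/step counts in the comments are an assumed example, not output from a real run):

```python
# After this PR, with a single environment wrapped as a one-instance vector env:
stats = collector.collect(n_step=100)
# Whole episodes are collected until the step count reaches at least 100, so the
# actual number of steps may exceed 100 (e.g. 3 full episodes totalling 112 steps),
# and the replay buffer only ever holds full episodes.
# Before this PR, the same call stepped exactly 100 times and could cut an
# episode in the middle.
```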