
unify single-env and multi-env in collector #157


Merged
merged 18 commits into thu-ml:dev on Jul 23, 2020

Conversation

youkaichao
Collaborator

@youkaichao youkaichao commented Jul 23, 2020

Unify the implementation with multi-environments (wrap a single environment in a multi-environment container with one env) to greatly simplify the code.

This changes the behavior for a single environment.
Prior to this PR, for a single environment, collector.collect(n_step=n) stepped exactly n steps.
After this PR, for a single environment, collector.collect(n_step=n) runs m whole episodes until the number of collected steps is at least n.

That is to say, collectors now always collect full episodes.
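
For illustration, the new n_step semantics can be sketched in plain Python against the classic gym API (this is not the actual Tianshou implementation; the function collect_at_least_n_steps and the random policy below are made up for the example):

    import gym

    def collect_at_least_n_steps(env, policy, n):
        """Run whole episodes until at least n transitions are collected."""
        transitions = []
        while len(transitions) < n:              # keep starting new episodes
            obs, done = env.reset(), False
            while not done:                      # always finish the current episode
                act = policy(obs)
                obs_next, rew, done, info = env.step(act)
                transitions.append((obs, act, rew, done, obs_next))
                obs = obs_next
        return transitions                       # may contain more than n steps

    env = gym.make("CartPole-v0")
    data = collect_at_least_n_steps(env, lambda obs: env.action_space.sample(), n=100)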

@youkaichao
Collaborator Author

This is a prerequisite for #134.

@youkaichao youkaichao changed the title Improve Collector WIP: Improve Collector Jul 23, 2020
@Trinkle23897 Trinkle23897 changed the title WIP: Improve Collector WIP: unify single-env and multi-env in collector Jul 23, 2020
Trinkle23897
Trinkle23897 previously approved these changes Jul 23, 2020
@youkaichao
Collaborator Author

Collector.collect should only reflect the statistics of the current call. If users want to track statistics of Collector.collect across calls (such as a moving average), they can achieve it with minimal effort, but this should not be done by Tianshou anyway. So I committed d3e23ac .
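
For example, a moving average over successive collect calls can be kept entirely in user code (a rough sketch; it assumes a collector has already been constructed and that the returned dict exposes the mean episode reward under the key 'rew', which may differ between Tianshou versions):

    from collections import deque
    import numpy as np

    recent_rewards = deque(maxlen=100)            # sliding window for the moving average
    for _ in range(1000):
        result = collector.collect(n_step=256)    # statistics of this call only
        recent_rewards.append(result['rew'])      # assumed key for the mean episode reward
        print('moving average reward:', np.mean(recent_rewards))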

@duburcqa
Collaborator

To me it is a great improvement. Reducing the number of branches is not easy, but it often makes everything simpler and clearer. This part of the code in particular was unnecessarily complicated and hard to understand for no reason (probably an artefact of the past), since a single env is conceptually nothing different from a vector venv with a single instance. Good that you took the time to clean up that part of the code!

@youkaichao
Collaborator Author

This part of the code in particular was unnecessarily complicated and hard to understand for no reason (probably an artefact of the past), since a single env is conceptually nothing different from a vector venv with a single instance.

I realized this when I wanted to support sync simulation and got support from @Trinkle23897 . It is indeed an artefact of the past: he wanted to support partial episodes, but later realized it was hardly necessary, so I removed this part of the code.

@youkaichao youkaichao changed the title WIP: unify single-env and multi-env in collector unify single-env and multi-env in collector Jul 23, 2020
@youkaichao youkaichao requested a review from duburcqa July 23, 2020 07:37
@youkaichao
Collaborator Author

youkaichao commented Jul 23, 2020

ready for review now @Trinkle23897 @duburcqa

duburcqa
duburcqa previously approved these changes Jul 23, 2020
Trinkle23897
Trinkle23897 previously approved these changes Jul 23, 2020
@Trinkle23897 Trinkle23897 changed the base branch from master to dev July 23, 2020 08:15
@Trinkle23897 Trinkle23897 dismissed stale reviews from duburcqa and themself July 23, 2020 08:15

The base branch was changed.

@Trinkle23897 Trinkle23897 requested a review from duburcqa July 23, 2020 08:16
@Trinkle23897 Trinkle23897 merged commit bfeffe1 into thu-ml:dev Jul 23, 2020
@youkaichao youkaichao deleted the collector branch July 23, 2020 08:41
@jiang-yuan

According to my experiments:
Even if I only change the step number within an episode (e.g. for a 30-step episode, updating the net every 5, 10, or 15 steps), the results may differ.
I do not understand. What is the reason for 'he realized that it is hardly necessary'?

@Trinkle23897
Collaborator

Trinkle23897 commented Jul 24, 2020

We did some experiments on Bipedal-walker and Lunar-lander; the final results seem to differ slightly from each other. @imoneoi

@youkaichao
Collaborator Author

Even if I only change the step number within an episode (e.g. for a 30-step episode, updating the net every 5, 10, or 15 steps), the results may differ.

The point of this PR is to enforce that there are only full episodes in the replay buffer. If fully online learning really matters for you, I think you can use a single environment and write a collector with a simple for-loop, for example:

for _ in range(num_iterations):        # outer training loop
    data, obs = [], env.reset()
    while len(data) < n:               # collect n transitions, even mid-episode
        act = policy(obs)
        obs_next, rew, done, info = env.step(act)
        data.append((obs, act, rew, done, obs_next))
        obs = env.reset() if done else obs_next
    policy.learn(data)                 # update after every n steps

In my opinion, if you need a replay buffer, fully online learning (learning after every few steps) will not matter so much. The current implementation (Tianshou 0.2.5) allows users to perform online learning at the episode level (learning after every few episodes), and I think that is enough.
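
For instance, episode-level online learning could look roughly like the following sketch (not the official training loop; it assumes an existing collector and policy, and the sample/process_fn/learn calls are written from memory of the 0.2.x interface, so their exact signatures may differ):

    for epoch in range(num_epochs):                    # num_epochs chosen by the user
        collector.collect(n_episode=4)                 # collect a few full episodes
        batch, indices = collector.buffer.sample(0)    # batch_size=0: take the whole buffer
        batch = policy.process_fn(batch, collector.buffer, indices)
        policy.learn(batch)                            # exact signature differs per algorithm/version
        collector.reset_buffer()                       # drop the on-policy data after the update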

However, feel free to provide feedback and to discuss :)

BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024