
Yet another 3 fix #160

Merged: Trinkle23897 merged 8 commits into dev from fix-dqn on Jul 24, 2020

Conversation

Trinkle23897 (Collaborator) commented Jul 24, 2020

  • I have marked all applicable categories:
    • algorithm implementation fix
    • documentation modification
  1. DQN learn should keep eps=0
  2. Add a warning about env.seed in VecEnv (see the sketch after this list)
  3. Fix Potential Bug #162 (multi-dim action)
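
Fix 2 is only named above; as a rough illustration of what a seed warning in a vectorized env can look like, here is a minimal sketch. The class name, attribute layout, and warning text are all assumptions for illustration, not Tianshou's actual VecEnv code.

```python
import warnings
from typing import List, Optional, Union

class VectorEnvSketch:
    """Hypothetical stand-in for a vectorized env wrapper."""

    def __init__(self, envs: list) -> None:
        self.envs = envs  # a list of gym-style sub-envs, each with .seed()

    def seed(self, seed: Optional[Union[int, List[int]]] = None) -> list:
        if isinstance(seed, int):
            # A single scalar seed is expanded to seed+i per worker; warn so
            # users know the sub-envs are not all seeded identically.
            warnings.warn("scalar seed given: sub-env i will be seeded with seed + i")
            seed = [seed + i for i in range(len(self.envs))]
        elif seed is None:
            seed = [None] * len(self.envs)
        return [env.seed(s) for env, s in zip(self.envs, seed)]
```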

Trinkle23897 (Collaborator, Author) commented Jul 24, 2020

It makes no difference from the previous version, because self(batch).act is not used in learn() and eps only changes act. But pointing it out explicitly is much better.
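
A minimal sketch of that point, assuming a simplified epsilon-greedy helper (illustrative names, not Tianshou's real API): eps only randomizes the returned act, so any learn() computation that reads Q-values is unchanged, and passing eps=0 there simply documents the intent.

```python
import torch

def epsilon_greedy(q: torch.Tensor, eps: float) -> torch.Tensor:
    """q: [bsz, n_actions]. eps only randomizes the returned action;
    the Q-values themselves are never modified."""
    act = q.argmax(dim=1)
    explore = torch.rand(q.shape[0]) < eps
    act[explore] = torch.randint(q.shape[1], (int(explore.sum()),))
    return act

q = torch.randn(4, 3)
# A learn() step that only reads Q-values (e.g. a TD target) is identical
# for any eps, since eps never enters this expression:
td_max = q.max(dim=1).values
# Calling with eps=0 during learning still documents "no exploration here":
greedy_act = epsilon_greedy(q, eps=0.0)
```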

@Trinkle23897 changed the title from "DQN learn should keep eps=0" to "DQN learn should keep eps=0 / Add a warning of env.seed in VecEnv" Jul 24, 2020
@Trinkle23897 changed the title from "DQN learn should keep eps=0 / Add a warning of env.seed in VecEnv" to "Yet another 3 fix" Jul 24, 2020
@Trinkle23897 changed the title from "Yet another 3 fix" to "WIP: Yet another 3 fix" Jul 24, 2020
@Trinkle23897 changed the title from "WIP: Yet another 3 fix" to "Yet another 3 fix" Jul 24, 2020

youkaichao (Collaborator) commented Jul 24, 2020

Reshape is good, and it would be better if you could avoid shape [bsz] by all means (reward has this shape), because of its confusing behavior.

duburcqa (Collaborator) commented Jul 24, 2020

> Reshape is good, and it would be better if you could avoid shape [bsz] by all means (reward has this shape), because of its confusing behavior.

I don't agree; I usually prefer .flatten(1, -1), since it makes clear that you want to flatten while preserving the batch dimension.

Trinkle23897 (Collaborator, Author) commented Jul 24, 2020

> Reshape is good, and it would be better if you could avoid shape [bsz] by all means (reward has this shape), because of its confusing behavior.

> I don't agree; I usually prefer .flatten(1, -1), since it makes clear that you want to flatten while preserving the batch dimension.

But a 1-dim tensor cannot apply flatten(1):

```python
In [6]: b=torch.rand(3)

In [7]: b.flatten(1,-1)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-0fd10ca82bb9> in <module>
----> 1 b.flatten(1,-1)

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
```
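
For comparison, a small check showing why .reshape(bsz, -1) covers the 1-dim case that .flatten(1, -1) rejects (output shapes follow standard PyTorch semantics):

```python
import torch

for shape in [(3,), (3, 4), (3, 4, 5)]:
    x = torch.rand(*shape)
    y = x.reshape(x.shape[0], -1)   # works for [bsz], [bsz, ?], [bsz, ?, ?]
    print(shape, "->", tuple(y.shape))
# (3,)      -> (3, 1)
# (3, 4)    -> (3, 4)
# (3, 4, 5) -> (3, 20)
# x.flatten(1, -1) raises IndexError on the first case, as shown above.
```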

duburcqa (Collaborator) commented Jul 24, 2020

> But a 1-dim tensor cannot apply flatten(1):

OK, so your example before the edit, "[bsz, ?] or [bsz, ?, ?]", was wrong. So yes, if [bsz] can happen (which I highly doubt, since it means 0-dim input without batch processing), then reshape must be used.

youkaichao (Collaborator) commented Jul 24, 2020

> if [bsz] can happen (which I highly doubt, since it means 0-dim input without batch processing)

reward is exactly that case.

duburcqa (Collaborator) commented Jul 24, 2020

> reward is exactly that case.

The reward is used as the input of its own neural network? In which algorithm?

youkaichao (Collaborator) commented Jul 24, 2020

> The reward is used as the input of its own neural network?

I'm not talking about using reward as an input, though. My suggestion is to reshape reward to [bsz, 1] in Policy.learn :)

Trinkle23897 (Collaborator, Author) commented Jul 24, 2020

> I'm not talking about using reward as an input, though. My suggestion is to reshape reward to [bsz, 1] in Policy.learn :)

I don't agree with reshaping it to [bsz, 1], since that already caused a bug previously.
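
The earlier bug is not quoted in this thread, so as an assumed example (not necessarily the actual bug) of how a [bsz, 1] reward can go wrong: mixing it with a [bsz] tensor broadcasts silently instead of raising an error.

```python
import torch

bsz = 4
returns = torch.rand(bsz, 1)   # reward-derived target kept as [bsz, 1]
q = torch.rand(bsz)            # predicted values with shape [bsz]
diff = returns - q             # silently broadcasts to [bsz, bsz]!
print(diff.shape)              # torch.Size([4, 4])
# A loss built from `diff` averages bsz*bsz mismatched pairs instead of bsz
# pairs, so training runs without errors but with wrong gradients.
```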

duburcqa (Collaborator) commented Jul 24, 2020

> My suggestion is to reshape reward to [bsz, 1] in Policy.learn :)

That is what we were talking about initially.

Trinkle23897 (Collaborator, Author) commented Jul 24, 2020

Okay. This is because I wrote a small test with MyTestEnv under the test/base/ folder. I found that the obs in this env could not be processed by the current network (because its shape is [bsz]), so I modified this line of code.
I know it rarely happens :)

@Trinkle23897 merged commit 38a95c1 into thu-ml:dev Jul 24, 2020
@Trinkle23897 deleted the fix-dqn branch July 24, 2020 09:38
@Trinkle23897 mentioned this pull request Jul 24, 2020
@Trinkle23897 linked an issue Jul 27, 2020 that may be closed by this pull request

BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
1. DQN learn should keep eps=0
2. Add a warning of env.seed in VecEnv
3. fix thu-ml#162 of multi-dim action

Successfully merging this pull request may close these issues: Potential Bug (#162).