
Yet another 3 fix #160

Merged: Trinkle23897 merged 8 commits into dev from fix-dqn on Jul 24, 2020

Conversation

Trinkle23897 (Collaborator) commented Jul 24, 2020

  • I have marked all applicable categories:
    • algorithm implementation fix
    • documentation modification
  1. DQN learn should keep eps=0
  2. Add a warning about env.seed in VecEnv (see the sketch after this list)
  3. Fix Potential Bug #162 (multi-dim action)
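
Fix 2 is only named above; as a rough illustration of what a seed warning in a vectorized env can look like, here is a minimal sketch. The class name, attribute layout, and warning text are all assumptions for illustration, not Tianshou's actual VecEnv code.

```python
import warnings
from typing import List, Optional, Union

class VectorEnvSketch:
    """Hypothetical stand-in for a vectorized env wrapper."""

    def __init__(self, envs: list) -> None:
        self.envs = envs  # a list of gym-style sub-envs, each with .seed()

    def seed(self, seed: Optional[Union[int, List[int]]] = None) -> list:
        if isinstance(seed, int):
            # A single scalar seed is expanded to seed+i per worker; warn so
            # users know the sub-envs are not all seeded identically.
            warnings.warn("scalar seed given: sub-env i will be seeded with seed + i")
            seed = [seed + i for i in range(len(self.envs))]
        elif seed is None:
            seed = [None] * len(self.envs)
        return [env.seed(s) for env, s in zip(self.envs, seed)]
```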

Trinkle23897 (Collaborator, Author) commented Jul 24, 2020

It makes no difference from the previous version, because self(batch).act is not used in learn() and eps only changes act. But pointing it out explicitly is much better.
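
A minimal sketch of that point, assuming a simplified epsilon-greedy helper (illustrative names, not Tianshou's real API): eps only randomizes the returned act, so any learn() computation that reads Q-values is unchanged, and passing eps=0 there simply documents the intent.

```python
import torch

def epsilon_greedy(q: torch.Tensor, eps: float) -> torch.Tensor:
    """q: [bsz, n_actions]. eps only randomizes the returned action;
    the Q-values themselves are never modified."""
    act = q.argmax(dim=1)
    explore = torch.rand(q.shape[0]) < eps
    act[explore] = torch.randint(q.shape[1], (int(explore.sum()),))
    return act

q = torch.randn(4, 3)
# A learn() step that only reads Q-values (e.g. a TD target) is identical
# for any eps, since eps never enters this expression:
td_max = q.max(dim=1).values
# Calling with eps=0 during learning still documents "no exploration here":
greedy_act = epsilon_greedy(q, eps=0.0)
```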

@Trinkle23897 changed the title from "DQN learn should keep eps=0" to "DQN learn should keep eps=0 / Add a warning of env.seed in VecEnv" Jul 24, 2020
@Trinkle23897 changed the title from "DQN learn should keep eps=0 / Add a warning of env.seed in VecEnv" to "Yet another 3 fix" Jul 24, 2020
@Trinkle23897 changed the title from "Yet another 3 fix" to "WIP: Yet another 3 fix" Jul 24, 2020
@Trinkle23897 changed the title from "WIP: Yet another 3 fix" to "Yet another 3 fix" Jul 24, 2020

youkaichao (Collaborator) commented Jul 24, 2020

Reshape is good, and it would be better if you could avoid shape [bsz] by all means (reward has this shape), because of its confusing behavior.

duburcqa (Collaborator) commented Jul 24, 2020

> Reshape is good, and it would be better if you could avoid shape [bsz] by all means (reward has this shape), because of its confusing behavior.

I don't agree; I usually prefer .flatten(1, -1), since it makes clear that you want to flatten while preserving the batch dimension.

Trinkle23897 (Collaborator, Author) commented Jul 24, 2020

> Reshape is good, and it would be better if you could avoid shape [bsz] by all means (reward has this shape), because of its confusing behavior.

> I don't agree; I usually prefer .flatten(1, -1), since it makes clear that you want to flatten while preserving the batch dimension.

But a 1-dim tensor cannot apply flatten(1):

```python
In [6]: b=torch.rand(3)

In [7]: b.flatten(1,-1)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-0fd10ca82bb9> in <module>
----> 1 b.flatten(1,-1)

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
```
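
For comparison, a small check showing why .reshape(bsz, -1) covers the 1-dim case that .flatten(1, -1) rejects (output shapes follow standard PyTorch semantics):

```python
import torch

for shape in [(3,), (3, 4), (3, 4, 5)]:
    x = torch.rand(*shape)
    y = x.reshape(x.shape[0], -1)   # works for [bsz], [bsz, ?], [bsz, ?, ?]
    print(shape, "->", tuple(y.shape))
# (3,)      -> (3, 1)
# (3, 4)    -> (3, 4)
# (3, 4, 5) -> (3, 20)
# x.flatten(1, -1) raises IndexError on the first case, as shown above.
```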

duburcqa (Collaborator) commented Jul 24, 2020

> But a 1-dim tensor cannot apply flatten(1):

OK, so your example before the edit, "[bsz, ?] or [bsz, ?, ?]", was wrong. So yes, if [bsz] can happen (which I highly doubt, since it means 0-dim input without batch processing), then reshape must be used.

youkaichao (Collaborator) commented Jul 24, 2020

> if [bsz] can happen (which I highly doubt, since it means 0-dim input without batch processing)

reward is exactly that case.

duburcqa (Collaborator) commented Jul 24, 2020

> reward is exactly that case.

The reward is used as the input of its own neural network? In which algorithm?

youkaichao (Collaborator) commented Jul 24, 2020

> The reward is used as the input of its own neural network?

I'm not talking about using reward as an input, though. My suggestion is to reshape reward to [bsz, 1] in Policy.learn :)

Trinkle23897 (Collaborator, Author) commented Jul 24, 2020

> I'm not talking about using reward as an input, though. My suggestion is to reshape reward to [bsz, 1] in Policy.learn :)

I don't agree with reshaping it to [bsz, 1], since that already caused a bug previously.
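
The earlier bug is not quoted in this thread, so as an assumed example (not necessarily the actual bug) of how a [bsz, 1] reward can go wrong: mixing it with a [bsz] tensor broadcasts silently instead of raising an error.

```python
import torch

bsz = 4
returns = torch.rand(bsz, 1)   # reward-derived target kept as [bsz, 1]
q = torch.rand(bsz)            # predicted values with shape [bsz]
diff = returns - q             # silently broadcasts to [bsz, bsz]!
print(diff.shape)              # torch.Size([4, 4])
# A loss built from `diff` averages bsz*bsz mismatched pairs instead of bsz
# pairs, so training runs without errors but with wrong gradients.
```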

duburcqa (Collaborator) commented Jul 24, 2020

> My suggestion is to reshape reward to [bsz, 1] in Policy.learn :)

That is what we were talking about initially.

Trinkle23897 (Collaborator, Author) commented Jul 24, 2020

Okay. This is because I wrote a small test with MyTestEnv under the test/base/ folder. I found that the obs in this env could not be processed by the current network (because its shape is [bsz]), so I modified this line of code.
I know it rarely happens :)

@Trinkle23897 merged commit 38a95c1 into thu-ml:dev Jul 24, 2020
@Trinkle23897 deleted the fix-dqn branch July 24, 2020 09:38
@Trinkle23897 mentioned this pull request Jul 24, 2020
@Trinkle23897 linked an issue Jul 27, 2020 that may be closed by this pull request

BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
1. DQN learn should keep eps=0
2. Add a warning of env.seed in VecEnv
3. fix thu-ml#162 of multi-dim action

Successfully merging this pull request may close these issues: Potential Bug (#162).