Closed
Labels: question (Further information is requested)
Description
Hello!
I'm using gym.spaces.MultiDiscrete as the action_space of my custom environment, which has one agent and needs a multi-dimensional action. The input action is a list of YES/NO choices, e.g. [0 1 0 1 1 0].
However, the action produced in the pre-collect stage is a single repeated index, e.g. [6 6 6 6 6 6 6 6], because it has the same shape as the neural network output selected with argmax() in dqn.py:
```python
if hasattr(obs, "mask"):
    # some of actions are masked, they cannot be selected
    q_: np.ndarray = to_numpy(q)
    q_[~obs.mask] = -np.inf
    act = q_.argmax(axis=1)
# add eps to act in training or testing phase
if not self.updating and not np.isclose(self.eps, 0.0):
    for i in range(len(q)):
        if np.random.rand() < self.eps:
            q_ = np.random.rand(*q[i].shape)
            if hasattr(obs, "mask"):
                q_[~obs.mask[i]] = -np.inf
            act[i] = q_.argmax()
```
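To illustrate the mismatch, here is a minimal NumPy sketch (the Q-value shapes are hypothetical, not taken from tianshou): for a MultiDiscrete([2] * 6) space, each action should be a length-6 vector of per-dimension choices, but a flat argmax over the Q-values yields only one index per batch row.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Q-values for a batch of 4 observations and a
# MultiDiscrete([2] * 6) action space: 6 dimensions, 2 choices each.
q = rng.random((4, 6, 2))

# A flat argmax over the trailing axes picks ONE index per batch row,
# not one choice per action dimension.
flat_act = q.reshape(4, -1).argmax(axis=1)   # shape: (4,)

# What a MultiDiscrete action actually needs: an argmax per dimension,
# giving a YES/NO choice for each of the 6 dimensions.
multi_act = q.argmax(axis=-1)                # shape: (4, 6)

print(flat_act.shape, multi_act.shape)
```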
Hence I replaced q_.argmax() with to_numpy(torch.from_numpy(q_).max(dim=0)[1]), and q_.argmax(axis=1) with act = to_numpy(torch.from_numpy(q_).max(dim=1)[1]).
And here is my question: is that allowed in tianshou?
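As a side note, torch.max(dim=1)[1] returns the same indices as NumPy's argmax(axis=1) when there are no ties, so that substitution by itself should not change which actions are selected; a quick sanity check:

```python
import numpy as np
import torch

rng = np.random.default_rng(1)
q_ = rng.random((4, 6))  # hypothetical batch of Q-values

np_act = q_.argmax(axis=1)                        # NumPy row-wise argmax
torch_act = torch.from_numpy(q_).max(dim=1)[1].numpy()  # torch max indices

# Both pick the column index of the row-wise maximum.
print((np_act == torch_act).all())
```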
Original Chinese (translated): Hello! While using gym.spaces.MultiDiscrete, I found that tianshou's DQN policy generates actions that do not match the action space. Will my changes above have any other unknown effects on the policy as a whole? Thanks!
Chrisa142857