The usage of action_spaces = gym.spaces.MultiDiscrete in DQN policy, in dqn.py #236

@Chrisa142857

Hello!

I'm using gym.spaces.MultiDiscrete as my action_spaces in a custom env that has a single agent and needs a multi-dimensional action. Each input action is a list of YES/NO choices, e.g. [0 1 0 1 1 0].
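
For reference, here is a minimal sketch of what a valid action looks like. It assumes six binary (NO=0 / YES=1) dimensions, i.e. a MultiDiscrete space with nvec = [2, 2, 2, 2, 2, 2]. It uses plain NumPy to mimic sampling such a space instead of calling gym itself. sample_multi_discrete is a hypothetical helper, not a gym or tianshou API.

```python
import numpy as np


def sample_multi_discrete(nvec, rng):
    """Draw one action from a MultiDiscrete-style space (hypothetical helper).

    Component k is sampled uniformly from {0, ..., nvec[k] - 1}, so the
    result has the same shape as nvec: one choice per action dimension.
    """
    nvec = np.asarray(nvec)
    return rng.integers(0, nvec)


rng = np.random.default_rng(0)
nvec = [2] * 6  # assumption: six YES/NO choices
action = sample_multi_discrete(nvec, rng)
print(action.shape)  # (6,)
```

The key point is the shape. A valid action for this space is a vector with one entry per dimension, not a single integer index.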

However, in the pre-collect stage the random action comes out as a single action index per sample, e.g. [6 6 6 6 6 6 6 6]. This has the same shape as the action that dqn.py obtains from the neural network output with argmax():

        if hasattr(obs, "mask"):
            # some of actions are masked, they cannot be selected
            q_: np.ndarray = to_numpy(q)
            q_[~obs.mask] = -np.inf
            act = q_.argmax(axis=1)
        # add eps to act in training or testing phase
        if not self.updating and not np.isclose(self.eps, 0.0):
            for i in range(len(q)):
                if np.random.rand() < self.eps:
                    q_ = np.random.rand(*q[i].shape)
                    if hasattr(obs, "mask"):
                        q_[~obs.mask[i]] = -np.inf
                    act[i] = q_.argmax()
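
The mismatch comes from calling argmax() with no axis. In that case NumPy flattens the array and returns a single index into it. The small illustration below assumes, hypothetically, that the network emits per-sample Q-values shaped (2, n_dims), with row 0 for NO and row 1 for YES. The issue does not state the actual output layout.

```python
import numpy as np

# Hypothetical per-sample Q-values: shape (2, 6),
# rows = choice (NO/YES), columns = action dimensions.
q_i = np.array([
    [0.9, 0.1, 0.8, 0.2, 0.3, 0.7],   # Q-values for choosing NO (0)
    [0.1, 0.95, 0.2, 0.8, 0.6, 0.3],  # Q-values for choosing YES (1)
])

flat = q_i.argmax()            # flattens to 12 values, returns one index
per_dim = q_i.argmax(axis=0)   # best row per column: one 0/1 choice per dimension

print(flat)     # 7
print(per_dim)  # [0 1 0 1 1 0]
```

Under this layout, the flat index is meaningless as a MultiDiscrete action. The per-axis reduction gives exactly one choice per dimension.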

Hence I replaced q_.argmax() with to_numpy(torch.from_numpy(q_).max(dim=0)[1]), and act = q_.argmax(axis=1) with act = to_numpy(torch.from_numpy(q_).max(dim=1)[1]).
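
Below is a NumPy-only sketch of the same change applied to a whole batch. It assumes q has shape (batch, n_choices, n_dims), which is a hypothetical layout since the issue does not state the network's output shape. For a 2-D slice, q_.argmax(axis=0) selects the same indices as torch.from_numpy(q_).max(dim=0)[1], apart from ties. Likewise, argmax(axis=1) on the batch matches max(dim=1)[1], so the round trip through torch is not strictly needed. multi_discrete_eps_greedy is a hypothetical helper, not part of tianshou, and the action-mask handling from dqn.py is omitted.

```python
import numpy as np


def multi_discrete_eps_greedy(q, eps, rng):
    """Epsilon-greedy selection for a MultiDiscrete action space (hypothetical helper).

    q: array of shape (batch, n_choices, n_dims), one Q-value per
       (choice, dimension) pair (an assumed output layout).
    Returns an int array of shape (batch, n_dims).
    """
    q = np.asarray(q)
    act = q.argmax(axis=1)  # greedy: best choice for every dimension
    for i in range(len(q)):
        if rng.random() < eps:
            # Random values of the same shape, reduced per dimension,
            # give a uniformly random choice in each dimension.
            q_rand = rng.random(q[i].shape)
            act[i] = q_rand.argmax(axis=0)
    return act


rng = np.random.default_rng(0)
q = rng.random((4, 2, 6))  # batch of 4 samples, 6 binary dimensions
print(multi_discrete_eps_greedy(q, eps=0.1, rng=rng).shape)  # (4, 6)
```

Taking the argmax of uniform random values per dimension has the same distribution as sampling each dimension uniformly. Sampling directly, e.g. with rng.integers(0, n_choices, size=n_dims), would be an equivalent and more direct choice.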

So my question is: is this change allowed in tianshou?

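Also asked in Chinese (translated):
Hi! While using gym.spaces.MultiDiscrete, I found that tianshou's DQN policy generates actions that don't match the action space. Could my changes above have other, unknown effects on the policy as a whole? Thanks!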

Metadata

Assignees: No one assigned
Labels: question (Further information is requested)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
