Closed
Labels: question (Further information is requested)
Description
Hello!
I'm using gym.spaces.MultiDiscrete as the action_space of my custom environment, which has one agent and needs a multi-dimensional action. The input action is a list of YES/NO choices, e.g. [0 1 0 1 1 0].
However, the action produced in the pre-collect stage is a single repeated index, e.g. [6 6 6 6 6 6 6 6], because it has the same shape as the neural network output selected with argmax() in dqn.py:
```python
if hasattr(obs, "mask"):
    # some of actions are masked, they cannot be selected
    q_: np.ndarray = to_numpy(q)
    q_[~obs.mask] = -np.inf
    act = q_.argmax(axis=1)
# add eps to act in training or testing phase
if not self.updating and not np.isclose(self.eps, 0.0):
    for i in range(len(q)):
        if np.random.rand() < self.eps:
            q_ = np.random.rand(*q[i].shape)
            if hasattr(obs, "mask"):
                q_[~obs.mask[i]] = -np.inf
            act[i] = q_.argmax()
```
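To illustrate the mismatch, here is a minimal NumPy sketch (the Q-value shapes are hypothetical, not taken from tianshou): for a MultiDiscrete([2] * 6) space, each action should be a length-6 vector of per-dimension choices, but a flat argmax over the Q-values yields only one index per batch row.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Q-values for a batch of 4 observations and a
# MultiDiscrete([2] * 6) action space: 6 dimensions, 2 choices each.
q = rng.random((4, 6, 2))

# A flat argmax over the trailing axes picks ONE index per batch row,
# not one choice per action dimension.
flat_act = q.reshape(4, -1).argmax(axis=1)   # shape: (4,)

# What a MultiDiscrete action actually needs: an argmax per dimension,
# giving a YES/NO choice for each of the 6 dimensions.
multi_act = q.argmax(axis=-1)                # shape: (4, 6)

print(flat_act.shape, multi_act.shape)
```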
Hence I replaced q_.argmax() with to_numpy(torch.from_numpy(q_).max(dim=0)[1]), and q_.argmax(axis=1) with act = to_numpy(torch.from_numpy(q_).max(dim=1)[1]).
And here is my question: is that allowed in tianshou?
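As a side note, torch.max(dim=1)[1] returns the same indices as NumPy's argmax(axis=1) when there are no ties, so that substitution by itself should not change which actions are selected; a quick sanity check:

```python
import numpy as np
import torch

rng = np.random.default_rng(1)
q_ = rng.random((4, 6))  # hypothetical batch of Q-values

np_act = q_.argmax(axis=1)                        # NumPy row-wise argmax
torch_act = torch.from_numpy(q_).max(dim=1)[1].numpy()  # torch max indices

# Both pick the column index of the row-wise maximum.
print((np_act == torch_act).all())
```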
Original Chinese (translated): Hello! While using gym.spaces.MultiDiscrete, I found that tianshou's DQN policy generates actions that do not match the action space. Will my changes above have any other unknown effects on the policy as a whole? Thanks!
Chrisa142857