Description
Hello, I have a question about using a gym.ObservationWrapper as an action mask for both training and testing. My env has many actions, and some of them are unavailable in certain states, so I used a mask wrapper as in #645 (sketched below). I created a custom ObservationWrapper and used it like this:
```python
train_envs = [lambda: Wrapper(MyEnv(xxx)) for _ in range(10)]
test_envs = [lambda: Wrapper(MyEnv(xxx)) for _ in range(4)]
# then wrap them in SubprocVectorEnv, pass them to a Collector,
# and train Rainbow DQN as in the examples.
```
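For context, the wrapper itself is along the lines of the following minimal sketch (the `available_actions()` accessor is a stand-in for however MyEnv reports its legal actions, and the dict layout just roughly follows the pattern discussed in #645):

```python
import gym
import numpy as np


class Wrapper(gym.ObservationWrapper):
    """Attach an action mask to each observation (sketch only)."""

    def __init__(self, env):
        super().__init__(env)
        # observation becomes a dict carrying the raw obs plus a 0/1 mask
        self.observation_space = gym.spaces.Dict({
            "obs": env.observation_space,
            "mask": gym.spaces.Box(low=0, high=1,
                                   shape=(env.action_space.n,), dtype=np.int8),
        })

    def observation(self, obs):
        # 1 = action currently available, 0 = unavailable
        mask = np.zeros(self.env.action_space.n, dtype=np.int8)
        # available_actions() is hypothetical; replace with the env's own API
        mask[self.env.available_actions()] = 1
        return {"obs": obs, "mask": mask}
```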
The training results really satisfy me, but the test results are always terrible (worse than random). I also evaluated the trained policy on the training envs with a hand-written test function, and it failed there as well.
What's more, to rule out a serious problem with my env itself, I removed the Wrapper and instead gave a negative reward for unavailable actions (sketched below). That setup works for both the training and testing envs, but the result is definitely not as good as the Wrapper version during training.
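The no-mask variant is roughly the following (the wrapper name, penalty value, and the `action_was_illegal` info key are all illustrative; the actual penalty is applied inside the env):

```python
import gym


class PenaltyWrapper(gym.Wrapper):
    """No mask: pass observations through unchanged and punish illegal actions."""

    def __init__(self, env, penalty=-1.0):
        super().__init__(env)
        self.penalty = penalty

    def step(self, action):
        # classic 4-tuple gym API assumed here
        obs, reward, done, info = self.env.step(action)
        # hypothetical flag for an illegal action; replace with the env's own signal
        if info.get("action_was_illegal", False):
            reward += self.penalty
        return obs, reward, done, info
```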
I wonder if there is something wrong with how I am using the Wrapper. Is there any guidance? Thanks.