Description
Hello, I have a question about using a gym.ObservationWrapper as an action mask for both training and testing. My env has many actions, and some of them are unavailable in certain states, so I used a mask wrapper as in #645 (sketched below). I created a custom ObservationWrapper and used it like this:
```python
train_envs = [lambda: Wrapper(MyEnv(xxx)) for _ in range(10)]
test_envs = [lambda: Wrapper(MyEnv(xxx)) for _ in range(4)]
# then wrap them in SubprocVectorEnv, pass them to a Collector,
# and train Rainbow DQN as in the examples.
```
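For context, the wrapper itself is along the lines of the following minimal sketch (the `available_actions()` accessor is a stand-in for however MyEnv reports its legal actions, and the dict layout just roughly follows the pattern discussed in #645):

```python
import gym
import numpy as np


class Wrapper(gym.ObservationWrapper):
    """Attach an action mask to each observation (sketch only)."""

    def __init__(self, env):
        super().__init__(env)
        # observation becomes a dict carrying the raw obs plus a 0/1 mask
        self.observation_space = gym.spaces.Dict({
            "obs": env.observation_space,
            "mask": gym.spaces.Box(low=0, high=1,
                                   shape=(env.action_space.n,), dtype=np.int8),
        })

    def observation(self, obs):
        # 1 = action currently available, 0 = unavailable
        mask = np.zeros(self.env.action_space.n, dtype=np.int8)
        # available_actions() is hypothetical; replace with the env's own API
        mask[self.env.available_actions()] = 1
        return {"obs": obs, "mask": mask}
```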
The training results really satisfy me, but the test results are always terrible (worse than random). I also evaluated the trained policy on the training envs with a hand-written test function, and it failed there as well.
What's more, to rule out a serious problem with my env itself, I removed the Wrapper and instead gave a negative reward for unavailable actions (sketched below). That setup works for both the training and testing envs, but the result is definitely not as good as the Wrapper version during training.
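The no-mask variant is roughly the following (the wrapper name, penalty value, and the `action_was_illegal` info key are all illustrative; the actual penalty is applied inside the env):

```python
import gym


class PenaltyWrapper(gym.Wrapper):
    """No mask: pass observations through unchanged and punish illegal actions."""

    def __init__(self, env, penalty=-1.0):
        super().__init__(env)
        self.penalty = penalty

    def step(self, action):
        # classic 4-tuple gym API assumed here
        obs, reward, done, info = self.env.step(action)
        # hypothetical flag for an illegal action; replace with the env's own signal
        if info.get("action_was_illegal", False):
            reward += self.penalty
        return obs, reward, done, info
```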
I wonder if there is something wrong with how I am using the Wrapper. Is there any guidance? Thanks.