Using a wrapper or mask gives great training results but terrible test results #708

@lsylusiyao

Hello, I have a question about using a gym.ObservationWrapper as an action mask for training and testing. My env has many actions, and some of them are unavailable in certain states, so I used a wrapper as suggested in #645. I created a custom ObservationWrapper and used it like this:

train_envs = [lambda: Wrapper(MyEnv(xxx)) for _ in range(10)]
test_envs = [lambda: Wrapper(MyEnv(xxx)) for _ in range(4)]
# then wrap them in SubprocVectorEnv and pass them to the Collector
# and train RainbowDQN as in the examples.
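
For readers unfamiliar with the pattern, here is a minimal, hedged sketch of what such a masking wrapper might look like. It is not the code from this issue: `ToyEnv`, `available_actions`, `MaskWrapper`, and `masked_greedy` are hypothetical names, and the classes are written without importing gym so the snippet runs standalone. In practice the wrapper would subclass gym.ObservationWrapper and override `observation()`. The `{"obs": ..., "mask": ...}` layout is an assumption about the convention discussed in #645.

```python
from typing import Any, Dict, List


class ToyEnv:
    """Hypothetical stand-in for MyEnv: 4 actions, state is a step counter."""

    n_actions = 4

    def __init__(self) -> None:
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def available_actions(self) -> List[int]:
        # Assumption for illustration: action a is legal only if a <= state + 1.
        return [a for a in range(self.n_actions) if a <= self.state + 1]

    def step(self, action: int):
        self.state += 1
        done = self.state >= 3
        return self.state, 1.0, done, {}


class MaskWrapper:
    """Mirrors ObservationWrapper.observation(): attach a boolean action mask."""

    def __init__(self, env: ToyEnv) -> None:
        self.env = env

    def observation(self, obs: Any) -> Dict[str, Any]:
        legal = set(self.env.available_actions())
        mask = [a in legal for a in range(self.env.n_actions)]
        return {"obs": obs, "mask": mask}

    def reset(self) -> Dict[str, Any]:
        return self.observation(self.env.reset())

    def step(self, action: int):
        obs, rew, done, info = self.env.step(action)
        return self.observation(obs), rew, done, info


def masked_greedy(q_values: List[float], mask: List[bool]) -> int:
    """Pick the highest-valued action among those the mask allows."""
    legal = [a for a, ok in enumerate(mask) if ok]
    return max(legal, key=lambda a: q_values[a])
```

One thing this sketch makes visible: the mask only helps if whatever selects actions at evaluation time also consults it, as `masked_greedy` does here. Whether that is related to the behavior described in this issue is not something the sketch can establish.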

The training results are very satisfying, but the test results are always terrible (worse than a random policy). I also tried evaluating on the training envs with a hand-coded function, and that failed too.

Furthermore, to check that there is no serious problem with my env, I removed the Wrapper and instead gave a negative reward for unavailable actions. That works for both the training and testing envs, but the result is definitely not as good as the training result with the Wrapper.
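
As a rough illustration of the penalty-based alternative described above, the following sketch shows one way a step function might punish an unavailable action instead of masking it. The names `penalized_step` and `INVALID_PENALTY`, the penalty value, and the choice to leave the state unchanged on an illegal action are all assumptions made for the example, not details taken from the env in this issue.

```python
from typing import List, Tuple

# Hypothetical penalty value; the issue does not state what was used.
INVALID_PENALTY = -1.0


def penalized_step(state: int, action: int, legal: List[int]) -> Tuple[int, float]:
    """Return (next_state, reward) for a toy counter environment.

    An illegal action leaves the state unchanged and yields a negative
    reward; a legal action advances the state and yields +1.
    """
    if action not in legal:
        return state, INVALID_PENALTY
    return state + 1, 1.0
```

With this design the agent has to learn from reward alone which actions are unavailable, which may partly explain why it trains less well than a masked setup.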

I wonder whether there is something wrong with how I am using the Wrapper. Is there any guidance? Thanks.