Closed
Labels
question: Further information is requested
Description
- I have marked all applicable categories:
  - exception-raising bug
  - RL algorithm bug
  - documentation request (i.e. "X is missing from the documentation.")
  - new feature request
- I have visited the source website
- I have searched through the issue tracker for duplicates
- I have mentioned version numbers, operating system and environment, where applicable:
```python
import tianshou, torch, sys
print(tianshou.__version__, torch.__version__, sys.version, sys.platform)
```
In the DQN implementation, a policy network is required as an input to the DQN policy constructor. This is strange, because DQN should only learn the Q-networks: the policy should be completely determined by the learned Q-values (and the epsilon). I do not understand why a policy network is required as an input rather than a Q-network. Is this a particular variant of DQN? If so, I hope the documentation can make that clear. And what if I want to use my own architecture for the Q-networks?
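For reference, the selection rule described above — a policy completely determined by the learned Q-values and the epsilon — can be sketched in a few lines of plain Python. This is an illustrative sketch, not tianshou code; the function name `epsilon_greedy` is hypothetical:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Derive an action from Q-values alone: with probability epsilon pick a
    uniformly random action, otherwise pick the argmax of the Q-values."""
    if rng.random() < epsilon:
        # explore: uniform random action
        return rng.randrange(len(q_values))
    # exploit: index of the largest Q-value
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon = 0 the choice is fully greedy in the Q-values:
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))  # -> 1
```

In this view, no separate policy network is needed: whatever network the constructor takes would only have to map observations to a vector of Q-values, and the acting policy falls out of that vector plus epsilon.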