Question about logits

- [ ] I have marked all applicable categories:
    + [ ] exception-raising bug
    + [x] RL algorithm bug
    + [x] documentation request (i.e. "X is missing from the documentation.")
    + [ ] new feature request
- [x] I have visited the [source website](https://github.com/thu-ml/tianshou/)
- [x] I have searched through the [issue tracker](https://github.com/thu-ml/tianshou/issues) for duplicates
- [ ] I have mentioned version numbers, operating system and environment, where applicable:
  ```python
  import tianshou, torch, sys
  print(tianshou.__version__, torch.__version__, sys.version, sys.platform)
  ```
In the DQN implementation, the DQN policy constructor requires a policy network as input. This seems strange, because DQN should only learn Q-networks, and the policy should be completely determined by the learned Q-values (and epsilon). Why is a policy network required as input rather than a Q-network? Is this a particular variant of DQN? If so, I hope the documentation can make that clear. Also, what if I want to use my own architecture for the Q-network?
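To make concrete what I mean by the policy being determined by the Q-values and epsilon, here is a minimal sketch in plain Python. It is not tianshou code: the function name `epsilon_greedy`, its parameters, and the example Q-values are all hypothetical and exist only to illustrate the expectation.

```python
import random


def epsilon_greedy(q_values, eps, rng=random):
    """Pick an action index from per-action Q-values (illustrative helper).

    With probability eps an action is drawn uniformly at random;
    otherwise the action with the highest Q-value is chosen
    (ties go to the lowest index).
    """
    if rng.random() < eps:
        # Explore: ignore the Q-values entirely.
        return rng.randrange(len(q_values))
    # Exploit: act greedily with respect to the Q-values.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical Q-values for a single state with three actions.
q = [0.1, 0.7, 0.3]

# With eps=0 the choice is purely greedy, so no separate policy
# network is involved: the Q-values alone fix the action.
greedy_action = epsilon_greedy(q, eps=0.0)
```

Under this view, the only learned component is whatever produces `q`, which is why I would expect the constructor to take a Q-network.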

Question about logits #231
