Closed
Labels: bug
Description
- [ ] I have marked all applicable categories:
  - [x] exception-raising bug
  - [ ] RL algorithm bug
  - [ ] documentation request (i.e. "X is missing from the documentation.")
  - [ ] new feature request
- [x] I have visited the source website
- [x] I have searched through the issue tracker for duplicates
- [x] I have mentioned version numbers, operating system and environment, where applicable:
```python
import tianshou, torch, sys
print(tianshou.__version__, torch.__version__, sys.version, sys.platform)
# 0.2.4 1.4.0 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0] linux
```
Hi, first thanks for the excellent work!
One thing I found may be a bug.
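When using the A2C policy, I noticed that the way `a_loss` is calculated may not work if the action has multiple dimensions. Suppose `dist.log_prob(a)` has shape `[bsz, n]` while `(r - v)` has shape `[bsz]`: the two cannot be multiplied elementwise, because broadcasting aligns trailing dimensions and therefore compares `n` with `bsz` (a `[bsz, 1]` tensor would broadcast, but a flat `[bsz]` one does not; and if `bsz == n`, the product even runs silently while pairing the wrong entries). In my opinion, it would be better to use `dist.log_prob(a).transpose(0, 1)`, which has shape `[n, bsz]`, so that the multiplication happens along the batch dimension and each sample's log-probability is weighted by that sample's own advantage.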
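The shape problem can be reproduced in isolation. This is a minimal sketch under stated assumptions, not tianshou's actual `learn` code: it assumes a diagonal Gaussian policy (`torch.distributions.Normal`), uses random stand-in tensors for `r` and `v`, and picks arbitrary sizes `bsz = 4`, `n = 3`. It shows the direct product failing, the transposed product working, and the square case where the direct product silently mispairs entries.

```python
import torch
from torch.distributions import Normal

torch.manual_seed(0)
bsz, n = 4, 3  # illustrative batch size and action dimension

# Hypothetical policy: an independent Gaussian per action dimension.
dist = Normal(torch.zeros(bsz, n), torch.ones(bsz, n))
a = dist.sample()            # actions, shape [bsz, n]
log_prob = dist.log_prob(a)  # per-dimension log-probs, shape [bsz, n]

r = torch.randn(bsz)         # stand-in returns, shape [bsz]
v = torch.randn(bsz)         # stand-in critic values, shape [bsz] (assumed)
adv = (r - v).detach()       # advantage, shape [bsz]

# Direct product: trailing dims are aligned, so n=3 is compared with bsz=4.
try:
    log_prob * adv
    direct_failed = False
except RuntimeError:
    direct_failed = True

# Transposed product: [n, bsz] * [bsz] -> [n, bsz]; column j uses adv[j].
weighted = log_prob.transpose(0, 1) * adv
a_loss = -weighted.mean()

# Same entries via an explicit trailing axis on the advantage.
weighted_alt = log_prob * adv.unsqueeze(-1)  # [bsz, n] * [bsz, 1] -> [bsz, n]
assert torch.allclose(weighted, weighted_alt.transpose(0, 1))

# Square case (bsz == n == 3): the direct product runs but pairs the wrong entries.
sq = torch.arange(9.0).reshape(3, 3)   # pretend log-probs, rows are samples
adv3 = torch.tensor([1.0, 10.0, 100.0])
wrong = sq * adv3                      # entry [i, j] scaled by adv3[j]
right = (sq.transpose(0, 1) * adv3).transpose(0, 1)  # entry [i, j] scaled by adv3[i]

print(direct_failed, tuple(weighted.shape))  # True (3, 4)
print(wrong[0].tolist(), right[0].tolist())  # [0.0, 10.0, 200.0] [0.0, 1.0, 2.0]
```

The transposed form and `log_prob * adv.unsqueeze(-1)` produce the same entries, so either avoids the broadcasting problem. A separate modeling choice is whether to average over action dimensions (as `.mean()` does here) or to sum them first: for independent action components, the joint log-probability is the sum of the per-dimension log-probabilities, i.e. `log_prob.sum(-1)` with shape `[bsz]`.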