
Bug in BCQ discrete algorithm. #963

@ivanappliedai

Description

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, gymnasium as gym, torch, numpy, sys
    print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

0.5.1 0.29.0 2.0.0+cu117 1.24.3 3.8.0 (default, Nov 6 2019, 21:49:08)
[GCC 7.3.0] linux

There is a bug in the discrete BCQ algorithm (discrete_bcq.py), at line 100:

act = (q_value - np.inf * mask).argmax(dim=-1)

When mask contains zeros, np.inf * mask produces NaNs, because inf * 0 evaluates to NaN in IEEE 754 floating-point arithmetic.
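
The failure can be reproduced outside Tianshou. The following is a minimal sketch using plain NumPy arrays rather than the torch tensors used in discrete_bcq.py; the values of q_value and mask are made up for illustration, and a mask entry of 1 is assumed to mark an action that should be excluded.

```python
import numpy as np

# Illustrative values only (not taken from the issue).
q_value = np.array([[1.0, 3.0, 2.0]])
# Assumption: 1 marks an action to exclude, 0 marks an action to keep.
mask = np.array([[0.0, 1.0, 0.0]])

# inf * 0 is NaN under IEEE 754, so every kept action gets a NaN penalty.
with np.errstate(invalid="ignore"):  # silence the RuntimeWarning NumPy emits
    penalty = np.inf * mask

masked_q = q_value - penalty

print(penalty)
print(masked_q)
print(np.isnan(masked_q))
```

Every entry where mask is zero ends up as NaN, so the Q-values of the actions that should remain selectable are lost before argmax is taken.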

Solution: replace np.inf with the largest finite float value.
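
A minimal sketch of the proposed fix, again in plain NumPy and not the actual Tianshou patch. The helper names masked_argmax and masked_argmax_where are hypothetical. The second helper is an alternative that is not part of the proposal above: it selects values with np.where and so avoids the multiplication entirely.

```python
import numpy as np


def masked_argmax(q_value, mask):
    """Hypothetical helper: penalize masked actions with the largest finite float.

    Assumes mask is 1 for excluded actions and 0 for allowed ones.
    """
    large = np.finfo(q_value.dtype).max
    # 0 * large is exactly 0, so allowed actions keep their Q-values.
    return (q_value - large * mask).argmax(axis=-1)


def masked_argmax_where(q_value, mask):
    """Hypothetical alternative: set masked entries to -inf without multiplying."""
    return np.where(mask.astype(bool), -np.inf, q_value).argmax(axis=-1)


# Illustrative values only (not taken from the issue).
q_value = np.array([[1.0, 3.0, 2.0]])
mask = np.array([[0.0, 1.0, 0.0]])

print(masked_argmax(q_value, mask))
print(masked_argmax_where(q_value, mask))
```

Both variants skip the masked action with the highest raw Q-value and pick the best remaining one. The finite-penalty version assumes Q-values are not so negative that subtracting the penalty overflows.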

Metadata



Labels

bug (Something isn't working), minor (Requires small changes to be fixed)


Projects

Status

Done

Relationships

None yet

Development

No branches or pull requests
