- I have marked all applicable categories:
  - exception-raising bug
  - RL algorithm bug
  - documentation request (i.e. "X is missing from the documentation.")
  - new feature request
- I have visited the source website
- I have searched through the issue tracker for duplicates
- I have mentioned version numbers, operating system and environment, where applicable:
  import tianshou, torch, numpy, sys
  print(tianshou.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)
Related to #449. I was trying to fix the reproducibility issue in atari_bcq.py, and I initially believed it came from set(a.parameters()).union(b.parameters()), whose iteration order can differ between runs (see the sketch after the log below). However, even with the parameter order kept fixed, I still got different results between two runs with the same seed. For example, I ran python3 ./atari_dqn.py --task PongNoFrameskip-v4 --epoch 5 three times and got three different results:
❯ grep best_reward log.dqn.pong.epoch_5*
log.dqn.pong.epoch_5:Epoch #1: test_reward: -21.000000 ± 0.000000, best_reward: -21.000000 ± 0.000000 in #0
log.dqn.pong.epoch_5:Epoch #2: test_reward: -21.000000 ± 0.000000, best_reward: -21.000000 ± 0.000000 in #0
log.dqn.pong.epoch_5:Epoch #3: test_reward: -21.000000 ± 0.000000, best_reward: -21.000000 ± 0.000000 in #0
log.dqn.pong.epoch_5:Epoch #4: test_reward: -21.000000 ± 0.000000, best_reward: -21.000000 ± 0.000000 in #0
log.dqn.pong.epoch_5:Epoch #5: test_reward: -21.000000 ± 0.000000, best_reward: -21.000000 ± 0.000000 in #0
log.dqn.pong.epoch_5: 'best_reward': -21.0,
log.dqn.pong.epoch_5.1:Epoch #1: test_reward: -19.000000 ± 1.264911, best_reward: -19.000000 ± 1.264911 in #1
log.dqn.pong.epoch_5.1:Epoch #2: test_reward: -16.100000 ± 2.385372, best_reward: -16.100000 ± 2.385372 in #2
log.dqn.pong.epoch_5.1:Epoch #3: test_reward: -19.100000 ± 0.943398, best_reward: -16.100000 ± 2.385372 in #2
log.dqn.pong.epoch_5.1:Epoch #4: test_reward: -18.600000 ± 1.624808, best_reward: -16.100000 ± 2.385372 in #2
log.dqn.pong.epoch_5.1:Epoch #5: test_reward: 2.400000 ± 2.059126, best_reward: 2.400000 ± 2.059126 in #5
log.dqn.pong.epoch_5.1: 'best_reward': 2.4,
log.dqn.pong.epoch_5.2:Epoch #1: test_reward: -20.800000 ± 0.600000, best_reward: -20.800000 ± 0.600000 in #1
log.dqn.pong.epoch_5.2:Epoch #2: test_reward: -12.800000 ± 0.600000, best_reward: -12.800000 ± 0.600000 in #2
log.dqn.pong.epoch_5.2:Epoch #3: test_reward: -16.000000 ± 1.843909, best_reward: -12.800000 ± 0.600000 in #2
log.dqn.pong.epoch_5.2:Epoch #4: test_reward: -12.700000 ± 3.689173, best_reward: -12.700000 ± 3.689173 in #4
log.dqn.pong.epoch_5.2:Epoch #5: test_reward: -10.400000 ± 1.562050, best_reward: -10.400000 ± 1.562050 in #5
log.dqn.pong.epoch_5.2: 'best_reward': -10.4,
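For reference, the parameter-ordering concern mentioned above can be avoided by chaining the parameter iterators instead of taking a set union. This is only a sketch of the workaround I had in mind; a and b here are hypothetical stand-ins for the two networks whose parameters are combined in atari_bcq.py.

from itertools import chain

import torch
import torch.nn as nn

# Hypothetical stand-ins for the two networks passed to the optimizer.
a = nn.Linear(4, 4)
b = nn.Linear(4, 4)

# set(...).union(...) iterates in an order that depends on object hashes,
# so the optimizer may see the parameters in a different order each run.
# chain() preserves the definition order and is deterministic.
params = chain(a.parameters(), b.parameters())
optim = torch.optim.Adam(params, lr=1e-4)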
I wonder where the randomness comes from, given that we already set the seeds:
np.random.seed(args.seed)
torch.manual_seed(args.seed)
train_envs.seed(args.seed)
test_envs.seed(args.seed)
Does it come from the vector replay buffer, or from the GPU? Is it possible to remove it?
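In case it helps narrow things down, these are the extra determinism knobs I am aware of in PyTorch; whether they actually cover the remaining nondeterminism here (e.g. the replay buffer sampling or CUDA kernels) is exactly what I am unsure about. A minimal sketch of what one might try in addition to the seeding above:

import random

import numpy as np
import torch

def seed_everything(seed: int) -> None:
    # Also seed Python's own RNG, in case any sampling goes through `random`.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Seed all CUDA devices explicitly.
    torch.cuda.manual_seed_all(seed)
    # Ask cuDNN for deterministic kernels and disable benchmarking,
    # which otherwise selects algorithms nondeterministically.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False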