- I have marked all applicable categories:
  - exception-raising bug
  - RL algorithm bug
  - documentation request (i.e. "X is missing from the documentation.")
  - new feature request
- I have visited the source website
- I have searched through the issue tracker for duplicates
- I have mentioned version numbers, operating system and environment, where applicable:
```python
import tianshou, torch, numpy, sys
print(tianshou.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)
# 0.4.3 1.9.0 1.20.3 3.8.10 | packaged by conda-forge | (default, May 11 2021, 06:25:23) [MSC v.1916 64 bit (AMD64)] win32
```
Thank you for creating such a powerful and useful tool for RL practitioners; it has helped me a lot. However, I have found a reproducibility problem with one of the tests in the repo. The output below is from test_ppo.py with stop_fn set to None, run twice each on GPU and CPU. As you can see, the results differ between runs even with the same default seed (0), and I cannot figure out where the nondeterminism comes from. Thanks.
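For reference, below is roughly the seeding and determinism setup I would expect a bit-exact run to require. These are standard PyTorch/NumPy APIs, but I am not claiming test_ppo.py sets all of them; treat it as a sketch of my assumptions:

```python
# A minimal sketch (my assumptions, not the actual contents of test_ppo.py) of
# the seeding and determinism settings a bit-exact PyTorch run would need.
import os
import random

import numpy as np
import torch

# cuBLAS needs this for deterministic matmuls on CUDA >= 10.2; it must be set
# before the first CUDA operation runs.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

seed = 0
random.seed(seed)                 # Python's built-in RNG
np.random.seed(seed)              # NumPy RNG (used by Gym envs and Tianshou)
torch.manual_seed(seed)           # CPU RNG; also seeds all CUDA devices
torch.cuda.manual_seed_all(seed)  # explicit, for older PyTorch versions

# Seeding alone is not enough on GPU: cuDNN autotuning and nondeterministic
# CUDA kernels can still make two identically seeded runs diverge.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
# PyTorch >= 1.8: raise a RuntimeError if any op lacks a deterministic
# implementation, which also helps locate the source of divergence.
torch.use_deterministic_algorithms(True)

# The vectorized environments have to be seeded as well, e.g. in Tianshou 0.4.x:
#   train_envs.seed(seed)
#   test_envs.seed(seed)
```

Even with all of the above, CPU and GPU runs can still legitimately differ from each other, since floating-point reductions are not ordered identically on the two backends.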
GPU:
1st:
Epoch #1: 50001it [00:59, 836.32it/s, env_step=50000, len=200, loss=-0.816, loss/clip=-0.929, loss/ent=0.504, loss/vf=0.226, n/ep=14, n/st=2000, rew=200.00]
Epoch #1: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #2: 50001it [00:41, 1197.42it/s, env_step=100000, len=200, loss=-0.030, loss/clip=-0.031, loss/ent=0.414, loss/vf=0.001, n/ep=6, n/st=2000, rew=200.00]
Epoch #2: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #3: 50001it [00:38, 1288.49it/s, env_step=150000, len=200, loss=0.017, loss/clip=0.017, loss/ent=0.432, loss/vf=0.001, n/ep=14, n/st=2000, rew=200.00]
Epoch #3: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #4: 50001it [00:39, 1264.23it/s, env_step=200000, len=200, loss=-0.006, loss/clip=-0.007, loss/ent=0.464, loss/vf=0.001, n/ep=6, n/st=2000, rew=200.00]
Epoch #4: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #5: 50001it [00:39, 1275.54it/s, env_step=250000, len=200, loss=-0.000, loss/clip=-0.001, loss/ent=0.463, loss/vf=0.001, n/ep=14, n/st=2000, rew=200.00]
Epoch #5: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #6: 50001it [00:39, 1281.43it/s, env_step=300000, len=200, loss=-0.020, loss/clip=-0.021, loss/ent=0.425, loss/vf=0.001, n/ep=6, n/st=2000, rew=200.00]
Epoch #6: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #7: 50001it [00:38, 1285.40it/s, env_step=350000, len=200, loss=-0.005, loss/clip=-0.006, loss/ent=0.387, loss/vf=0.001, n/ep=14, n/st=2000, rew=200.00]
Epoch #7: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #8: 50001it [00:38, 1296.00it/s, env_step=400000, len=200, loss=-0.014, loss/clip=-0.015, loss/ent=0.334, loss/vf=0.001, n/ep=6, n/st=2000, rew=200.00]
Epoch #8: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #9: 50001it [00:38, 1301.93it/s, env_step=450000, len=200, loss=-0.002, loss/clip=-0.002, loss/ent=0.295, loss/vf=0.000, n/ep=14, n/st=2000, rew=200.00]
Epoch #9: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #10: 50001it [00:40, 1246.57it/s, env_step=500000, len=200, loss=0.007, loss/clip=0.006, loss/ent=0.291, loss/vf=0.001, n/ep=6, n/st=2000, rew=200.00]
Epoch #10: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
{'best_result': '200.00 ± 0.00',
'test_episode': 1100,
'test_speed': '19488.58 step/s',
'test_step': 200934,
'test_time': '10.31s',
'train_episode': 2752,
'train_speed': '1207.23 step/s',
'train_step': 500000,
'train_time/collector': '40.56s',
'train_time/model': '373.61s'}
2nd:
Epoch #1: 50001it [00:40, 1244.54it/s, env_step=50000, len=200, loss=-0.562, loss/clip=-0.584, loss/ent=0.468, loss/vf=0.044, n/ep=7, n/st=2000, rew=200.00]
Epoch #1: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #2: 50001it [00:40, 1237.34it/s, env_step=100000, len=200, loss=-0.023, loss/clip=-0.023, loss/ent=0.335, loss/vf=0.001, n/ep=13, n/st=2000, rew=200.00]
Epoch #2: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #3: 50001it [00:40, 1248.02it/s, env_step=150000, len=200, loss=0.001, loss/clip=0.001, loss/ent=0.290, loss/vf=0.001, n/ep=7, n/st=2000, rew=200.00]
Epoch #3: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #4: 50001it [00:42, 1181.84it/s, env_step=200000, len=200, loss=-0.018, loss/clip=-0.018, loss/ent=0.265, loss/vf=0.001, n/ep=13, n/st=2000, rew=200.00]
Epoch #4: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #5: 50001it [00:40, 1228.67it/s, env_step=250000, len=200, loss=-0.010, loss/clip=-0.011, loss/ent=0.257, loss/vf=0.001, n/ep=7, n/st=2000, rew=200.00]
Epoch #5: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #6: 50001it [00:39, 1252.70it/s, env_step=300000, len=200, loss=-0.015, loss/clip=-0.015, loss/ent=0.197, loss/vf=0.001, n/ep=13, n/st=2000, rew=200.00]
Epoch #6: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #7: 50001it [00:39, 1269.67it/s, env_step=350000, len=200, loss=0.001, loss/clip=0.001, loss/ent=0.203, loss/vf=0.000, n/ep=7, n/st=2000, rew=200.00]
Epoch #7: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #8: 50001it [00:39, 1251.08it/s, env_step=400000, len=200, loss=-0.010, loss/clip=-0.011, loss/ent=0.183, loss/vf=0.001, n/ep=13, n/st=2000, rew=200.00]
Epoch #8: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #9: 50001it [00:40, 1233.73it/s, env_step=450000, len=200, loss=0.004, loss/clip=0.003, loss/ent=0.167, loss/vf=0.002, n/ep=7, n/st=2000, rew=200.00]
Epoch #9: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #10: 50001it [00:39, 1253.52it/s, env_step=500000, len=200, loss=-0.375, loss/clip=-0.459, loss/ent=0.223, loss/vf=0.167, n/ep=10, n/st=2000, rew=200.00]
Epoch #10: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
{'best_result': '200.00 ± 0.00',
'best_reward': 200.0,
'duration': '412.82s',
'test_episode': 1100,
'test_speed': '21288.98 step/s',
'test_step': 200933,
'test_time': '9.44s',
'train_episode': 2741,
'train_speed': '1239.52 step/s',
'train_step': 500000,
'train_time/collector': '39.49s',
'train_time/model': '363.89s'}
CPU:
1st:
Epoch #1: 50001it [00:24, 2063.07it/s, env_step=50000, len=200, loss=-1.161, loss/clip=-1.285, loss/ent=0.487, loss/vf=0.248, n/ep=8, n/st=2000, rew=200.00]
Epoch #1: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #2: 50001it [00:24, 2036.06it/s, env_step=100000, len=200, loss=-0.027, loss/clip=-0.027, loss/ent=0.345, loss/vf=0.001, n/ep=12, n/st=2000, rew=200.00]
Epoch #2: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #3: 50001it [00:24, 2063.22it/s, env_step=150000, len=200, loss=-0.007, loss/clip=-0.007, loss/ent=0.321, loss/vf=0.000, n/ep=8, n/st=2000, rew=200.00]
Epoch #3: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #4: 50001it [00:24, 2045.94it/s, env_step=200000, len=200, loss=-0.009, loss/clip=-0.010, loss/ent=0.283, loss/vf=0.000, n/ep=12, n/st=2000, rew=200.00]
Epoch #4: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #5: 50001it [00:24, 2070.64it/s, env_step=250000, len=200, loss=0.017, loss/clip=0.016, loss/ent=0.302, loss/vf=0.001, n/ep=8, n/st=2000, rew=200.00]
Epoch #5: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #6: 50001it [00:23, 2092.19it/s, env_step=300000, len=200, loss=-0.003, loss/clip=-0.003, loss/ent=0.326, loss/vf=0.000, n/ep=12, n/st=2000, rew=200.00]
Epoch #6: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #7: 50001it [00:23, 2093.97it/s, env_step=350000, len=200, loss=-0.009, loss/clip=-0.009, loss/ent=0.318, loss/vf=0.001, n/ep=8, n/st=2000, rew=200.00]
Epoch #7: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #8: 50001it [00:24, 2072.37it/s, env_step=400000, len=200, loss=-0.004, loss/clip=-0.005, loss/ent=0.286, loss/vf=0.001, n/ep=12, n/st=2000, rew=200.00]
Epoch #8: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #9: 50001it [00:24, 2025.97it/s, env_step=450000, len=200, loss=-0.003, loss/clip=-0.004, loss/ent=0.259, loss/vf=0.001, n/ep=8, n/st=2000, rew=200.00]
Epoch #9: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #10: 50001it [00:24, 2006.94it/s, env_step=500000, len=200, loss=0.007, loss/clip=0.007, loss/ent=0.261, loss/vf=0.002, n/ep=12, n/st=2000, rew=200.00]
Epoch #10: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
{'best_result': '200.00 ± 0.00',
'best_reward': 200.0,
'duration': '253.11s',
'test_episode': 1100,
'test_speed': '20295.32 step/s',
'test_step': 201724,
'test_time': '9.94s',
'train_episode': 2731,
'train_speed': '2056.16 step/s',
'train_step': 500000,
'train_time/collector': '35.40s',
'train_time/model': '207.78s'}
2nd:
Epoch #1: 50001it [00:25, 1986.18it/s, env_step=50000, len=200, loss=-0.693, loss/clip=-0.732, loss/ent=0.478, loss/vf=0.079, n/ep=9, n/st=2000, rew=200.00]
Epoch #1: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #2: 50001it [00:23, 2130.81it/s, env_step=100000, len=200, loss=-0.024, loss/clip=-0.025, loss/ent=0.362, loss/vf=0.002, n/ep=11, n/st=2000, rew=200.00]
Epoch #2: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #3: 50001it [00:23, 2098.16it/s, env_step=150000, len=200, loss=0.007, loss/clip=0.006, loss/ent=0.351, loss/vf=0.001, n/ep=9, n/st=2000, rew=200.00]
Epoch #3: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #4: 50001it [00:23, 2134.66it/s, env_step=200000, len=200, loss=0.003, loss/clip=0.003, loss/ent=0.359, loss/vf=0.001, n/ep=11, n/st=2000, rew=200.00]
Epoch #4: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #5: 50001it [00:23, 2128.62it/s, env_step=250000, len=200, loss=0.001, loss/clip=0.001, loss/ent=0.356, loss/vf=0.001, n/ep=9, n/st=2000, rew=200.00]
Epoch #5: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #6: 50001it [00:23, 2114.38it/s, env_step=300000, len=200, loss=-0.010, loss/clip=-0.010, loss/ent=0.294, loss/vf=0.001, n/ep=11, n/st=2000, rew=200.00]
Epoch #6: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #7: 50001it [00:23, 2129.98it/s, env_step=350000, len=200, loss=-0.010, loss/clip=-0.010, loss/ent=0.254, loss/vf=0.000, n/ep=9, n/st=2000, rew=200.00]
Epoch #7: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #8: 50001it [00:23, 2129.18it/s, env_step=400000, len=200, loss=0.012, loss/clip=0.012, loss/ent=0.256, loss/vf=0.001, n/ep=11, n/st=2000, rew=200.00]
Epoch #8: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #9: 50001it [00:23, 2130.25it/s, env_step=450000, len=200, loss=-0.002, loss/clip=-0.003, loss/ent=0.271, loss/vf=0.001, n/ep=9, n/st=2000, rew=200.00]
Epoch #9: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
Epoch #10: 50001it [00:23, 2111.43it/s, env_step=500000, len=200, loss=-0.003, loss/clip=-0.004, loss/ent=0.291, loss/vf=0.001, n/ep=11, n/st=2000, rew=200.00]
Epoch #10: test_reward: 200.000000 ± 0.000000, best_reward: 200.000000 ± 0.000000 in #1
{'best_result': '200.00 ± 0.00',
'best_reward': 200.0,
'duration': '246.91s',
'test_episode': 1100,
'test_speed': '20778.25 step/s',
'test_step': 201724,
'test_time': '9.71s',
'train_episode': 2738,
'train_speed': '2107.91 step/s',
'train_step': 500000,
'train_time/collector': '34.49s',
'train_time/model': '202.71s'}