Description
Using tianshou 1.1.0 on Windows (Python 3.12, per the traceback paths), `examples/discrete/discrete_dqn.py` trains fine but crashes during the final render/evaluation collect:
```
(tianshou) F:\tianshou-1.1.0\examples\discrete>python discrete_dqn.py
Epoch #1: 10001it [00:05, 1744.76it/s, env_step=10000, gradient_step=1000, len=191, n/ep=0, n/st=10, rew=191.00]
Epoch #1: test_reward: 153.210000 ± 33.466489, best_reward: 153.210000 ± 33.466489 in #1
Epoch #2: 10001it [00:04, 2008.32it/s, env_step=20000, gradient_step=2000, len=184, n/ep=0, n/st=10, rew=184.00]
Epoch #2: test_reward: 197.180000 ± 50.608177, best_reward: 197.180000 ± 50.608177 in #2
Epoch #3: 10001it [00:05, 1981.24it/s, env_step=30000, gradient_step=3000, len=293, n/ep=0, n/st=10, rew=293.00]
Epoch #3: test_reward: 238.720000 ± 55.338789, best_reward: 238.720000 ± 55.338789 in #3
Epoch #4: 10001it [00:05, 1964.89it/s, env_step=40000, gradient_step=4000, len=231, n/ep=0, n/st=10, rew=231.00]
Epoch #4: test_reward: 187.850000 ± 36.271580, best_reward: 238.720000 ± 55.338789 in #3
Epoch #5: 10001it [00:05, 1975.96it/s, env_step=50000, gradient_step=5000, len=194, n/ep=0, n/st=10, rew=194.00]
Epoch #5: test_reward: 223.770000 ± 44.344076, best_reward: 238.720000 ± 55.338789 in #3
Epoch #6: 10001it [00:05, 1990.86it/s, env_step=60000, gradient_step=6000, len=334, n/ep=0, n/st=10, rew=334.00]
Epoch #6: test_reward: 212.280000 ± 44.395063, best_reward: 238.720000 ± 55.338789 in #3
Epoch #7: 10001it [00:05, 1980.04it/s, env_step=70000, gradient_step=7000, len=309, n/ep=1, n/st=10, rew=309.00]
Epoch #7: test_reward: 198.770000 ± 28.946798, best_reward: 238.720000 ± 55.338789 in #3
Epoch #8: 10001it [00:05, 1994.70it/s, env_step=80000, gradient_step=8000, len=296, n/ep=0, n/st=10, rew=296.00]
Epoch #8: test_reward: 195.870000 ± 46.731928, best_reward: 238.720000 ± 55.338789 in #3
Epoch #9: 10001it [00:05, 1990.03it/s, env_step=90000, gradient_step=9000, len=310, n/ep=0, n/st=10, rew=310.00]
Epoch #9: test_reward: 80.420000 ± 43.444719, best_reward: 238.720000 ± 55.338789 in #3
Epoch #10: 10001it [00:05, 1977.86it/s, env_step=100000, gradient_step=10000, len=217, n/ep=0, n/st=10, rew=217.00]
Epoch #10: test_reward: 217.060000 ± 34.864544, best_reward: 238.720000 ± 55.338789 in #3
Finished training in 63.13353943824768 seconds
```
```
C:\Users\BW\AppData\Roaming\Python\Python312\site-packages\tianshou\data\collector.py:152: UserWarning: Single environment detected, wrap to DummyVectorEnv.
  warnings.warn("Single environment detected, wrap to DummyVectorEnv.")
Traceback (most recent call last):
  File "F:\tianshou-1.1.0\examples\discrete\discrete_dqn.py", line 89, in <module>
    main()
  File "F:\tianshou-1.1.0\examples\discrete\discrete_dqn.py", line 85, in main
    collector.collect(n_episode=100, render=1 / 35)
  File "C:\Users\BW\AppData\Roaming\Python\Python312\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\BW\AppData\Roaming\Python\Python312\site-packages\tianshou\data\collector.py", line 304, in collect
    return self._collect(
           ^^^^^^^^^^^^^^
  File "C:\Users\BW\AppData\Roaming\Python\Python312\site-packages\tianshou\data\collector.py", line 470, in _collect
    raise ValueError(
ValueError: Initial obs and info should not be None. Either reset the collector (using reset or reset_env) or pass reset_before_collect=True to collect.
```
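The error message itself names the fix: the collector holds no initial observation, so either call `collector.reset()` before collecting, or pass `reset_before_collect=True` to `collect()`. Below is a minimal self-contained sketch of that guard (the class is a simplified stand-in using names from the traceback, not tianshou's actual `tianshou.data.Collector`), showing why the 1.1.0 example trips it and how either workaround clears it:

```python
class Collector:
    """Simplified stand-in mimicking the guard in tianshou.data.collector."""

    def __init__(self):
        self._obs = None  # no initial observation until reset() is called

    def reset(self):
        # In the real library this resets the wrapped (vector) env too.
        self._obs = "initial observation"

    def collect(self, n_episode, reset_before_collect=False):
        if reset_before_collect:
            self.reset()
        if self._obs is None:
            # Same check that raises in collector.py line 470
            raise ValueError(
                "Initial obs and info should not be None. Either reset the "
                "collector (using reset or reset_env) or pass "
                "reset_before_collect=True to collect."
            )
        return f"collected {n_episode} episodes"


c = Collector()
# c.collect(n_episode=100)  # would raise the ValueError from the traceback

# Workaround 1: explicit reset first
c.reset()
print(c.collect(n_episode=100))  # collected 100 episodes

# Workaround 2: let collect() do the reset
c2 = Collector()
print(c2.collect(n_episode=100, reset_before_collect=True))  # collected 100 episodes
```

For the actual script, that means changing line 85 of `discrete_dqn.py` to `collector.collect(n_episode=100, render=1 / 35, reset_before_collect=True)`, or inserting `collector.reset()` just before it.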