Description
Hello,
I used Tianshou 0.5 with a custom environment on a Windows PC. I was impressed by the training speed of the PPO agent, which exceeded 2000 iterations per second.
import tianshou, gymnasium as gym, torch, numpy, sys
print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

Output:
0.5.0 0.26.3 2.5.1 1.26.4 3.11.10 | packaged by conda-forge | (main, Oct 16 2024, 01:17:14) [MSC v.1941 64 bit (AMD64)] win32
Training using the Tianshou library, version 0.5:
Epoch #1: 10112it [00:03, 2610.28it/s, env_step=10112, len=0, loss=0.303, loss/clip=0.000, loss/ent=0.916, loss/vf=0.623, n/ep=0, n/st=128, rew=0.00]
Epoch #1: test_reward: 27.403868 ± 0.000000, best_reward: 27.403868 ± 0.000000 in #1
Epoch #2: 10112it [00:03, 2634.27it/s, env_step=20224, len=0, loss=0.500, loss/clip=0.000, loss/ent=0.915, loss/vf=1.018, n/ep=0, n/st=128, rew=0.00]
Epoch #2: test_reward: 33.482427 ± 0.000000, best_reward: 33.482427 ± 0.000000 in #2
Epoch #3: 10112it [00:04, 2442.15it/s, env_step=30336, len=0, loss=0.713, loss/clip=-0.000, loss/ent=0.913, loss/vf=1.445, n/ep=0, n/st=128, rew=0.00]
Epoch #3: test_reward: 35.236934 ± 0.000000, best_reward: 35.236934 ± 0.000000 in #3
Epoch #4: 10112it [00:04, 2508.40it/s, env_step=40448, len=0, loss=0.547, loss/clip=0.000, loss/ent=0.910, loss/vf=1.112, n/ep=0, n/st=128, rew=0.00]
Epoch #4: test_reward: 22.770667 ± 0.000000, best_reward: 35.236934 ± 0.000000 in #3
Epoch #5: 10112it [00:04, 2479.57it/s, env_step=50560, len=334, loss=0.476, loss/clip=0.000, loss/ent=0.911, loss/vf=0.970, n/ep=0, n/st=128, rew=54.33]
Epoch #5: test_reward: 29.205846 ± 0.000000, best_reward: 35.236934 ± 0.000000 in #3

Recently, I upgraded to Tianshou 1.2, keeping the agent configuration the same. However, I observed a significant performance drop: the new version runs approximately 12 times slower, as shown below. I also tested this on Linux and observed the same results:
import tianshou, gymnasium as gym, torch, numpy, sys
print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

Output:
1.2.0-dev 0.28.1 2.1.1+cu121 1.24.4 3.11.10 (main, Sep 7 2024, 18:35:41) [GCC 11.4.0] linux
Training using the Tianshou library, version 1.2:
Epoch #1: 10112it [00:59, 169.41it/s, env_episode=0, env_step=10112, gradient_step=158, len=0, n/ep=0, n/st=128, rew=0.00]
Epoch #2: 10112it [00:59, 171.18it/s, env_episode=0, env_step=20224, gradient_step=316, len=0, n/ep=0, n/st=128, rew=0.00]
Epoch #3: 10112it [00:59, 170.92it/s, env_episode=0, env_step=30336, gradient_step=474, len=0, n/ep=0, n/st=128, rew=0.00]
Epoch #4: 10112it [00:59, 171.19it/s, env_episode=0, env_step=40448, gradient_step=632, len=0, n/ep=0, n/st=128, rew=0.00]
Epoch #5: 10112it [00:59, 170.45it/s, env_episode=128, env_step=50560, gradient_step=790, len=1, n/ep=0, n/st=128, rew=41.83]

Have there been changes to the library that impact execution performance, and can I restore the previous performance levels through configuration adjustments?
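In case it helps narrow this down, here is a minimal profiling sketch (not part of my original runs) for locating where the extra time is spent under 1.2. The `trainer` name below is a placeholder for whatever trainer object the actual setup constructs; only the standard library is used.

import cProfile
import pstats

# Profile a single training run to see which calls dominate the epoch time.
# `trainer` is a placeholder for the actual trainer used in the setup above.
profiler = cProfile.Profile()
profiler.enable()
result = trainer.run()  # placeholder for the actual training call
profiler.disable()

# Print the 30 most expensive call sites, sorted by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(30)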
- I have marked all applicable categories:
- exception-raising bug
- RL algorithm bug
- documentation request (i.e. "X is missing from the documentation.")
- new feature request
- design request (i.e. "X should be changed to Y.")
- I have visited the source website
- I have searched through the issue tracker for duplicates
- I have mentioned version numbers, operating system, and environment.