atari_dqn.py Training Speed Drops Significantly Over Time

Hi team,

I've encountered a performance issue while running the `example/atari/atari_dqn.py` script: training is **slow from the start and slows down drastically** as it progresses.

**Environment:**
- **GPU:** NVIDIA A6000
- **PyTorch version:** 2.1.1+cu113 and 2.6.0+cu126 (tried both)
- **Script:** `example/atari/atari_dqn.py`
- **Env:** Alien and Pong
- **Seed:** 1
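To make the environment details above easier to reproduce exactly, a small stdlib-only sketch like the following can print the installed versions. The package list (`torch`, `tianshou`, `gymnasium`, `ale-py`) is my assumption about what is relevant here. Adjust it to match your install.

```python
import platform
from importlib import metadata

# Assumed set of packages relevant to this report; edit as needed.
PACKAGES = ("torch", "tianshou", "gymnasium", "ale-py")


def collect_versions(packages=PACKAGES):
    """Return a {name: version} mapping, marking packages that are not installed."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not present in this Python environment.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```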

**Steps to Reproduce:**
1. Run the `example/atari/atari_dqn.py` script.
2. Monitor the training speed (iterations per second, it/s).
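To log the slowdown as numbers rather than reading them off the progress bar, one option is a sliding-window throughput tracker. This is a minimal sketch: `ThroughputMonitor` is a hypothetical helper, not part of Tianshou, and it assumes you can read a cumulative step counter at regular intervals.

```python
from __future__ import annotations

from collections import deque


class ThroughputMonitor:
    """Track steps/second over a sliding window of recent (step, time) samples.

    Hypothetical helper for illustration; not a Tianshou API.
    """

    def __init__(self, window: int = 10) -> None:
        # Keep at most `window` samples; the oldest is dropped automatically.
        self.samples: deque[tuple[int, float]] = deque(maxlen=window)
        # Every computed (step, rate) pair, for plotting the trend later.
        self.history: list[tuple[int, float]] = []

    def record(self, step: int, timestamp: float) -> float | None:
        """Add a sample; return the windowed rate, or None until it can be computed."""
        self.samples.append((step, timestamp))
        if len(self.samples) < 2:
            return None
        (s0, t0), (s1, t1) = self.samples[0], self.samples[-1]
        if t1 <= t0:
            return None
        rate = (s1 - s0) / (t1 - t0)
        self.history.append((step, rate))
        return rate
```

In practice one would call `monitor.record(step, time.perf_counter())` wherever the step counter is available. Exactly where to hook this into `atari_dqn.py` is left open here. A windowed rate shows the current speed instead of an average that hides the decline.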

**Observed Behavior:**
- Initially, the training speed is around **40-50 it/s**.
- The it/s decreases continuously. At around 2,400 training steps it drops **below 10 it/s**, and later even **below 7 it/s**.
- I attempted to mitigate this by **setting `training_num=100`**. This may slightly delay the severe drop, but the it/s still falls below 10 at around 20,000 training steps.
- At this reduced speed, **a single epoch is estimated to take over 2 hours to complete**.
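The numbers above can be summarized with two small helpers: the first logged step where the rate falls below a threshold, and the projected wall-clock time of one epoch at a given rate. This is a sketch. Both function names are hypothetical, and it assumes the rate and the epoch length count the same kind of step.

```python
def first_step_below(samples, threshold):
    """Return the first logged step whose rate is below `threshold`, or None.

    `samples` is an iterable of (step, steps_per_second) pairs in step order.
    """
    for step, rate in samples:
        if rate < threshold:
            return step
    return None


def estimated_epoch_hours(steps_per_epoch, steps_per_second):
    """Wall-clock hours to finish one epoch at a constant rate."""
    if steps_per_second <= 0:
        raise ValueError("steps_per_second must be positive")
    return steps_per_epoch / steps_per_second / 3600.0
```

For instance, if an epoch were 100,000 steps (a hypothetical figure; the real value comes from the script's settings), `estimated_epoch_hours(100_000, 10)` gives about 2.8 hours, and at 7 it/s roughly 4 hours.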


Could you please look into this? Let me know if you need any more information from my end.

Thanks!

atari_dqn.py Training Speed Drops Significantly Over Time #1265