From e80cac1bc4ae108d29226d42d253dfd1d281a31e Mon Sep 17 00:00:00 2001
From: chy <308604256@qq.com>
Date: Wed, 27 Apr 2022 19:35:38 +0800
Subject: [PATCH 1/4] Atari table

---
 docs/tutorials/benchmark.rst | 62 +++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)
diff --git a/docs/tutorials/benchmark.rst b/docs/tutorials/benchmark.rst
index f1cb6ba5a..c0f5fb92e 100644
--- a/docs/tutorials/benchmark.rst
+++ b/docs/tutorials/benchmark.rst
@@ -94,7 +94,9 @@ TRPO      16        7min         62.9           26.5         10.1           0.6
 Atari Benchmark
 ---------------
 
-Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/atari
+Tianshou also provides a reliable and reproducible Atari 10M benchmark.
+
+Every experiment is conducted under 10 random seeds for 10M steps. Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/atari for the source code and to https://wandb.ai/tianshou/atari.benchmark/reports/Atari-Benchmark--VmlldzoxOTA1NzA5 for detailed results hosted on wandb.
 
 .. raw:: html
 
@@ -105,3 +107,61 @@ Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/atari
         <br>
     </center>
 
+
+The table below compares the performance of Tianshou against published results on Atari games. We use max average return in 10M timesteps as the reward metric. - means results are not provided. The best-performing baseline on each task is highlighted in boldface. Referenced baselines include `Google Dopamine <https://github.com/google/dopamine/tree/master/baselines/atari>`_ and `OpenAI Baselines <https://github.com/openai/baselines>`_.
+
+# TODO table here @jiayi
+
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|Task                      |Ant       |HalfCheetah|Hopper    |Walker2d  |Swimmer  |Humanoid  |Reacher |IPendulum |IDPendulum|
++=========+================+==========+===========+==========+==========+=========+==========+========+==========+==========+
+|DDPG     |Tianshou        |990.4     |**11718.7**|**2197.0**|1400.6    |**144.1**|**177.3** |**-3.3**|**1000.0**|8364.3    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |**1005.3**|3305.6     |**2020.5**|1843.6    |/        |/         |-6.5    |**1000.0**|**9355.5**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper (Our) |888.8     |8577.3     |1860.0    |**3098.1**|/        |/         |-4.0    |**1000.0**|8370.0    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up     |~840      |~11000     |~1800     |~1950     |~137     |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|TD3      |Tianshou        |**5116.4**|**10201.2**|3472.2    |3982.4    |**104.2**|**5189.5**|**-2.7**|**1000.0**|**9349.2**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |4372.4    |9637.0     |**3564.1**|**4682.8**|/        |/         |-3.6    |**1000.0**|9337.5    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up     |~3800     |~9750      |~2860     |~4000     |~78      |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|SAC      |Tianshou        |**5850.2**|**12138.8**|**3542.2**|**5007.0**|**44.4** |**5488.5**|**-2.6**|**1000.0**|**9359.5**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |SAC Paper       |~3720     |~10400     |~3370     |~3740     |/        |~5200     |/       |/         |/         |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |655.4     |2347.2     |2996.7    |1283.7    |/        |/         |-4.4    |**1000.0**|8487.2    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up     |~3980     |~11520     |~3150     |~4250     |~41.7    |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|A2C      |Tianshou        |**3485.4**|**1829.9** |**1253.2**|**1091.6**|**36.6** |**1726.0**|**-6.7**|**1000.0**|**9257.7**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |PPO Paper       |/         |~1000      |~900      |~850      |~31      |/         |~-24    |**~1000** |~7100     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |PPO Paper (TR)  |/         |~930       |~1220     |~700      |**~36**  |/         |~-27    |**~1000** |~8100     |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|PPO      |Tianshou        |**3258.4**|**5783.9** |**2609.3**|3588.5    |66.7     |**787.1** |**-4.1**|**1000.0**|**9231.3**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |PPO Paper       |/         |~1800      |~2330     |~3460     |~108     |/         |~-7     |**~1000** |~8000     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |1083.2    |1795.4     |2164.7    |3317.7    |/        |/         |-6.2    |**1000.0**|8977.9    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |OpenAI Baselines|/         |~1700      |~2400     |~3510     |~111     |/         |~-6     |~940      |~7350     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up     |~650      |~1670      |~1850     |~1230     |**~120** |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|TRPO     |Tianshou        |**2866.7**|**4471.2** |2046.0    |**3826.7**|40.9     |**810.1** |**-5.1**|**1000.0**|**8435.2**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |ACKTR paper     |~0        |~400       |~1400     |~550      |~40      |/         |-8      |**~1000** |~800      |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |PPO Paper       |/         |~0         |~2100     |~1100     |**~121** |/         |~-115   |**~1000** |~200      |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 paper       |-75.9     |-15.6      |**2471.3**|2321.5    |/        |/         |-111.4  |985.4     |205.9     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |OpenAI Baselines|/         |~1350      |**~2200** |~2350     |~95      |/         |**~-5** |~910      |~7000     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up (TF)|~150      |~850       |~1200     |~600      |~85      |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
\ No newline at end of file

From bdd73c9edccc317c000271c2f93405b11bcd1ea6 Mon Sep 17 00:00:00 2001
From: chy <308604256@qq.com>
Date: Wed, 27 Apr 2022 19:59:44 +0800
Subject: [PATCH 2/4] add

---
 docs/tutorials/benchmark.rst | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/tutorials/benchmark.rst b/docs/tutorials/benchmark.rst
index c0f5fb92e..ea39d42eb 100644
--- a/docs/tutorials/benchmark.rst
+++ b/docs/tutorials/benchmark.rst
@@ -164,4 +164,6 @@ The table below compares the performance of Tianshou against published results o
 |         |OpenAI Baselines|/         |~1350      |**~2200** |~2350     |~95      |/         |**~-5** |~910      |~7000     |
 +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
 |         |Spinning Up (TF)|~150      |~850       |~1200     |~600      |~85      |/         |/       |/         |/         |
-+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
\ No newline at end of file
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+
+Please note that the comparison table for both two benchmarks could NOT be used to prove which implementation is "better". The hyperparameters of the algorithms vary across different implementations. Also, the reward metric is not strictly the same (e.g. Tianshou uses max average return in 10M steps but OpenAI Baselines only report average return at 10M steps, which is unfair). Lastly, Tianshou always uses 10 random seeds while others might use fewer. The comparison is here only to show Tianshou's reliability.
\ No newline at end of file

From 943916444e8a3368e3f4ab8d2734f53e3ce271ba Mon Sep 17 00:00:00 2001
From: chy <308604256@qq.com>
Date: Wed, 27 Apr 2022 20:15:11 +0800
Subject: [PATCH 3/4] add

---
 docs/tutorials/benchmark.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/tutorials/benchmark.rst b/docs/tutorials/benchmark.rst
index ea39d42eb..609b0b333 100644
--- a/docs/tutorials/benchmark.rst
+++ b/docs/tutorials/benchmark.rst
@@ -108,7 +108,7 @@ Every experiment is conducted under 10 random seeds for 10M steps. Please refer
     </center>
 
 
-The table below compares the performance of Tianshou against published results on Atari games. We use max average return in 10M timesteps as the reward metric. - means results are not provided. The best-performing baseline on each task is highlighted in boldface. Referenced baselines include `Google Dopamine <https://github.com/google/dopamine/tree/master/baselines/atari>`_ and `OpenAI Baselines <https://github.com/openai/baselines>`_.
+The table below compares the performance of Tianshou against published results on Atari games. We use max average return in 10M timesteps as the reward metric (to be consistent with Mujoco). - means results are not provided. The best-performing baseline on each task is highlighted in boldface. Referenced baselines include `Google Dopamine <https://github.com/google/dopamine/tree/master/baselines/atari>`_ and `OpenAI Baselines <https://github.com/openai/baselines>`_.
 
 # TODO table here @jiayi
 

From a4398d3dd3325151e0b43e8d8f1f52576e5d56b5 Mon Sep 17 00:00:00 2001
From: Jiayi Weng <trinkle23897@gmail.com>
Date: Wed, 27 Apr 2022 08:42:27 -0400
Subject: [PATCH 4/4] update table

---
 docs/spelling_wordlist.txt   |  3 ++
 docs/tutorials/benchmark.rst | 92 +++++++++++++-----------------------
 2 files changed, 36 insertions(+), 59 deletions(-)

diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt
index cf78b00b8..31e53fce0 100644
--- a/docs/spelling_wordlist.txt
+++ b/docs/spelling_wordlist.txt
@@ -154,3 +154,6 @@ IPendulum
 Reacher
 Runtime
 Nvidia
+Enduro
+Qbert
+Seaquest
diff --git a/docs/tutorials/benchmark.rst b/docs/tutorials/benchmark.rst
index 609b0b333..b67ccbfd7 100644
--- a/docs/tutorials/benchmark.rst
+++ b/docs/tutorials/benchmark.rst
@@ -108,62 +108,36 @@ Every experiment is conducted under 10 random seeds for 10M steps. Please refer
     </center>
 
 
-The table below compares the performance of Tianshou against published results on Atari games. We use max average return in 10M timesteps as the reward metric (to be consistent with Mujoco). - means results are not provided. The best-performing baseline on each task is highlighted in boldface. Referenced baselines include `Google Dopamine <https://github.com/google/dopamine/tree/master/baselines/atari>`_ and `OpenAI Baselines <https://github.com/openai/baselines>`_.
-
-# TODO table here @jiayi
-
-+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|Task                      |Ant       |HalfCheetah|Hopper    |Walker2d  |Swimmer  |Humanoid  |Reacher |IPendulum |IDPendulum|
-+=========+================+==========+===========+==========+==========+=========+==========+========+==========+==========+
-|DDPG     |Tianshou        |990.4     |**11718.7**|**2197.0**|1400.6    |**144.1**|**177.3** |**-3.3**|**1000.0**|8364.3    |
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |TD3 Paper       |**1005.3**|3305.6     |**2020.5**|1843.6    |/        |/         |-6.5    |**1000.0**|**9355.5**|
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |TD3 Paper (Our) |888.8     |8577.3     |1860.0    |**3098.1**|/        |/         |-4.0    |**1000.0**|8370.0    |
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |Spinning Up     |~840      |~11000     |~1800     |~1950     |~137     |/         |/       |/         |/         |
-+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|TD3      |Tianshou        |**5116.4**|**10201.2**|3472.2    |3982.4    |**104.2**|**5189.5**|**-2.7**|**1000.0**|**9349.2**|
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |TD3 Paper       |4372.4    |9637.0     |**3564.1**|**4682.8**|/        |/         |-3.6    |**1000.0**|9337.5    |
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |Spinning Up     |~3800     |~9750      |~2860     |~4000     |~78      |/         |/       |/         |/         |
-+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|SAC      |Tianshou        |**5850.2**|**12138.8**|**3542.2**|**5007.0**|**44.4** |**5488.5**|**-2.6**|**1000.0**|**9359.5**|
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |SAC Paper       |~3720     |~10400     |~3370     |~3740     |/        |~5200     |/       |/         |/         |
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |TD3 Paper       |655.4     |2347.2     |2996.7    |1283.7    |/        |/         |-4.4    |**1000.0**|8487.2    |
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |Spinning Up     |~3980     |~11520     |~3150     |~4250     |~41.7    |/         |/       |/         |/         |
-+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|A2C      |Tianshou        |**3485.4**|**1829.9** |**1253.2**|**1091.6**|**36.6** |**1726.0**|**-6.7**|**1000.0**|**9257.7**|
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |PPO Paper       |/         |~1000      |~900      |~850      |~31      |/         |~-24    |**~1000** |~7100     |
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |PPO Paper (TR)  |/         |~930       |~1220     |~700      |**~36**  |/         |~-27    |**~1000** |~8100     |
-+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|PPO      |Tianshou        |**3258.4**|**5783.9** |**2609.3**|3588.5    |66.7     |**787.1** |**-4.1**|**1000.0**|**9231.3**|
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |PPO Paper       |/         |~1800      |~2330     |~3460     |~108     |/         |~-7     |**~1000** |~8000     |
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |TD3 Paper       |1083.2    |1795.4     |2164.7    |3317.7    |/        |/         |-6.2    |**1000.0**|8977.9    |
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |OpenAI Baselines|/         |~1700      |~2400     |~3510     |~111     |/         |~-6     |~940      |~7350     |
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |Spinning Up     |~650      |~1670      |~1850     |~1230     |**~120** |/         |/       |/         |/         |
-+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|TRPO     |Tianshou        |**2866.7**|**4471.2** |2046.0    |**3826.7**|40.9     |**810.1** |**-5.1**|**1000.0**|**8435.2**|
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |ACKTR paper     |~0        |~400       |~1400     |~550      |~40      |/         |-8      |**~1000** |~800      |
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |PPO Paper       |/         |~0         |~2100     |~1100     |**~121** |/         |~-115   |**~1000** |~200      |
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |TD3 paper       |-75.9     |-15.6      |**2471.3**|2321.5    |/        |/         |-111.4  |985.4     |205.9     |
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |OpenAI Baselines|/         |~1350      |**~2200** |~2350     |~95      |/         |**~-5** |~910      |~7000     |
-+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-|         |Spinning Up (TF)|~150      |~850       |~1200     |~600      |~85      |/         |/       |/         |/         |
-+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
-
-Please note that the comparison table for both two benchmarks could NOT be used to prove which implementation is "better". The hyperparameters of the algorithms vary across different implementations. Also, the reward metric is not strictly the same (e.g. Tianshou uses max average return in 10M steps but OpenAI Baselines only report average return at 10M steps, which is unfair). Lastly, Tianshou always uses 10 random seeds while others might use fewer. The comparison is here only to show Tianshou's reliability.
\ No newline at end of file
+The table below compares the performance of Tianshou against published results on Atari games. We use the max average return within 10M timesteps as the reward metric **(to be consistent with Mujoco)**. ``/`` means results are not provided. The best-performing baseline on each task is highlighted in boldface. Referenced baselines include `Google Dopamine <https://github.com/google/dopamine/tree/master/baselines/atari>`_ and `OpenAI Baselines <https://github.com/openai/baselines>`_.
+
++-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|Task                    |Pong          |Breakout        |Enduro            |Qbert               |MsPacman      |Seaquest           |SpaceInvaders     |
++=======+================+==============+================+==================+====================+==============+===================+==================+
+|DQN    |Tianshou        |**20.2 ± 2.3**|**133.5 ± 44.6**|997.9 ± 180.6     |**11620.2 ± 786.1** |2324.8 ± 359.8|**3213.9 ± 381.6** |947.9 ± 155.3     |
++       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|       |Dopamine        |9.8           |92.2            |**2126.9**        |6836.7              |**2451.3**    |1406.6             |**1559.1**        |
++       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|       |OpenAI Baselines|16.5          |131.5           |479.8             |3254.8              |/             |1164.1             |1129.5 ± 145.3    |
++-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|C51    |Tianshou        |**20.6 ± 2.4**|**412.9 ± 35.8**|**940.8 ± 133.9** |**12513.2 ± 1274.6**|2254.9 ± 201.2|**3305.4 ± 1524.3**|557.3             |
++       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|       |Dopamine        |17.4          |222.4           |665.3             |9924.5              |**2860.4**    |1706.6             |**604.6 ± 157.5** |
++-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|Rainbow|Tianshou        |**20.2 ± 3.0**|**440.4 ± 50.1**|1496.1 ± 112.3    |14224.8 ± 1230.1    |2524.2 ± 338.8|1934.6 ± 376.4     |**1178.4**        |
++       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|       |Dopamine        |19.1          |47.9            |**2185.1**        |**15682.2**         |**3161.7**    |**3328.9**         |459.9             |
++-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|IQN    |Tianshou        |**20.7 ± 2.9**|**355.9 ± 22.7**|**1252.7 ± 118.1**|**14409.2 ± 808.6** |2228.6 ± 253.1|5341.2 ± 670.2     |667.8 ± 81.5      |
++       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|       |Dopamine        |19.6          |96.3            |1227.6            |12496.7             |**4422.7**    |**16418**          |**1358.2 ± 267.6**|
++-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|PPO    |Tianshou        |**20.3 ± 1.2**|**283.0 ± 74.3**|**1098.9 ± 110.5**|**12341.8 ± 1760.7**|1699.4 ± 248.0|1035.2 ± 353.6     |1641.3            |
++       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|       |OpenAI Baselines|13.7          |114.3           |350.2             |7012.1              |/             |**1218.9**         |**1787.5 ± 340.8**|
++-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|QR-DQN |Tianshou        |20.7 ± 2.0    |228.3 ± 27.3    |951.7 ± 333.5     |14761.5 ± 862.9     |2259.3 ± 269.2|4187.6 ± 725.7     |1114.7 ± 116.9    |
++-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|FQF    |Tianshou        |20.4 ± 2.5    |382.6 ± 29.5    |1816.8 ± 314.3    |15301.2 ± 684.1     |2506.6 ± 402.5|8051.5 ± 3155.6    |2558.3            |
++-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+
+Please note that the comparison tables for the two benchmarks can NOT be used to prove which implementation is "better". The hyperparameters of the algorithms vary across implementations. Also, the reward metric is not strictly the same (e.g. Tianshou uses the max average return within 10M steps, whereas OpenAI Baselines only reports the average return at 10M steps, which makes the comparison unfair). Lastly, Tianshou always uses 10 random seeds while others might use fewer. The comparison is here only to show Tianshou's reliability.