thu-ml · Trinkle23897 · Jun 26, 2021 · Jun 25, 2021 · Jun 25, 2021 · Jun 25, 2021
diff --git a/.github/workflows/pytest.yml b/.github/workflows/pytest.yml
@@ -24,7 +24,7 @@ jobs:
     - name: Test with pytest
       # ignore test/throughput which only profiles the code
       run: |
-        pytest test --ignore-glob='*profile.py' --cov=tianshou --cov-report=xml --durations=0 -v
+        pytest test --ignore-glob='*profile.py' --cov=tianshou --cov-report=xml --cov-report=term-missing --durations=0 -v
     - name: Upload coverage to Codecov
       uses: codecov/codecov-action@v1
       with:

diff --git a/README.md b/README.md
@@ -6,15 +6,14 @@
 
 [![PyPI](https://img.shields.io/pypi/v/tianshou)](https://pypi.org/project/tianshou/)
 [![Conda](https://img.shields.io/conda/vn/conda-forge/tianshou)](https://github.com/conda-forge/tianshou-feedstock)
-[![Read the Docs](https://img.shields.io/readthedocs/tianshou)](https://tianshou.readthedocs.io/en/latest)
+[![Read the Docs](https://img.shields.io/readthedocs/tianshou)](https://tianshou.readthedocs.io/en/master)
 [![Read the Docs](https://img.shields.io/readthedocs/tianshou-docs-zh-cn?label=%E4%B8%AD%E6%96%87%E6%96%87%E6%A1%A3)](https://tianshou.readthedocs.io/zh/latest/)
 [![Unittest](https://github.com/thu-ml/tianshou/workflows/Unittest/badge.svg?branch=master)](https://github.com/thu-ml/tianshou/actions)
 [![codecov](https://img.shields.io/codecov/c/gh/thu-ml/tianshou)](https://codecov.io/gh/thu-ml/tianshou)
 [![GitHub issues](https://img.shields.io/github/issues/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/issues)
 [![GitHub stars](https://img.shields.io/github/stars/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/stargazers)
 [![GitHub forks](https://img.shields.io/github/forks/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/network)
 [![GitHub license](https://img.shields.io/github/license/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/blob/master/LICENSE)
-[![Gitter](https://badges.gitter.im/thu-ml/tianshou.svg)](https://gitter.im/thu-ml/tianshou?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
 
 **Tianshou** ([天授](https://baike.baidu.com/item/%E5%A4%A9%E6%8E%88)) is a reinforcement learning platform based on pure PyTorch. Unlike existing reinforcement learning libraries, which are mainly based on TensorFlow, have many nested classes, unfriendly API, or slow-speed, Tianshou provides a fast-speed modularized framework and pythonic API for building the deep reinforcement learning agent with the least number of lines of code. The supported interface algorithms currently include:
 

diff --git a/examples/vizdoom/.gitignore b/examples/vizdoom/.gitignore
@@ -0,0 +1 @@
+_vizdoom.ini
diff --git a/examples/vizdoom/README.md b/examples/vizdoom/README.md
@@ -0,0 +1,66 @@
+# ViZDoom
+
+[ViZDoom](https://github.com/mwydmuch/ViZDoom) is a popular RL environment built on the classic first-person shooter Doom. Here we provide some results and intuitions for this scenario.
+
+## Train
+
+To train an agent:
+
+```bash
+python3 vizdoom_c51.py --task {D1_basic|D3_battle|D4_battle2}
+```
+
+D1 (health gathering) should finish training (no deaths) in fewer than 500k environment steps (5 epochs).
+
+D3 can reach a reward of 1600+ (a kill count of 75+ in 5 minutes).
+
+D4 can reach a reward of 700+. Here are the results:
+
+(episode length; the maximum is 2625 steps because we use frameskip=4, i.e. 10500/4=2625)
+
+![](results/c51/length.png)
+
+(episode reward)
+
+![](results/c51/reward.png)
+
+To evaluate an agent's performance:
+
+```bash
+python3 vizdoom_c51.py --test-num 100 --resume-path policy.pth --watch --task {D1_basic|D3_battle|D4_battle2}
+```
+
+To save `.lmp` files for recording:
+
+```bash
+python3 vizdoom_c51.py --save-lmp --test-num 100 --resume-path policy.pth --watch --task {D1_basic|D3_battle|D4_battle2}
+```
+
+This stores the `.lmp` files in the `lmps/` directory. To watch one of them (for example, a D3 recording):
+
+```bash
+python3 replay.py maps/D3_battle.cfg episode_8_25.lmp
+```
+
+We provide two `.lmp` files (the best D3 and D4 episodes) under `results/c51`; you can replay them with the following commands:
+
+```bash
+python3 replay.py maps/D3_battle.cfg results/c51/d3.lmp
+python3 replay.py maps/D4_battle2.cfg results/c51/d4.lmp
+```
+
+## Maps
+
+See [maps/README.md](maps/README.md)
+
+## Algorithms
+
+The setting is exactly the same as for Atari, so you can try the other algorithms listed in the Atari example.
+
+## Reward
+
+1. A living reward is harmful.
+2. Combo actions (pressing several buttons at once) are really important.
+3. A negative reward for losing health and ammo2 is really helpful for D3/D4.
+4. Rewarding only health gains (positive reward only) is really helpful for D1.
+5. Removing MOVE_BACKWARD may speed up convergence, but the final performance may be lower.
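
The reward notes above match the shaping done in `Env.step` in `env.py`. As a minimal standalone sketch (the function name `shaped_reward` and the dict layout are hypothetical, used only for illustration; the coefficients are copied from `env.py`):

```python
def shaped_reward(prev, cur, battle):
    """Shaped reward for one step, mirroring the logic of Env.step.

    `prev` and `cur` are dicts with the keys 'health', 'killcount' and
    'ammo2' (a hypothetical layout; env.py keeps these as attributes).
    """
    health_delta = cur["health"] - prev["health"]
    if battle:
        # D3/D4: signed health delta, so losing health is penalized.
        reward = health_delta
    else:
        # D1/D2: only health gains are rewarded.
        reward = max(health_delta, 0.0)
    # Each kill is worth 20 points.
    reward += 20 * (cur["killcount"] - prev["killcount"])
    # Signed ammo delta: pickups are rewarded, shots are penalized.
    reward += cur["ammo2"] - prev["ammo2"]
    return float(reward)


before = {"health": 100, "killcount": 0, "ammo2": 50}
after = {"health": 90, "killcount": 1, "ammo2": 48}
print(shaped_reward(before, after, battle=True))   # -10 + 20 - 2
print(shaped_reward(before, after, battle=False))  # 0 + 20 - 2
```

The only difference between the two settings is how a health loss is treated; the kill and ammo terms are identical.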
diff --git a/examples/vizdoom/env.py b/examples/vizdoom/env.py
@@ -0,0 +1,129 @@
+import os
+import cv2
+import gym
+import numpy as np
+import vizdoom as vzd
+
+
+def normal_button_comb():
+    actions = []
+    m_forward = [[0.0], [1.0]]
+    t_left_right = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
+    for i in m_forward:
+        for j in t_left_right:
+            actions.append(i + j)
+    return actions
+
+
+def battle_button_comb():
+    actions = []
+    m_forward_backward = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
+    m_left_right = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
+    t_left_right = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
+    attack = [[0.0], [1.0]]
+    speed = [[0.0], [1.0]]
+
+    for m in attack:
+        for n in speed:
+            for j in m_left_right:
+                for i in m_forward_backward:
+                    for k in t_left_right:
+                        actions.append(i + j + k + m + n)
+    return actions
+
+
+class Env(gym.Env):
+    def __init__(
+        self, cfg_path, frameskip=4, res=(4, 40, 60), save_lmp=False
+    ):
+        super().__init__()
+        self.save_lmp = save_lmp
+        self.health_setting = "battle" in cfg_path
+        if save_lmp:
+            os.makedirs("lmps", exist_ok=True)
+        self.res = res
+        self.skip = frameskip
+        self.observation_space = gym.spaces.Box(
+            low=0, high=255, shape=res, dtype=np.float32
+        )
+        self.game = vzd.DoomGame()
+        self.game.load_config(cfg_path)
+        self.game.init()
+        if "battle" in cfg_path:
+            self.available_actions = battle_button_comb()
+        else:
+            self.available_actions = normal_button_comb()
+        self.action_num = len(self.available_actions)
+        self.action_space = gym.spaces.Discrete(self.action_num)
+        self.spec = gym.envs.registration.EnvSpec("vizdoom-v0")
+        self.count = 0
+
+    def get_obs(self):
+        state = self.game.get_state()
+        if state is None:
+            return
+        obs = state.screen_buffer
+        self.obs_buffer[:-1] = self.obs_buffer[1:]
+        self.obs_buffer[-1] = cv2.resize(obs, (self.res[-1], self.res[-2]))
+
+    def reset(self):
+        if self.save_lmp:
+            self.game.new_episode(f"lmps/episode_{self.count}.lmp")
+        else:
+            self.game.new_episode()
+        self.count += 1
+        self.obs_buffer = np.zeros(self.res, dtype=np.uint8)
+        self.get_obs()
+        self.health = self.game.get_game_variable(vzd.GameVariable.HEALTH)
+        self.killcount = self.game.get_game_variable(
+            vzd.GameVariable.KILLCOUNT)
+        self.ammo2 = self.game.get_game_variable(vzd.GameVariable.AMMO2)
+        return self.obs_buffer
+
+    def step(self, action):
+        self.game.make_action(self.available_actions[action], self.skip)
+        reward = 0.0
+        self.get_obs()
+        health = self.game.get_game_variable(vzd.GameVariable.HEALTH)
+        if self.health_setting:
+            reward += health - self.health
+        elif health > self.health:  # positive health reward only for d1/d2
+            reward += health - self.health
+        self.health = health
+        killcount = self.game.get_game_variable(vzd.GameVariable.KILLCOUNT)
+        reward += 20 * (killcount - self.killcount)
+        self.killcount = killcount
+        ammo2 = self.game.get_game_variable(vzd.GameVariable.AMMO2)
+        # ammo delta is signed: pickups are rewarded, shots are penalized
+        reward += ammo2 - self.ammo2
+        self.ammo2 = ammo2
+        done = False
+        info = {}
+        if self.game.is_player_dead() or self.game.get_state() is None:
+            done = True
+        elif self.game.is_episode_finished():
+            done = True
+            info["TimeLimit.truncated"] = True
+        return self.obs_buffer, reward, done, info
+
+    def render(self):
+        pass
+
+    def close(self):
+        self.game.close()
+
+
+if __name__ == '__main__':
+    # env = Env("maps/D1_basic.cfg", 4, (4, 84, 84))
+    env = Env("maps/D3_battle.cfg", 4, (4, 84, 84))
+    print(env.available_actions)
+    action_num = env.action_space.n
+    obs = env.reset()
+    print(env.spec.reward_threshold)
+    print(obs.shape, action_num)
+    for i in range(4000):
+        obs, rew, done, info = env.step(0)
+        if done:
+            env.reset()
+    print(obs.shape, rew, done)
+    cv2.imwrite("test.png", obs.transpose(1, 2, 0)[..., :3])
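
`get_obs` in `env.py` keeps the last few frames in a rolling buffer: older frames shift toward index 0 and the newest frame is written to the last slot. Below is a minimal sketch of that update in isolation, assuming frames already have the target size (the helper name `push_frame` is hypothetical; the real method also resizes the screen buffer with `cv2.resize`, which takes its size argument as `(width, height)`, hence `(res[-1], res[-2])`):

```python
import numpy as np


def push_frame(buffer, frame):
    """Shift the frame stack by one slot and store `frame` last, in place."""
    buffer[:-1] = buffer[1:]  # drop the oldest frame
    buffer[-1] = frame        # newest frame goes to the end
    return buffer


# A stack of 4 tiny 2x3 frames, initialised to zeros as in Env.reset.
stack = np.zeros((4, 2, 3), dtype=np.uint8)
for value in range(1, 6):
    push_frame(stack, np.full((2, 3), value, dtype=np.uint8))

# After five pushes the stack holds frames 2..5, oldest first.
print(stack[:, 0, 0])
```

Because the buffer starts as zeros, the first observations of an episode contain blank frames until the stack fills up.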
diff --git a/examples/vizdoom/maps/D1_basic.cfg b/examples/vizdoom/maps/D1_basic.cfg
@@ -0,0 +1,39 @@
+# Lines starting with # are treated as comments (or with whitespaces+#).
+# It doesn't matter if you use capital letters or not.
+# It doesn't matter if you use underscore or camel notation for keys, e.g. episode_timeout is the same as episodeTimeout.
+
+doom_scenario_path = D1_basic.wad
+doom_map = map01
+
+# Rewards
+
+# No per-step living reward
+living_reward = 0
+# No death penalty
+death_penalty = 0
+
+# Rendering options
+screen_resolution = RES_160X120
+screen_format = GRAY8
+render_hud = false
+render_crosshair = false
+render_weapon = false
+render_decals = false
+render_particles = false
+window_visible = false
+
+# make episodes finish after 10500 actions (tics)
+episode_timeout = 10500
+
+# Available buttons
+available_buttons =
+{
+    MOVE_FORWARD
+    TURN_LEFT
+    TURN_RIGHT
+}
+
+# Game variables that will be in the state
+available_game_variables = { HEALTH }
+
+mode = PLAYER
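
The `episode_timeout` value ties in with the numbers quoted in the example README. A quick arithmetic check, assuming Doom's standard rate of 35 tics per second (that rate is an assumption here; it is not set anywhere in this diff):

```python
EPISODE_TIMEOUT_TICS = 10500  # episode_timeout in the .cfg files
FRAMESKIP = 4                 # default frameskip in env.py
TICS_PER_SECOND = 35          # assumption: Doom's standard tic rate

# Each agent step advances the game by FRAMESKIP tics.
max_agent_steps = EPISODE_TIMEOUT_TICS // FRAMESKIP
# Episode duration in game time.
episode_minutes = EPISODE_TIMEOUT_TICS / TICS_PER_SECOND / 60

print(max_agent_steps, episode_minutes)
```

This gives 2625 agent steps per episode and 5 minutes of game time, consistent with the maximum episode length and the "5 minutes" mentioned in the README.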
diff --git a/examples/vizdoom/maps/D1_basic.wad b/examples/vizdoom/maps/D1_basic.wad
diff --git a/examples/vizdoom/maps/D2_navigation.cfg b/examples/vizdoom/maps/D2_navigation.cfg
@@ -0,0 +1,39 @@
+# Lines starting with # are treated as comments (or with whitespaces+#).
+# It doesn't matter if you use capital letters or not.
+# It doesn't matter if you use underscore or camel notation for keys, e.g. episode_timeout is the same as episodeTimeout.
+
+doom_scenario_path = D2_navigation.wad
+doom_map = map01
+
+# Rewards
+
+# No per-step living reward
+living_reward = 0
+# No death penalty
+death_penalty = 0
+
+# Rendering options
+screen_resolution = RES_160X120
+screen_format = GRAY8
+render_hud = false
+render_crosshair = false
+render_weapon = false
+render_decals = false
+render_particles = false
+window_visible = false
+
+# make episodes finish after 10500 actions (tics)
+episode_timeout = 10500
+
+# Available buttons
+available_buttons =
+{
+    MOVE_FORWARD
+    TURN_LEFT
+    TURN_RIGHT
+}
+
+# Game variables that will be in the state
+available_game_variables = { HEALTH }
+
+mode = PLAYER
diff --git a/examples/vizdoom/maps/D2_navigation.wad b/examples/vizdoom/maps/D2_navigation.wad
diff --git a/examples/vizdoom/maps/D3_battle.cfg b/examples/vizdoom/maps/D3_battle.cfg
@@ -0,0 +1,48 @@
+# Lines starting with # are treated as comments (or with whitespaces+#).
+# It doesn't matter if you use capital letters or not.
+# It doesn't matter if you use underscore or camel notation for keys, e.g. episode_timeout is the same as episodeTimeout.
+
+doom_scenario_path = D3_battle.wad
+doom_map = map01
+
+# Rewards
+
+living_reward = 0
+death_penalty = 100
+
+# Rendering options
+screen_resolution = RES_160X120
+screen_format = GRAY8
+render_hud = false
+render_crosshair = true
+render_weapon = true
+render_decals = false
+render_particles = false
+window_visible = false
+
+# make episodes finish after 10500 actions (tics)
+episode_timeout = 10500
+
+# Available buttons
+available_buttons =
+{
+    MOVE_FORWARD
+    MOVE_BACKWARD
+    MOVE_LEFT
+    MOVE_RIGHT
+    TURN_LEFT
+    TURN_RIGHT
+    ATTACK
+    SPEED
+}
+
+# Game variables that will be in the state
+available_game_variables =
+{
+    KILLCOUNT
+    AMMO2
+    HEALTH
+}
+
+mode = PLAYER
+doom_skill = 2
diff --git a/examples/vizdoom/maps/D3_battle.wad b/examples/vizdoom/maps/D3_battle.wad
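
The eight buttons in `D3_battle.cfg` are turned into a discrete action set by `battle_button_comb` in `env.py`. The sketch below expresses the same enumeration with `itertools.product`, which makes the action counts easy to verify (the names `AXIS`, `TOGGLE`, `battle_actions` and `normal_actions` are hypothetical and not part of the diff):

```python
from itertools import product

# A pair of opposing buttons: neither pressed, second pressed, first pressed.
AXIS = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
# A single on/off button.
TOGGLE = [[0.0], [1.0]]


def battle_actions():
    """Same order as battle_button_comb: attack outermost, turn innermost."""
    return [
        fb + lr + turn + attack + speed
        for attack, speed, lr, fb, turn in product(TOGGLE, TOGGLE, AXIS, AXIS, AXIS)
    ]


def normal_actions():
    """Same order as normal_button_comb: forward outermost, turn innermost."""
    return [fwd + turn for fwd, turn in product(TOGGLE, AXIS)]


print(len(battle_actions()), len(normal_actions()))
```

The battle maps therefore expose 3 × 3 × 3 × 2 × 2 = 108 discrete actions and the basic maps 2 × 3 = 6. Opposing buttons such as `MOVE_LEFT` and `MOVE_RIGHT` are never pressed together.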