add vizdoom example, bump version to 0.4.2 #384

Merged
merged 4 commits into from
Jun 26, 2021
2 changes: 1 addition & 1 deletion .github/workflows/pytest.yml
@@ -24,7 +24,7 @@ jobs:
     - name: Test with pytest
       # ignore test/throughput which only profiles the code
       run: |
-        pytest test --ignore-glob='*profile.py' --cov=tianshou --cov-report=xml --durations=0 -v
+        pytest test --ignore-glob='*profile.py' --cov=tianshou --cov-report=xml --cov-report=term-missing --durations=0 -v
     - name: Upload coverage to Codecov
       uses: codecov/codecov-action@v1
       with:
3 changes: 1 addition & 2 deletions README.md
@@ -6,15 +6,14 @@

[![PyPI](https://img.shields.io/pypi/v/tianshou)](https://pypi.org/project/tianshou/)
[![Conda](https://img.shields.io/conda/vn/conda-forge/tianshou)](https://github.com/conda-forge/tianshou-feedstock)
-[![Read the Docs](https://img.shields.io/readthedocs/tianshou)](https://tianshou.readthedocs.io/en/latest)
+[![Read the Docs](https://img.shields.io/readthedocs/tianshou)](https://tianshou.readthedocs.io/en/master)
[![Read the Docs](https://img.shields.io/readthedocs/tianshou-docs-zh-cn?label=%E4%B8%AD%E6%96%87%E6%96%87%E6%A1%A3)](https://tianshou.readthedocs.io/zh/latest/)
[![Unittest](https://github.com/thu-ml/tianshou/workflows/Unittest/badge.svg?branch=master)](https://github.com/thu-ml/tianshou/actions)
[![codecov](https://img.shields.io/codecov/c/gh/thu-ml/tianshou)](https://codecov.io/gh/thu-ml/tianshou)
[![GitHub issues](https://img.shields.io/github/issues/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/issues)
[![GitHub stars](https://img.shields.io/github/stars/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/stargazers)
[![GitHub forks](https://img.shields.io/github/forks/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/network)
[![GitHub license](https://img.shields.io/github/license/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/blob/master/LICENSE)
[![Gitter](https://badges.gitter.im/thu-ml/tianshou.svg)](https://gitter.im/thu-ml/tianshou?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

**Tianshou** ([天授](https://baike.baidu.com/item/%E5%A4%A9%E6%8E%88)) is a reinforcement learning platform based on pure PyTorch. Unlike existing reinforcement learning libraries, which are mainly based on TensorFlow, have many nested classes, unfriendly API, or slow-speed, Tianshou provides a fast-speed modularized framework and pythonic API for building the deep reinforcement learning agent with the least number of lines of code. The supported interface algorithms currently include:

1 change: 1 addition & 0 deletions examples/vizdoom/.gitignore
@@ -0,0 +1 @@
_vizdoom.ini
66 changes: 66 additions & 0 deletions examples/vizdoom/README.md
@@ -0,0 +1,66 @@
# ViZDoom

[ViZDoom](https://github.com/mwydmuch/ViZDoom) is a popular RL environment built on the classic first-person shooter Doom. Here we provide some results and intuitions for this scenario.

## Train

To train an agent:

```bash
python3 vizdoom_c51.py --task {D1_basic|D3_battle|D4_battle2}
```

D1 (health gathering) should finish training (the agent no longer dies) in fewer than 500k environment steps (5 epochs).

D3 can reach a reward of 1600+ (a killcount of 75+ in 5 minutes).

D4 can reach a reward of 700+. Here are the results:

(Episode length. The maximum is 2625 because we use frameskip=4, that is, 10500/4 = 2625.)

![](results/c51/length.png)

(Episode reward.)

![](results/c51/reward.png)
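
The episode-length cap and the "5 minutes" figure quoted above can be checked with a few lines of arithmetic. This is only a sketch: it takes `episode_timeout = 10500` from the map configs and assumes Doom's standard rate of 35 tics per second, which this README does not state. The variable names are illustrative.

```python
# Episode-length arithmetic for the maps in this example.
EPISODE_TIMEOUT_TICS = 10500  # episode_timeout in maps/*.cfg
FRAMESKIP = 4                 # default frameskip of the environment
TICS_PER_SECOND = 35          # assumption: Doom's standard tic rate

# Each agent step advances the game by FRAMESKIP tics.
max_agent_steps = EPISODE_TIMEOUT_TICS // FRAMESKIP
episode_minutes = EPISODE_TIMEOUT_TICS / TICS_PER_SECOND / 60

print(max_agent_steps)  # 2625
print(episode_minutes)  # 5.0
```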

To evaluate an agent's performance:

```bash
python3 vizdoom_c51.py --test-num 100 --resume-path policy.pth --watch --task {D1_basic|D3_battle|D4_battle2}
```

To save `.lmp` files for recording:

```bash
python3 vizdoom_c51.py --save-lmp --test-num 100 --resume-path policy.pth --watch --task {D1_basic|D3_battle|D4_battle2}
```

This stores the `.lmp` files in the `lmps/` directory. To watch them (for example, a D3 recording):

```bash
python3 replay.py maps/D3_battle.cfg episode_8_25.lmp
```

We provide two `.lmp` files (the best D3 and D4 runs) under `results/c51`. You can replay them with:

```bash
python3 replay.py maps/D3_battle.cfg results/c51/d3.lmp
python3 replay.py maps/D4_battle2.cfg results/c51/d4.lmp
```

## Maps

See [maps/README.md](maps/README.md)

## Algorithms

The setting is exactly the same as in the Atari example, so you can also try the other algorithms listed there.
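
One part of the Atari-style setup that `env.py` in this PR reproduces is frame stacking: the observation holds the most recent frames, with the newest one last. The sketch below shows that rolling buffer with NumPy only. The `push_frame` helper and the tiny 2x3 frames are hypothetical and chosen to keep the output readable.

```python
import numpy as np


def push_frame(buffer: np.ndarray, frame: np.ndarray) -> None:
    """Shift the stack by one slot in place and store the newest frame last."""
    buffer[:-1] = buffer[1:]
    buffer[-1] = frame


# A stack of 4 frames, each 2x3 pixels, initially all zeros.
stack = np.zeros((4, 2, 3), dtype=np.uint8)
for value in (10, 20, 30, 40, 50):
    push_frame(stack, np.full((2, 3), value, dtype=np.uint8))

# The oldest frame (10) has been pushed out of the stack.
print(stack[:, 0, 0])  # [20 30 40 50]
```

Because the buffer is updated in place, code that keeps earlier observations around may want to store `stack.copy()` rather than the buffer itself.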

## Reward

1. A living reward hurts performance.
2. Combo actions (pressing several buttons at once) are really important.
3. A negative reward for losing health and ammo2 is really helpful for D3/D4.
4. Using only a positive reward for health is really helpful for D1.
5. Removing MOVE_BACKWARD may make training converge faster, but the final performance may be lower.
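
The notes above correspond to the reward shaping in the `step` method of `env.py` in this PR. As a minimal standalone sketch of that logic, without ViZDoom, the function below computes one step's reward from two snapshots of the game variables. `GameVars` and `shaped_reward` are hypothetical names introduced here for illustration; they are not part of the PR.

```python
from dataclasses import dataclass


@dataclass
class GameVars:
    """Snapshot of the game variables used for shaping (hypothetical helper)."""
    health: float
    killcount: float
    ammo2: float


def shaped_reward(prev: GameVars, cur: GameVars, battle: bool) -> float:
    """Reward for one step, computed from the change in game variables."""
    reward = 0.0
    health_delta = cur.health - prev.health
    if battle:
        # Battle maps: health counts in both directions, so damage is penalized.
        reward += health_delta
    elif health_delta > 0:
        # Other maps: only health gains are rewarded.
        reward += health_delta
    # Each kill is worth 20 points.
    reward += 20 * (cur.killcount - prev.killcount)
    # Ammo counts in both directions, so every shot costs a little.
    reward += cur.ammo2 - prev.ammo2
    return reward


before = GameVars(health=100, killcount=0, ammo2=50)
after = GameVars(health=90, killcount=1, ammo2=49)
print(shaped_reward(before, after, battle=True))   # 9.0: -10 health, +20 kill, -1 ammo
print(shaped_reward(before, after, battle=False))  # 19.0: health loss ignored
```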
129 changes: 129 additions & 0 deletions examples/vizdoom/env.py
@@ -0,0 +1,129 @@
import os
import cv2
import gym
import numpy as np
import vizdoom as vzd


def normal_button_comb():
    actions = []
    m_forward = [[0.0], [1.0]]
    t_left_right = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    for i in m_forward:
        for j in t_left_right:
            actions.append(i + j)
    return actions


def battle_button_comb():
    actions = []
    m_forward_backward = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    m_left_right = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    t_left_right = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    attack = [[0.0], [1.0]]
    speed = [[0.0], [1.0]]

    for m in attack:
        for n in speed:
            for j in m_left_right:
                for i in m_forward_backward:
                    for k in t_left_right:
                        actions.append(i + j + k + m + n)
    return actions


class Env(gym.Env):
    def __init__(
        self, cfg_path, frameskip=4, res=(4, 40, 60), save_lmp=False
    ):
        super().__init__()
        self.save_lmp = save_lmp
        self.health_setting = "battle" in cfg_path
        if save_lmp:
            os.makedirs("lmps", exist_ok=True)
        self.res = res
        self.skip = frameskip
        self.observation_space = gym.spaces.Box(
            low=0, high=255, shape=res, dtype=np.float32
        )
        self.game = vzd.DoomGame()
        self.game.load_config(cfg_path)
        self.game.init()
        if "battle" in cfg_path:
            self.available_actions = battle_button_comb()
        else:
            self.available_actions = normal_button_comb()
        self.action_num = len(self.available_actions)
        self.action_space = gym.spaces.Discrete(self.action_num)
        self.spec = gym.envs.registration.EnvSpec("vizdoom-v0")
        self.count = 0

    def get_obs(self):
        state = self.game.get_state()
        if state is None:
            return
        obs = state.screen_buffer
        self.obs_buffer[:-1] = self.obs_buffer[1:]
        self.obs_buffer[-1] = cv2.resize(obs, (self.res[-1], self.res[-2]))

    def reset(self):
        if self.save_lmp:
            self.game.new_episode(f"lmps/episode_{self.count}.lmp")
        else:
            self.game.new_episode()
        self.count += 1
        self.obs_buffer = np.zeros(self.res, dtype=np.uint8)
        self.get_obs()
        self.health = self.game.get_game_variable(vzd.GameVariable.HEALTH)
        self.killcount = self.game.get_game_variable(
            vzd.GameVariable.KILLCOUNT)
        self.ammo2 = self.game.get_game_variable(vzd.GameVariable.AMMO2)
        return self.obs_buffer

    def step(self, action):
        self.game.make_action(self.available_actions[action], self.skip)
        reward = 0.0
        self.get_obs()
        health = self.game.get_game_variable(vzd.GameVariable.HEALTH)
        if self.health_setting:
            reward += health - self.health
        elif health > self.health:  # positive health reward only for d1/d2
            reward += health - self.health
        self.health = health
        killcount = self.game.get_game_variable(vzd.GameVariable.KILLCOUNT)
        reward += 20 * (killcount - self.killcount)
        self.killcount = killcount
        ammo2 = self.game.get_game_variable(vzd.GameVariable.AMMO2)
        # if ammo2 > self.ammo2:
        reward += ammo2 - self.ammo2
        self.ammo2 = ammo2
        done = False
        info = {}
        if self.game.is_player_dead() or self.game.get_state() is None:
            done = True
        elif self.game.is_episode_finished():
            done = True
            info["TimeLimit.truncated"] = True
        return self.obs_buffer, reward, done, info

    def render(self):
        pass

    def close(self):
        self.game.close()


if __name__ == '__main__':
    # env = Env("maps/D1_basic.cfg", 4, (4, 84, 84))
    env = Env("maps/D3_battle.cfg", 4, (4, 84, 84))
    print(env.available_actions)
    action_num = env.action_space.n
    obs = env.reset()
    print(env.spec.reward_threshold)
    print(obs.shape, action_num)
    for i in range(4000):
        obs, rew, done, info = env.step(0)
        if done:
            env.reset()
    print(obs.shape, rew, done)
    cv2.imwrite("test.png", obs.transpose(1, 2, 0)[..., :3])
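
The nested loops in `battle_button_comb()` enumerate a Cartesian product of button groups, which gives 2 x 2 x 3 x 3 x 3 = 108 discrete actions for the battle maps. The same enumeration can be sketched with `itertools.product`. This is an illustrative alternative, not code from the PR; `PAIR`, `TOGGLE` and `battle_actions` are names made up for this example.

```python
from itertools import product

# Choices per button group: a pair of opposing buttons allows "neither" or
# exactly one of the two; a toggle is a single button that is off or on.
PAIR = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
TOGGLE = [[0.0], [1.0]]


def battle_actions():
    """Enumerate all combinations in the same order as the nested loops."""
    actions = []
    # product varies its last argument fastest, like the innermost loop.
    groups = product(TOGGLE, TOGGLE, PAIR, PAIR, PAIR)
    for attack, speed, strafe, move, turn in groups:
        actions.append(move + strafe + turn + attack + speed)
    return actions


actions = battle_actions()
print(len(actions))  # 108
print(actions[1])    # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

Each action is a list of eight values, one per button, so a single discrete action index can press several buttons at once.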
39 changes: 39 additions & 0 deletions examples/vizdoom/maps/D1_basic.cfg
@@ -0,0 +1,39 @@
# Lines starting with # are treated as comments (or with whitespaces+#).
# It doesn't matter if you use capital letters or not.
# It doesn't matter if you use underscore or camel notation for keys, e.g. episode_timeout is the same as episodeTimeout.

doom_scenario_path = D1_basic.wad
doom_map = map01

# Rewards

# Each step is good for you!
living_reward = 0
# And death is not!
death_penalty = 0

# Rendering options
screen_resolution = RES_160X120
screen_format = GRAY8
render_hud = false
render_crosshair = false
render_weapon = false
render_decals = false
render_particles = false
window_visible = false

# make episodes finish after 10500 actions (tics)
episode_timeout = 10500

# Available buttons
available_buttons =
{
MOVE_FORWARD
TURN_LEFT
TURN_RIGHT
}

# Game variables that will be in the state
available_game_variables = { HEALTH }

mode = PLAYER
Binary file added examples/vizdoom/maps/D1_basic.wad
Binary file not shown.
39 changes: 39 additions & 0 deletions examples/vizdoom/maps/D2_navigation.cfg
@@ -0,0 +1,39 @@
# Lines starting with # are treated as comments (or with whitespaces+#).
# It doesn't matter if you use capital letters or not.
# It doesn't matter if you use underscore or camel notation for keys, e.g. episode_timeout is the same as episodeTimeout.

doom_scenario_path = D2_navigation.wad
doom_map = map01

# Rewards

# Each step is good for you!
living_reward = 0
# And death is not!
death_penalty = 0

# Rendering options
screen_resolution = RES_160X120
screen_format = GRAY8
render_hud = false
render_crosshair = false
render_weapon = false
render_decals = false
render_particles = false
window_visible = false

# make episodes finish after 10500 actions (tics)
episode_timeout = 10500

# Available buttons
available_buttons =
{
MOVE_FORWARD
TURN_LEFT
TURN_RIGHT
}

# Game variables that will be in the state
available_game_variables = { HEALTH }

mode = PLAYER
Binary file added examples/vizdoom/maps/D2_navigation.wad
Binary file not shown.
48 changes: 48 additions & 0 deletions examples/vizdoom/maps/D3_battle.cfg
@@ -0,0 +1,48 @@
# Lines starting with # are treated as comments (or with whitespaces+#).
# It doesn't matter if you use capital letters or not.
# It doesn't matter if you use underscore or camel notation for keys, e.g. episode_timeout is the same as episodeTimeout.

doom_scenario_path = D3_battle.wad
doom_map = map01

# Rewards

living_reward = 0
death_penalty = 100

# Rendering options
screen_resolution = RES_160X120
screen_format = GRAY8
render_hud = false
render_crosshair = true
render_weapon = true
render_decals = false
render_particles = false
window_visible = false

# make episodes finish after 10500 actions (tics)
episode_timeout = 10500

# Available buttons
available_buttons =
{
MOVE_FORWARD
MOVE_BACKWARD
MOVE_LEFT
MOVE_RIGHT
TURN_LEFT
TURN_RIGHT
ATTACK
SPEED
}

# Game variables that will be in the state
available_game_variables =
{
KILLCOUNT
AMMO2
HEALTH
}

mode = PLAYER
doom_skill = 2
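
The order of `available_buttons` in this config matters: the action vectors built in `env.py` carry one value per button, in the order the buttons are listed here. As a rough sketch, an action vector for the battle maps can be decoded back into button names as follows. `BUTTONS` is copied from the config above, while the `pressed_buttons` helper is hypothetical and not part of the PR.

```python
# Button order copied from available_buttons in D3_battle.cfg.
BUTTONS = [
    "MOVE_FORWARD", "MOVE_BACKWARD", "MOVE_LEFT", "MOVE_RIGHT",
    "TURN_LEFT", "TURN_RIGHT", "ATTACK", "SPEED",
]


def pressed_buttons(action):
    """Return the names of the buttons set to 1.0 in an action vector."""
    if len(action) != len(BUTTONS):
        raise ValueError(f"expected {len(BUTTONS)} values, got {len(action)}")
    return [name for name, value in zip(BUTTONS, action) if value == 1.0]


print(pressed_buttons([1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0]))
# ['MOVE_FORWARD', 'TURN_RIGHT', 'ATTACK']
```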
Binary file added examples/vizdoom/maps/D3_battle.wad
Binary file not shown.