merge dev to master #302
Merged (11 commits) on Mar 2, 2021
24 changes: 13 additions & 11 deletions README.md
@@ -158,13 +158,14 @@ Currently, the overall code of Tianshou platform is less than 2500 lines. Most o
```python
result = collector.collect(n_step=n)
```

-If you have 3 environments in total and want to collect 1 episode in the first environment, 3 for the third environment:
If you have 3 environments in total and want to collect 4 episodes:

```python
-result = collector.collect(n_episode=[1, 0, 3])
result = collector.collect(n_episode=4)
```

The collector will collect exactly 4 episodes, without any bias toward episode length, even though there are only 3 parallel environments.
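
One way to get that guarantee is to start a new episode only while fewer than `n_episode` episodes have been started, and to run every started episode to completion. The pure-Python simulation below sketches that rule; it is a simplified illustration, not Tianshou's actual `Collector` code, and `collect_n_episodes` and `sample_length` are hypothetical names.

```python
import random
from typing import Callable, Dict, List


def collect_n_episodes(
    n_episode: int, env_num: int, sample_length: Callable[[], int]
) -> List[int]:
    """Simulate collecting exactly ``n_episode`` episodes with ``env_num`` envs.

    ``sample_length`` returns the length (>= 1) of a freshly reset episode.
    Returns the lengths of the finished episodes in completion order.
    """
    finished: List[int] = []
    # env id -> [episode length, steps taken so far]
    active: Dict[int, List[int]] = {}
    started = 0
    # Never start more episodes than requested.
    for env_id in range(min(env_num, n_episode)):
        active[env_id] = [sample_length(), 0]
        started += 1
    while active:
        # One vectorized step: every active env advances by one step.
        for env_id in list(active):
            episode = active[env_id]
            episode[1] += 1
            if episode[1] < episode[0]:
                continue
            finished.append(episode[1])
            if started < n_episode:
                # More episodes are still needed: reset this env.
                active[env_id] = [sample_length(), 0]
                started += 1
            else:
                # Enough episodes are already running: retire this env.
                del active[env_id]
    return finished


random.seed(0)
lengths = collect_n_episodes(4, 3, lambda: random.randint(1, 200))
print(len(lengths))  # 4
```

Because no started episode is ever dropped, the returned episodes are exactly the ones that were started, independent of how long each one turned out to be. A collector that kept resetting every environment and simply stopped at the fourth finished episode would instead tend to return the short ones.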

If you want to train the given policy with a sampled batch:

```python
@@ -190,12 +191,13 @@ Define some hyper-parameters:
```python
task = 'CartPole-v0'
lr, epoch, batch_size = 1e-3, 10, 64
-train_num, test_num = 8, 100
train_num, test_num = 10, 100
gamma, n_step, target_freq = 0.9, 3, 320
buffer_size = 20000
eps_train, eps_test = 0.1, 0.05
-step_per_epoch, collect_per_step = 1000, 10
step_per_epoch, step_per_collect = 10000, 10
writer = SummaryWriter('log/dqn') # tensorboard is also supported!
logger = ts.utils.BasicLogger(writer)
```
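
`gamma` and `n_step` parameterize the n-step TD target that DQN regresses its Q-values toward. A minimal sketch of that target for a single transition, assuming the standard definition (discounted sum of up to `n_step` rewards, plus a bootstrap from the target network's max Q-value `n_step` steps ahead unless the episode ended first); `n_step_target` is a hypothetical helper, not a Tianshou function.

```python
from typing import Sequence


def n_step_target(
    rewards: Sequence[float], done: bool, next_max_q: float, gamma: float
) -> float:
    """n-step TD target for one transition (hypothetical helper).

    rewards: the (up to n) rewards r_t, ..., r_{t+n-1} observed after s_t.
    done: True if the episode terminated within those steps.
    next_max_q: max_a Q_target(s_{t+n}, a); ignored when done is True.
    """
    target = 0.0
    for i, r in enumerate(rewards):
        target += gamma ** i * r
    if not done:
        # Bootstrap from the value estimate len(rewards) steps ahead.
        target += gamma ** len(rewards) * next_max_q
    return target


# gamma = 0.9, n_step = 3, as in the hyper-parameters above.
print(n_step_target([1.0, 1.0, 1.0], done=False, next_max_q=10.0, gamma=0.9))
```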

Make environments:
@@ -223,20 +225,20 @@ Setup policy and collectors:

```python
policy = ts.policy.DQNPolicy(net, optim, gamma, n_step, target_update_freq=target_freq)
-train_collector = ts.data.Collector(policy, train_envs, ts.data.ReplayBuffer(buffer_size))
-test_collector = ts.data.Collector(policy, test_envs)
train_collector = ts.data.Collector(policy, train_envs, ts.data.VectorReplayBuffer(buffer_size, train_num), exploration_noise=True)
test_collector = ts.data.Collector(policy, test_envs, exploration_noise=True)  # because DQN uses the epsilon-greedy method
```
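
`VectorReplayBuffer(buffer_size, train_num)` keeps one sub-buffer per training environment, so transitions from different environments are not interleaved. The toy class below sketches that layout under simplifying assumptions: the total capacity is split evenly with ceiling division, each sub-buffer is a circular array, and transitions are plain Python objects. It is not Tianshou's implementation, and `ToyVectorBuffer` and its methods are hypothetical.

```python
import math
from typing import Any, List


class ToyVectorBuffer:
    """Toy stand-in for a vectorized replay buffer: one ring buffer per env."""

    def __init__(self, total_size: int, buffer_num: int) -> None:
        # Assumption: split the total capacity evenly, rounding up.
        self.size = math.ceil(total_size / buffer_num)
        self.data: List[List[Any]] = [[None] * self.size for _ in range(buffer_num)]
        self.ptr = [0] * buffer_num  # next write index per sub-buffer
        self.count = [0] * buffer_num  # number of valid entries per sub-buffer

    def add(self, env_id: int, transition: Any) -> None:
        """Write a transition into the sub-buffer owned by ``env_id``."""
        self.data[env_id][self.ptr[env_id]] = transition
        self.ptr[env_id] = (self.ptr[env_id] + 1) % self.size
        self.count[env_id] = min(self.count[env_id] + 1, self.size)

    def __len__(self) -> int:
        return sum(self.count)

    def ordered(self, env_id: int) -> List[Any]:
        """Stored transitions of one env, oldest first."""
        n, p = self.count[env_id], self.ptr[env_id]
        buf = self.data[env_id]
        if n < self.size:
            return buf[:n]
        return buf[p:] + buf[:p]


# buffer_size = 20000 and train_num = 10, as above: 2000 slots per env.
print(ToyVectorBuffer(20000, 10).size)  # 2000
```

Keeping each environment's transitions contiguous means an episode's steps stay in order within a single sub-buffer, which is what n-step returns and episode boundaries rely on.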

Let's train it:

```python
result = ts.trainer.offpolicy_trainer(
-    policy, train_collector, test_collector, epoch, step_per_epoch, collect_per_step,
-    test_num, batch_size,
    policy, train_collector, test_collector, epoch, step_per_epoch, step_per_collect,
    test_num, batch_size, update_per_step=1 / step_per_collect,
    train_fn=lambda epoch, env_step: policy.set_eps(eps_train),
    test_fn=lambda epoch, env_step: policy.set_eps(eps_test),
    stop_fn=lambda mean_rewards: mean_rewards >= env.spec.reward_threshold,
-    writer=writer)
    logger=logger)
print(f'Finished training! Use {result["duration"]}')
```
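
With the hyper-parameters above, the training schedule can be worked out by hand. This sketch assumes that the off-policy trainer collects `step_per_collect` environment steps per collect call until `step_per_epoch` steps are reached, that those steps are split evenly over the `train_num` environments, and that each collect call is followed by `round(update_per_step * steps_collected)` gradient updates; check the trainer's docstring for the exact rule.

```python
step_per_epoch, step_per_collect = 10000, 10
train_num = 10
update_per_step = 1 / step_per_collect

# Number of collect calls needed to reach step_per_epoch env steps.
collects_per_epoch = step_per_epoch // step_per_collect
# Env steps each training env contributes per collect call (even split assumed).
steps_per_env_per_collect = step_per_collect // train_num
# Gradient updates after each collect call.
updates_per_collect = round(update_per_step * step_per_collect)
updates_per_epoch = collects_per_epoch * updates_per_collect

print(collects_per_epoch, steps_per_env_per_collect, updates_per_collect, updates_per_epoch)
# 1000 1 1 1000
```

Under these assumptions, `update_per_step=1 / step_per_collect` amounts to one gradient update per collect call, i.e. one update per 10 environment steps.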

@@ -252,7 +254,7 @@ Watch the performance with 35 FPS:
```python
policy.eval()
policy.set_eps(eps_test)
-collector = ts.data.Collector(policy, env)
collector = ts.data.Collector(policy, env, exploration_noise=True)
collector.collect(n_episode=1, render=1 / 35)
```
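
`policy.set_eps(eps_test)` together with `exploration_noise=True` makes the collector act epsilon-greedily, so with `eps_test = 0.05` roughly 5% of actions are random. A generic NumPy sketch of epsilon-greedy selection over a batch of Q-values; it illustrates the technique and is not Tianshou's `DQNPolicy` code, and `epsilon_greedy` is a hypothetical helper.

```python
import numpy as np


def epsilon_greedy(
    q_values: np.ndarray, eps: float, rng: np.random.Generator
) -> np.ndarray:
    """Pick argmax-Q actions, replacing each with a uniform random action w.p. eps.

    q_values has shape (batch_size, n_actions); returns int actions of shape (batch_size,).
    """
    batch_size, n_actions = q_values.shape
    actions = q_values.argmax(axis=1)
    # Independently decide, per observation, whether to explore.
    explore = rng.random(batch_size) < eps
    actions[explore] = rng.integers(0, n_actions, size=int(explore.sum()))
    return actions


rng = np.random.default_rng(0)
q = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
print(epsilon_greedy(q, eps=0.05, rng=rng))
```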

85 changes: 84 additions & 1 deletion docs/api/tianshou.data.rst
@@ -1,7 +1,90 @@
tianshou.data
=============

.. automodule:: tianshou.data

Batch
-----

.. autoclass:: tianshou.data.Batch
    :members:
    :undoc-members:
    :show-inheritance:


Buffer
------

ReplayBuffer
~~~~~~~~~~~~

.. autoclass:: tianshou.data.ReplayBuffer
    :members:
    :undoc-members:
    :show-inheritance:

PrioritizedReplayBuffer
~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: tianshou.data.PrioritizedReplayBuffer
    :members:
    :undoc-members:
    :show-inheritance:

ReplayBufferManager
~~~~~~~~~~~~~~~~~~~

.. autoclass:: tianshou.data.ReplayBufferManager
    :members:
    :undoc-members:
    :show-inheritance:

PrioritizedReplayBufferManager
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: tianshou.data.PrioritizedReplayBufferManager
    :members:
    :undoc-members:
    :show-inheritance:

VectorReplayBuffer
~~~~~~~~~~~~~~~~~~

.. autoclass:: tianshou.data.VectorReplayBuffer
    :members:
    :undoc-members:
    :show-inheritance:

PrioritizedVectorReplayBuffer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: tianshou.data.PrioritizedVectorReplayBuffer
    :members:
    :undoc-members:
    :show-inheritance:

CachedReplayBuffer
~~~~~~~~~~~~~~~~~~

.. autoclass:: tianshou.data.CachedReplayBuffer
    :members:
    :undoc-members:
    :show-inheritance:

Collector
---------

Collector
~~~~~~~~~

.. autoclass:: tianshou.data.Collector
    :members:
    :undoc-members:
    :show-inheritance:

AsyncCollector
~~~~~~~~~~~~~~

.. autoclass:: tianshou.data.AsyncCollector
    :members:
    :undoc-members:
    :show-inheritance:
74 changes: 72 additions & 2 deletions docs/api/tianshou.env.rst
@@ -1,12 +1,82 @@
tianshou.env
============

.. automodule:: tianshou.env

VectorEnv
---------

BaseVectorEnv
~~~~~~~~~~~~~

.. autoclass:: tianshou.env.BaseVectorEnv
    :members:
    :undoc-members:
    :show-inheritance:

-.. automodule:: tianshou.env.worker
DummyVectorEnv
~~~~~~~~~~~~~~

.. autoclass:: tianshou.env.DummyVectorEnv
    :members:
    :undoc-members:
    :show-inheritance:

SubprocVectorEnv
~~~~~~~~~~~~~~~~

.. autoclass:: tianshou.env.SubprocVectorEnv
    :members:
    :undoc-members:
    :show-inheritance:

ShmemVectorEnv
~~~~~~~~~~~~~~

.. autoclass:: tianshou.env.ShmemVectorEnv
    :members:
    :undoc-members:
    :show-inheritance:

RayVectorEnv
~~~~~~~~~~~~

.. autoclass:: tianshou.env.RayVectorEnv
    :members:
    :undoc-members:
    :show-inheritance:


Worker
------

EnvWorker
~~~~~~~~~

.. autoclass:: tianshou.env.worker.EnvWorker
    :members:
    :undoc-members:
    :show-inheritance:

DummyEnvWorker
~~~~~~~~~~~~~~

.. autoclass:: tianshou.env.worker.DummyEnvWorker
    :members:
    :undoc-members:
    :show-inheritance:

SubprocEnvWorker
~~~~~~~~~~~~~~~~

.. autoclass:: tianshou.env.worker.SubprocEnvWorker
    :members:
    :undoc-members:
    :show-inheritance:

RayEnvWorker
~~~~~~~~~~~~

.. autoclass:: tianshou.env.worker.RayEnvWorker
    :members:
    :undoc-members:
    :show-inheritance:
101 changes: 100 additions & 1 deletion docs/api/tianshou.policy.rst
@@ -1,7 +1,106 @@
tianshou.policy
===============

-.. automodule:: tianshou.policy
Base
----

.. autoclass:: tianshou.policy.BasePolicy
    :members:
    :undoc-members:
    :show-inheritance:

.. autoclass:: tianshou.policy.RandomPolicy
    :members:
    :undoc-members:
    :show-inheritance:

Model-free
----------

DQN Family
~~~~~~~~~~

.. autoclass:: tianshou.policy.DQNPolicy
    :members:
    :undoc-members:
    :show-inheritance:

.. autoclass:: tianshou.policy.C51Policy
    :members:
    :undoc-members:
    :show-inheritance:

.. autoclass:: tianshou.policy.QRDQNPolicy
    :members:
    :undoc-members:
    :show-inheritance:

On-policy
~~~~~~~~~

.. autoclass:: tianshou.policy.PGPolicy
    :members:
    :undoc-members:
    :show-inheritance:

.. autoclass:: tianshou.policy.A2CPolicy
    :members:
    :undoc-members:
    :show-inheritance:

.. autoclass:: tianshou.policy.PPOPolicy
    :members:
    :undoc-members:
    :show-inheritance:

Off-policy
~~~~~~~~~~

.. autoclass:: tianshou.policy.DDPGPolicy
    :members:
    :undoc-members:
    :show-inheritance:

.. autoclass:: tianshou.policy.TD3Policy
    :members:
    :undoc-members:
    :show-inheritance:

.. autoclass:: tianshou.policy.SACPolicy
    :members:
    :undoc-members:
    :show-inheritance:

.. autoclass:: tianshou.policy.DiscreteSACPolicy
    :members:
    :undoc-members:
    :show-inheritance:

Imitation
---------

.. autoclass:: tianshou.policy.ImitationPolicy
    :members:
    :undoc-members:
    :show-inheritance:

.. autoclass:: tianshou.policy.DiscreteBCQPolicy
    :members:
    :undoc-members:
    :show-inheritance:

Model-based
-----------

.. autoclass:: tianshou.policy.PSRLPolicy
    :members:
    :undoc-members:
    :show-inheritance:

Multi-agent
-----------

.. autoclass:: tianshou.policy.MultiAgentPolicyManager
    :members:
    :undoc-members:
    :show-inheritance:
13 changes: 13 additions & 0 deletions docs/api/tianshou.utils.rst
@@ -6,16 +6,29 @@ tianshou.utils
    :undoc-members:
    :show-inheritance:


Pre-defined Networks
--------------------

Common
~~~~~~

.. automodule:: tianshou.utils.net.common
    :members:
    :undoc-members:
    :show-inheritance:

Discrete
~~~~~~~~

.. automodule:: tianshou.utils.net.discrete
    :members:
    :undoc-members:
    :show-inheritance:

Continuous
~~~~~~~~~~

.. automodule:: tianshou.utils.net.continuous
    :members:
    :undoc-members:
1 change: 1 addition & 0 deletions docs/conf.py
@@ -70,6 +70,7 @@
]
)
}
autodoc_member_order = "bysource"
bibtex_bibfiles = ['refs.bib']

# -- Options for HTML output -------------------------------------------------
1 change: 1 addition & 0 deletions docs/contributor.rst
@@ -7,3 +7,4 @@ We always welcome contributions to help make Tianshou better. Below are an incom
* Minghao Zhang (`Mehooz <https://github.com/Mehooz>`_)
* Alexis Duburcq (`duburcqa <https://github.com/duburcqa>`_)
* Kaichao You (`youkaichao <https://github.com/youkaichao>`_)
* Huayu Chen (`ChenDRAG <https://github.com/ChenDRAG>`_)