
Releases: thu-ml/tianshou

0.3.0.post1

08 Oct 15:24

Several bug fixes (trainer, tests, and docs)

0.3.0

26 Sep 08:39
710966e

Since the code has changed substantially since v0.2.0, we are releasing it as version 0.3.0.

API Change

  1. add policy.updating and clarify the distinction between the collecting state and the updating state during training (#224)
  2. change train_fn(epoch) to train_fn(epoch, env_step) and test_fn(epoch) to test_fn(epoch, env_step) (#229); see the sketch after this list
  3. remove out-of-date APIs: collector.sample, collector.render, collector.seed, and VectorEnv (#210)
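
A minimal sketch of the new callback signatures, assuming policy is an already constructed DQN-style policy that supports set_eps; the epsilon schedule below is purely illustrative:

    def train_fn(epoch: int, env_step: int) -> None:
        # anneal exploration based on the global environment step
        policy.set_eps(max(0.1, 1.0 - env_step / 10000))

    def test_fn(epoch: int, env_step: int) -> None:
        # evaluate with a small, fixed exploration rate
        policy.set_eps(0.05)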

Bug Fix

  1. fix a bug in DDQN: target_q could not be sampled from np.random.rand (#224)
  2. fix a bug in the DQN Atari net: a ReLU should be added before the last layer (#224)
  3. fix a bug in collector timing (#224)
  4. fix a bug in the converter of Batch: deepcopy a Batch in to_numpy and to_torch (#213)
  5. ensure buffer.rew has float type (#229)

Enhancement

  1. Anaconda support: conda install -c conda-forge tianshou (#228)
  2. add PSRL (#202)
  3. add SAC discrete (#216)
  4. add type check in unit test (#200)
  5. format code and update function signatures (#213)
  6. add pydocstyle and doc8 checks (#210)
  7. several documentation fixes (#210)

0.3.0rc0

23 Sep 13:07
dcfcbb3

This is a pre-release for testing the Anaconda package.

0.2.7

08 Sep 13:38
64af7ea

API Change

  1. support exact n_episode when n_episode is given as a list of per-env limits, and save fake data in cache_buffer when self.buffer is None (#184)
  2. add save_only_last_obs for the replay buffer in order to save memory (#184)
  3. remove the default value in batch.split() and add a merge_last argument (#185)
  4. fix TensorBoard logging: the x-axis now stands for env step instead of gradient step; add test results to TensorBoard (#189)
  5. add max_batchsize in on-policy algorithms (#189)
  6. keep only the sum tree in the segment tree implementation (#193)
  7. add __contains__ and pop in Batch: key in batch, batch.pop(key, deft) (#189); see the sketch after this list
  8. remove dict return support for collector preprocess_fn (#189)
  9. remove **kwargs in ReplayBuffer (#189)
  10. add no_grad argument in collector.collect (#204)
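
A minimal sketch of the new Batch interface from this list; the array shapes and the default passed to pop are illustrative only:

    import numpy as np
    from tianshou.data import Batch

    b = Batch(obs=np.zeros((4, 3)), rew=np.ones(4))
    assert "obs" in b                    # __contains__: key in batch
    rew = b.pop("rew", None)             # dict-style pop with a default
    for minibatch in b.split(2, merge_last=True):  # size must now be given explicitly
        print(minibatch.obs.shape)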

Enhancement

  1. add DQN Atari examples (#187)
  2. change the type-checking order in batch.py and converter.py so that the most common case is checked first (#189)
  3. Numba acceleration for GAE, nstep, and segment tree (#193)
  4. add policy.eval() to the "watch performance" part of all test scripts (#189)
  5. add test_returns (both GAE and nstep) (#189)
  6. improve code coverage (from 90% to 95%) and remove dead code (#189)
  7. polish examples/box2d/bipedal_hardcore_sac.py (#207)

Bug Fix

  1. fix a bug in MAPolicy: buffer.rew = Batch() did not change buffer.rew (thanks, mypy) (#207)
  2. set policy.eval() before collector.collect (#204); not doing so was a bug
  3. fix shape inconsistency for torch.Tensor in replay buffer (#189)
  4. potential bugfix for subproc.wait (#189)
  5. fix RecurrentActorProb (#189)
  6. fix some incorrect type annotation (#189)
  7. fix a bug in tictactoe set_eps (#193)
  8. dirty fix for asyncVenv check_id test

0.2.6

19 Aug 07:21
a9f9940

API Change

  1. Replay buffer allows stack_num = 1 (#165)
  2. add policy.update to enable post-processing and remove collector.sample (#180); see the sketch after this list
  3. Remove collector.close and rename VectorEnv to DummyVectorEnv (#179)
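
A minimal sketch of the replacement for collector.sample; it assumes policy, train_collector, and batch_size were already set up as in the usual training scripts:

    # before: batch = collector.sample(batch_size); policy.learn(batch)
    # now the policy samples, learns, and post-processes in one call
    losses = policy.update(batch_size, train_collector.buffer)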

Enhancement

  1. Enable async simulation for all vector envs (#179)
  2. Improve PER (#159): use segment tree and enable all Q-learning algorithms to use PER
  3. unify single-env and multi-env in collector (#157)
  4. Make the replay buffer pickle-compatible and improve buffer.get (#182): fixes #84 and makes the buffer more efficient
  5. Add ShmemVectorEnv implementation (#174); see the sketch after this list
  6. Add Dueling DQN implementation (#170)
  7. Add profile workflow (#143)
  8. Add BipedalWalkerHardcore-v3 SAC example (#177) (it trains well in about one hour)
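
A minimal sketch of the renamed and new vector environments; CartPole-v0 and the worker counts are just placeholders:

    import gym
    from tianshou.env import DummyVectorEnv, ShmemVectorEnv

    # VectorEnv was renamed to DummyVectorEnv in this release
    train_envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(8)])
    # ShmemVectorEnv runs workers in subprocesses with shared-memory observations
    test_envs = ShmemVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(4)])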

Bug Fix

  1. fix #162 of multi-dim action (#160)

Note: 0.3 is coming soon!

0.2.5

22 Jul 06:59
bd9c3c7

New Feature

Multi-agent Reinforcement Learning: https://tianshou.readthedocs.io/en/latest/tutorials/tictactoe.html (#122)

Documentation

Add a tutorial for the Batch class to standardize its behavior: https://tianshou.readthedocs.io/en/latest/tutorials/batch.html (#142)
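
A tiny illustration of the standardized behavior (nesting, attribute access, and indexing), with made-up data:

    import numpy as np
    from tianshou.data import Batch

    data = Batch(obs=np.arange(6).reshape(3, 2), info=Batch(done=np.zeros(3)))
    print(data.obs.shape)   # attribute access: (3, 2)
    print(data[0])          # indexing slices every (nested) key at once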

Bug Fix

  • Fix inconsistent shape in A2CPolicy and PPOPolicy. Please be careful when dealing with log_prob (#155)
  • Fix list of tensors inside Batch, e.g., Batch(a=[np.zeros(3), torch.zeros(3)]) (#147)
  • Fix buffer update when stack_num > 0 (#154)
  • Remove useless kwargs

0.2.4.post1

14 Jul 00:00

Several bug fixes and enhancements:

  • remove the deprecated append API (#126)
  • Batch.cat_ and Batch.stack_ now work well with inconsistent keys (#130)
  • Batch.is_empty now correctly recognizes a Batch that contains only empty Batches (#128)
  • reconstruct the collector: remove the multiple-buffer case, change the internal data to Batch, and add reward_metric for MARL usage (#125)
  • add Batch.update to mimic dict.update (#128)

0.2.4

10 Jul 09:50
47e8e26

Algorithm Implementation

  1. n-step returns for all Q-learning-based algorithms (#51)
  2. Auto alpha tuning in SAC (#80)
  3. Reserve policy._state to support saving hidden states in the replay buffer (#19)
  4. Add sample_avail argument in ReplayBuffer to sample only available indices in RNN training mode (#19)

New Feature

  1. Batch.cat (#87), Batch.stack (#93), Batch.empty (#106, #110); see the sketch after this list
  2. Advanced slicing method of Batch (#106)
  3. Batch(kwargs, copy=True) will perform a deep copy (#110)
  4. Add random=True argument in collector.collect to perform sampling with a random policy (#78)
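
A minimal sketch of these Batch features; the array shapes are illustrative:

    import numpy as np
    from tianshou.data import Batch

    a = Batch(obs=np.zeros((2, 4)))
    b = Batch(obs=np.ones((3, 4)))
    c = Batch.cat([a, b])       # concatenate along the first axis (5 rows)
    s = Batch.stack([a, a])     # stack into a new leading axis
    part = c[1:4]               # advanced slicing returns a sub-Batch
    d = Batch(c, copy=True)     # deep copy instead of sharing the arrays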

API Change

  1. Batch.append -> Batch.cat
  2. Move the Atari wrapper to examples, since it is not a key feature of tianshou (#124)
  3. Add some pre-defined networks in tianshou.utils.net. Since these only define an API rather than a core class, they are not placed under tianshou.net. (#123)

Docs

Add cheatsheet: https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html

0.2.3

01 Jun 01:50

Enhancement

  1. Multimodal observations (any observation type is supported) (#38, #69)
  2. Batch over Batch (nested Batch)
  3. preprocess_fn (#42)
  4. Type annotations
  5. batch.to_torch, batch.to_numpy
  6. pickle support for Batch; see the sketch after this list
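
A minimal sketch of items 5 and 6; the array contents are illustrative:

    import pickle
    import numpy as np
    from tianshou.data import Batch

    b = Batch(obs=np.zeros((2, 3)), act=np.array([0, 1]))
    b.to_torch()                              # convert stored arrays to torch tensors in place
    b.to_numpy()                              # and back to numpy arrays
    restored = pickle.loads(pickle.dumps(b))  # Batch is now picklable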

Fixed Bugs

  1. SAC/PPO diagonal Gaussian
  2. PPO orthogonal initialization
  3. DQN zero eps
  4. Fix type inference in the replay buffer

0.2.2

26 Apr 07:25

Algorithm Implementation

  1. Generalized Advantage Estimation (GAE)
  2. Update the PPO algorithm following arXiv:1811.02553 and arXiv:1912.09729
  3. Vanilla Imitation Learning (BC & DA, with continuous/discrete action spaces)
  4. Prioritized DQN
  5. RNN-style policy network
  6. Fix SAC with torch==1.5.0

API Change

  1. change __call__ to forward in policy
  2. Add save_fn in trainer
  3. Add __repr__ in tianshou.data, e.g. print(buffer); see the sketch below
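
A minimal sketch of save_fn and the new __repr__, assuming the trainer passes the current policy to save_fn; the file name and buffer size are placeholders:

    import torch
    from tianshou.data import ReplayBuffer

    def save_fn(policy):
        # passed to the trainer; called whenever a new best policy is found
        torch.save(policy.state_dict(), "best_policy.pth")

    buf = ReplayBuffer(size=100)
    print(buf)  # __repr__ in tianshou.data now gives a readable summary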