
Releases: thu-ml/tianshou

0.3.1

20 Jan 10:24
a511cb4


API Change

  1. change utils.network args to support any form of MLP by default (#275): remove layer_num and hidden_layer_size, add hidden_sizes (a list of ints specifying the network architecture; see the sketch after this list)
  2. add HDF5 save/load methods for ReplayBuffer (#261; also sketched below)
  3. add offline_trainer (#263)
  4. move Atari-related network to examples/atari/atari_network.py (#275)
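
For illustration, a minimal sketch of items 1 and 2 in Python. The hidden_sizes argument comes from #275; the save_hdf5/load_hdf5 method names follow the documentation for #261, and the shapes and file path are illustrative assumptions:

```python
import numpy as np
from tianshou.data import ReplayBuffer
from tianshou.utils.net.common import Net

# hidden_sizes replaces layer_num/hidden_layer_size: one int per MLP layer (#275).
net = Net(state_shape=4, action_shape=2, hidden_sizes=[128, 128])

# HDF5 round trip for a replay buffer (#261).
buf = ReplayBuffer(size=10)
buf.add(obs=np.zeros(4), act=0, rew=1.0, done=False, obs_next=np.zeros(4))
buf.save_hdf5("buffer.hdf5")
restored = ReplayBuffer.load_hdf5("buffer.hdf5")
assert len(restored) == len(buf)
```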

Bug Fix

  1. fix a potential bug in discrete behavior cloning policy (#263)

Enhancement

  1. update SAC MuJoCo results (#246)
  2. add C51 algorithm with benchmark result (#266)
  3. enable type checking in utils.network (#275)

0.3.0.post1

08 Oct 15:24


Several bug fixes (trainer, tests, and docs)

0.3.0

26 Sep 08:39
710966e


Since the code has changed substantially from v0.2.0 at this point, we are releasing version 0.3 from now on.

API Change

  1. add policy.updating and clarify collecting state and updating state in training (#224)
  2. change train_fn(epoch) to train_fn(epoch, env_step) and test_fn(epoch) to test_fn(epoch, env_step) (#229; see the sketch after this list)
  3. remove out-of-the-date API: collector.sample, collector.render, collector.seed, VectorEnv (#210)
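
A hedged sketch of the new two-argument callbacks from #229. The epsilon schedule and the policy object (assumed to be a DQNPolicy with set_eps, created elsewhere) are illustrative, not part of the release:

```python
# train_fn/test_fn now receive both the epoch and the env step count (#229).
# `policy` is assumed to be a DQNPolicy created elsewhere.
def train_fn(epoch: int, env_step: int) -> None:
    # e.g. anneal exploration over environment steps
    policy.set_eps(max(0.1, 1.0 - env_step / 100000))

def test_fn(epoch: int, env_step: int) -> None:
    policy.set_eps(0.05)

# Passed to the trainer as before, e.g.:
# offpolicy_trainer(..., train_fn=train_fn, test_fn=test_fn)
```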

Bug Fix

  1. fix a bug in DDQN: target_q could not be sampled from np.random.rand (#224)
  2. fix a bug in DQN atari net: it should add a ReLU before the last layer (#224)
  3. fix a bug in collector timing (#224)
  4. fix a bug in the converter of Batch: deepcopy a Batch in to_numpy and to_torch (#213)
  5. ensure buffer.rew is of float type (#229)

Enhancement

  1. Anaconda support: conda install -c conda-forge tianshou (#228)
  2. add PSRL (#202)
  3. add SAC discrete (#216)
  4. add type check in unit test (#200)
  5. format code and update function signatures (#213)
  6. add pydocstyle and doc8 check (#210)
  7. several documentation fixes (#210)

0.3.0rc0

23 Sep 13:07
dcfcbb3


Pre-release

This is a pre-release for testing the Anaconda package.

0.2.7

08 Sep 13:38
64af7ea


API Change

  1. collect exactly n_episode episodes when a list of per-env n_episode limits is given, and save fake data in cache_buffer when self.buffer is None (#184)
  2. add save_only_last_obs for the replay buffer to save memory (#184)
  3. remove the default value of batch.split() and add a merge_last argument (#185; see the sketch after this list)
  4. fix tensorboard logging: the x-axis now represents env steps instead of gradient steps; add test results to tensorboard (#189)
  5. add max_batchsize in on-policy algorithms (#189)
  6. keep only the sum tree in the segment tree implementation (#193)
  7. add __contains__ and pop to Batch: key in batch, batch.pop(key, deft) (#189; also sketched below)
  8. remove dict return support for collector preprocess_fn (#189)
  9. remove **kwargs in ReplayBuffer (#189)
  10. add no_grad argument in collector.collect (#204)
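
A short sketch of items 3 and 7 above, assuming NumPy arrays as values; the sizes are illustrative:

```python
import numpy as np
from tianshou.data import Batch

b = Batch(a=np.arange(10), b=np.zeros(10))
assert "a" in b       # __contains__ (#189)
a = b.pop("a", None)  # dict-style pop with a default (#189)

# split() no longer has a default size; merge_last folds the remainder
# into the last minibatch (#185).
sizes = [len(mb.b) for mb in b.split(size=4, shuffle=False, merge_last=True)]
assert sizes == [4, 6]
```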

Enhancement

  1. add DQN Atari examples (#187)
  2. change the type-checking order in batch.py and converter.py so that the most common case is checked first (#189)
  3. Numba acceleration for GAE, nstep, and segment tree (#193)
  4. add policy.eval() in the "watch performance" section of all test scripts (#189)
  5. add test_returns (both GAE and nstep) (#189)
  6. improve code coverage (from 90% to 95%) and remove dead code (#189)
  7. polish examples/box2d/bipedal_hardcore_sac.py (#207)

Bug Fix

  1. fix a bug in MAPolicy: buffer.rew = Batch() doesn't change buffer.rew (thanks mypy) (#207)
  2. set policy.eval() before collector.collect (#204); the missing call was a bug
  3. fix shape inconsistency for torch.Tensor in replay buffer (#189)
  4. potential bugfix for subproc.wait (#189)
  5. fix RecurrentActorProb (#189)
  6. fix some incorrect type annotation (#189)
  7. fix a bug in tictactoe set_eps (#193)
  8. dirty fix for asyncVenv check_id test

0.2.6

19 Aug 07:21
a9f9940


API Change

  1. Replay buffer allows stack_num = 1 (#165)
  2. add policy.update to enable post-processing and remove collector.sample (#180)
  3. Remove collector.close and rename VectorEnv to DummyVectorEnv (#179); see the sketch after this list
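
A hedged sketch of both changes. The DummyVectorEnv usage follows #179; the policy.update call is commented out because its exact signature (sample size, then buffer) is an assumption, and the policy/buffer objects are assumed to exist:

```python
import gym
from tianshou.env import DummyVectorEnv

# VectorEnv is now DummyVectorEnv (#179).
envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(4)])

# policy.update replaces collector.sample plus a manual learn step (#180),
# e.g. (assumed signature; 0 is assumed to mean "use the whole buffer"):
# losses = policy.update(64, buffer)
```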

Enhancement

  1. Enable async simulation for all vector envs (#179)
  2. Improve PER (#159): use segment tree and enable all Q-learning algorithms to use PER
  3. unify single-env and multi-env in collector (#157)
  4. Pickle compatibility for the replay buffer and improved buffer.get (#182): fixes #84 and makes the buffer more efficient
  5. Add ShmemVectorEnv implementation (#174)
  6. Add Dueling DQN implementation (#170)
  7. Add profile workflow (#143)
  8. Add BipedalWalkerHardcore-v3 SAC example (#177): it becomes well-trained in about 1 hour

Bug Fix

  1. fix multi-dim action support (#160, fixes #162)

Note: 0.3 is coming soon!

0.2.5

22 Jul 06:59
bd9c3c7


New Feature

Multi-agent Reinforcement Learning: https://tianshou.readthedocs.io/en/latest/tutorials/tictactoe.html (#122)

Documentation

Add a tutorial on the Batch class to standardize its behavior: https://tianshou.readthedocs.io/en/latest/tutorials/batch.html (#142)

Bug Fix

  • Fix inconsistent shape in A2CPolicy and PPOPolicy. Please be careful when dealing with log_prob (#155)
  • Fix list of tensors inside Batch, e.g., Batch(a=[np.zeros(3), torch.zeros(3)]) (#147); see the sketch after this list
  • Fix buffer update when stack_num > 0 (#154)
  • Remove useless kwargs
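
The second fix can be exercised directly; a minimal sketch using the exact expression from the note above:

```python
import numpy as np
import torch
from tianshou.data import Batch

# A list mixing NumPy arrays and torch tensors is now handled (#147).
b = Batch(a=[np.zeros(3), torch.zeros(3)])
print(b.a)
```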

0.2.4.post1

14 Jul 00:00


Several bug fixes and enhancements:

  • remove deprecated API append (#126)
  • Batch.cat_ and Batch.stack_ now work well with inconsistent keys (#130); see the sketch after this list
  • Batch.is_empty now correctly recognizes an empty Batch nested inside another empty Batch (#128)
  • reconstruct collector: remove the multiple-buffer case, change the internal data to Batch, and add reward_metric for MARL usage (#125)
  • add Batch.update to mimic dict.update (#128)
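
A brief sketch of the Batch behaviors above. The zero-padding of missing keys during concatenation is an assumption based on #130; shapes are illustrative:

```python
import numpy as np
from tianshou.data import Batch

# Batch.update mimics dict.update (#128).
b = Batch(a=np.ones(2))
b.update(c=np.zeros(2))
assert "c" in b.keys()

# Concatenation now tolerates inconsistent keys (#130); keys missing from
# one batch are assumed to be zero-padded rather than raising an error.
x = Batch(a=np.ones((2, 3)))
y = Batch(a=np.ones((2, 3)), b=np.zeros((2, 4)))
z = Batch.cat([x, y])
print(z.b.shape)  # assumed (4, 4), with zero padding for the rows from x
```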

0.2.4

10 Jul 09:50
47e8e26


Algorithm Implementation

  1. n_step returns for all Q-learning based algorithms (#51)
  2. Auto alpha tuning in SAC (#80)
  3. Reserve policy._state to support saving hidden states in replay buffer (#19)
  4. Add a sample_avail argument in ReplayBuffer to sample only available indices in RNN training mode (#19); see the sketch after this list
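
A minimal sketch of item 4, assuming four-frame stacking; the constructor arguments follow the release note:

```python
from tianshou.data import ReplayBuffer

# sample_avail restricts sampling to indices that have a full
# stack_num-frame history, for RNN training (#19).
buf = ReplayBuffer(size=100, stack_num=4, sample_avail=True)
```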

New Feature

  1. Batch.cat (#87), Batch.stack (#93), Batch.empty (#106, #110); see the sketch after this list
  2. Advanced slicing method of Batch (#106)
  3. Batch(kwargs, copy=True) will perform a deep copy (#110)
  4. Add random=True argument in collector.collect to perform sampling with random policy (#78)
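
A sketch of the new Batch features with illustrative shapes; only the methods named above are assumed:

```python
import numpy as np
from tianshou.data import Batch

x = Batch(a=np.zeros((3, 4)))
y = Batch(a=np.ones((3, 4)))

c = Batch.cat([x, y])        # concatenate along the first axis (#87)
s = Batch.stack([x, y])      # stack along a new axis (#93)
print(c.a.shape, s.a.shape)  # (6, 4) (2, 3, 4)

sliced = c[2:5]              # advanced slicing returns a sub-Batch (#106)

src = {"a": np.zeros(3)}
deep = Batch(src, copy=True)  # deep copy of the input (#110)
deep.a[0] = 1.0
assert src["a"][0] == 0.0     # the source dict is untouched
```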

API Change

  1. Batch.append -> Batch.cat
  2. Move the Atari wrapper to examples, since it is not a key feature of tianshou (#124)
  3. Add some pre-defined networks in tianshou.utils.net. Since these only define an API rather than a class, they are not presented as tianshou.net. (#123)

Docs

Add cheatsheet: https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html

0.2.3

01 Jun 01:50


Enhancement

  1. Multimodal obs (observations of any type are also supported) (#38, #69)
  2. Batch over Batch (nested Batch)
  3. preprocess_fn (#42)
  4. Type annotation
  5. batch.to_torch, batch.to_numpy
  6. pickle support for Batch (see the sketch after this list)
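
A short sketch of items 2, 5, and 6; the field names are illustrative:

```python
import pickle
import numpy as np
from tianshou.data import Batch

b = Batch(obs=np.zeros((2, 3)), info=Batch(t=np.arange(2)))  # Batch over Batch
b.to_torch()                        # convert all arrays to torch tensors in place
b.to_numpy()                        # and back to NumPy
b2 = pickle.loads(pickle.dumps(b))  # pickle round trip
assert (b2.obs == b.obs).all()
```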

Fixed Bugs

  1. SAC/PPO diagonal Gaussian
  2. PPO orthogonal init
  3. DQN zero eps
  4. Fix type inference in the replay buffer