
Releases: thu-ml/tianshou

0.3.0.post1

08 Oct 15:24

Several bug fixes (trainer, tests, and docs)

0.3.0

26 Sep 08:39
710966e

Since the code has changed substantially since v0.2.0, we are releasing it as version 0.3.0.

API Change

  1. add policy.updating and clarify the distinction between the collecting state and the updating state during training (#224)
  2. change train_fn(epoch) to train_fn(epoch, env_step) and test_fn(epoch) to test_fn(epoch, env_step) (#229); see the sketch after this list
  3. remove out-of-date APIs: collector.sample, collector.render, collector.seed, and VectorEnv (#210)
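
A minimal sketch of the new callback signatures, assuming policy is an already constructed DQN-style policy that supports set_eps; the epsilon schedule below is purely illustrative:

    def train_fn(epoch: int, env_step: int) -> None:
        # anneal exploration based on the global environment step
        policy.set_eps(max(0.1, 1.0 - env_step / 10000))

    def test_fn(epoch: int, env_step: int) -> None:
        # evaluate with a small, fixed exploration rate
        policy.set_eps(0.05)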

Bug Fix

  1. fix a bug in DDQN: target_q could not be sampled from np.random.rand (#224)
  2. fix a bug in the DQN Atari net: a ReLU should be added before the last layer (#224)
  3. fix a bug in collector timing (#224)
  4. fix a bug in the converter of Batch: deepcopy a Batch in to_numpy and to_torch (#213)
  5. ensure buffer.rew has float type (#229)

Enhancement

  1. Anaconda support: conda install -c conda-forge tianshou (#228)
  2. add PSRL (#202)
  3. add SAC discrete (#216)
  4. add type check in unit test (#200)
  5. format code and update function signatures (#213)
  6. add pydocstyle and doc8 checks (#210)
  7. several documentation fixes (#210)

0.3.0rc0

23 Sep 13:07
dcfcbb3

This is a pre-release for testing the Anaconda package.

0.2.7

08 Sep 13:38
64af7ea

API Change

  1. support exact n_episode when n_episode is given as a list of per-env limits, and save fake data in cache_buffer when self.buffer is None (#184)
  2. add save_only_last_obs for the replay buffer in order to save memory (#184)
  3. remove the default value in batch.split() and add a merge_last argument (#185)
  4. fix TensorBoard logging: the x-axis now stands for env step instead of gradient step; add test results to TensorBoard (#189)
  5. add max_batchsize in on-policy algorithms (#189)
  6. keep only the sum tree in the segment tree implementation (#193)
  7. add __contains__ and pop in Batch: key in batch, batch.pop(key, deft) (#189); see the sketch after this list
  8. remove dict return support for collector preprocess_fn (#189)
  9. remove **kwargs in ReplayBuffer (#189)
  10. add no_grad argument in collector.collect (#204)
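
A minimal sketch of the new Batch interface from this list; the array shapes and the default passed to pop are illustrative only:

    import numpy as np
    from tianshou.data import Batch

    b = Batch(obs=np.zeros((4, 3)), rew=np.ones(4))
    assert "obs" in b                    # __contains__: key in batch
    rew = b.pop("rew", None)             # dict-style pop with a default
    for minibatch in b.split(2, merge_last=True):  # size must now be given explicitly
        print(minibatch.obs.shape)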

Enhancement

  1. add DQN Atari examples (#187)
  2. change the type-checking order in batch.py and converter.py so that the most common case is checked first (#189)
  3. Numba acceleration for GAE, nstep, and segment tree (#193)
  4. add policy.eval() to the "watch performance" part of all test scripts (#189)
  5. add test_returns (both GAE and nstep) (#189)
  6. improve code coverage (from 90% to 95%) and remove dead code (#189)
  7. polish examples/box2d/bipedal_hardcore_sac.py (#207)

Bug Fix

  1. fix a bug in MAPolicy: buffer.rew = Batch() did not change buffer.rew (thanks, mypy) (#207)
  2. set policy.eval() before collector.collect (#204); not doing so was a bug
  3. fix shape inconsistency for torch.Tensor in replay buffer (#189)
  4. potential bugfix for subproc.wait (#189)
  5. fix RecurrentActorProb (#189)
  6. fix some incorrect type annotation (#189)
  7. fix a bug in tictactoe set_eps (#193)
  8. dirty fix for asyncVenv check_id test

0.2.6

19 Aug 07:21
a9f9940

API Change

  1. Replay buffer allows stack_num = 1 (#165)
  2. add policy.update to enable post-processing and remove collector.sample (#180); see the sketch after this list
  3. Remove collector.close and rename VectorEnv to DummyVectorEnv (#179)
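
A minimal sketch of the replacement for collector.sample; it assumes policy, train_collector, and batch_size were already set up as in the usual training scripts:

    # before: batch = collector.sample(batch_size); policy.learn(batch)
    # now the policy samples, learns, and post-processes in one call
    losses = policy.update(batch_size, train_collector.buffer)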

Enhancement

  1. Enable async simulation for all vector envs (#179)
  2. Improve PER (#159): use segment tree and enable all Q-learning algorithms to use PER
  3. unify single-env and multi-env in collector (#157)
  4. Make the replay buffer pickle-compatible and improve buffer.get (#182): fixes #84 and makes the buffer more efficient
  5. Add ShmemVectorEnv implementation (#174); see the sketch after this list
  6. Add Dueling DQN implementation (#170)
  7. Add profile workflow (#143)
  8. Add BipedalWalkerHardcore-v3 SAC example (#177) (it trains well in about one hour)
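
A minimal sketch of the renamed and new vector environments; CartPole-v0 and the worker counts are just placeholders:

    import gym
    from tianshou.env import DummyVectorEnv, ShmemVectorEnv

    # VectorEnv was renamed to DummyVectorEnv in this release
    train_envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(8)])
    # ShmemVectorEnv runs workers in subprocesses with shared-memory observations
    test_envs = ShmemVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(4)])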

Bug Fix

  1. fix #162 of multi-dim action (#160)

Note: 0.3 is coming soon!

0.2.5

22 Jul 06:59
bd9c3c7

New Feature

Multi-agent Reinforcement Learning: https://tianshou.readthedocs.io/en/latest/tutorials/tictactoe.html (#122)

Documentation

Add a tutorial for the Batch class to standardize its behavior: https://tianshou.readthedocs.io/en/latest/tutorials/batch.html (#142)
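
A tiny illustration of the standardized behavior (nesting, attribute access, and indexing), with made-up data:

    import numpy as np
    from tianshou.data import Batch

    data = Batch(obs=np.arange(6).reshape(3, 2), info=Batch(done=np.zeros(3)))
    print(data.obs.shape)   # attribute access: (3, 2)
    print(data[0])          # indexing slices every (nested) key at once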

Bug Fix

  • Fix inconsistent shape in A2CPolicy and PPOPolicy. Please be careful when dealing with log_prob (#155)
  • Fix list of tensors inside Batch, e.g., Batch(a=[np.zeros(3), torch.zeros(3)]) (#147)
  • Fix buffer update when stack_num > 0 (#154)
  • Remove useless kwargs

0.2.4.post1

14 Jul 00:00

Several bug fixes and enhancements:

  • remove the deprecated append API (#126)
  • Batch.cat_ and Batch.stack_ now work well with inconsistent keys (#130)
  • Batch.is_empty now correctly recognizes a Batch that contains only empty Batches (#128)
  • reconstruct the collector: remove the multiple-buffer case, change the internal data to Batch, and add reward_metric for MARL usage (#125)
  • add Batch.update to mimic dict.update (#128)

0.2.4

10 Jul 09:50
47e8e26

Algorithm Implementation

  1. n-step returns for all Q-learning-based algorithms (#51)
  2. Auto alpha tuning in SAC (#80)
  3. Reserve policy._state to support saving hidden states in the replay buffer (#19)
  4. Add sample_avail argument in ReplayBuffer to sample only available indices in RNN training mode (#19)

New Feature

  1. Batch.cat (#87), Batch.stack (#93), Batch.empty (#106, #110); see the sketch after this list
  2. Advanced slicing method of Batch (#106)
  3. Batch(kwargs, copy=True) will perform a deep copy (#110)
  4. Add random=True argument in collector.collect to perform sampling with a random policy (#78)
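
A minimal sketch of these Batch features; the array shapes are illustrative:

    import numpy as np
    from tianshou.data import Batch

    a = Batch(obs=np.zeros((2, 4)))
    b = Batch(obs=np.ones((3, 4)))
    c = Batch.cat([a, b])       # concatenate along the first axis (5 rows)
    s = Batch.stack([a, a])     # stack into a new leading axis
    part = c[1:4]               # advanced slicing returns a sub-Batch
    d = Batch(c, copy=True)     # deep copy instead of sharing the arrays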

API Change

  1. Batch.append -> Batch.cat
  2. Move the Atari wrapper to examples, since it is not a key feature of tianshou (#124)
  3. Add some pre-defined networks in tianshou.utils.net. Since these only define an API rather than a core class, they are not placed under tianshou.net. (#123)

Docs

Add cheatsheet: https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html

0.2.3

01 Jun 01:50

Enhancement

  1. Multimodal observations (any observation type is supported) (#38, #69)
  2. Batch over Batch (nested Batch)
  3. preprocess_fn (#42)
  4. Type annotations
  5. batch.to_torch, batch.to_numpy
  6. pickle support for Batch; see the sketch after this list
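
A minimal sketch of items 5 and 6; the array contents are illustrative:

    import pickle
    import numpy as np
    from tianshou.data import Batch

    b = Batch(obs=np.zeros((2, 3)), act=np.array([0, 1]))
    b.to_torch()                              # convert stored arrays to torch tensors in place
    b.to_numpy()                              # and back to numpy arrays
    restored = pickle.loads(pickle.dumps(b))  # Batch is now picklable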

Fixed Bugs

  1. SAC/PPO diagonal Gaussian
  2. PPO orthogonal initialization
  3. DQN zero eps
  4. Fix type inference in the replay buffer

0.2.2

26 Apr 07:25

Algorithm Implementation

  1. Generalized Advantage Estimation (GAE)
  2. Update the PPO algorithm following arXiv:1811.02553 and arXiv:1912.09729
  3. Vanilla Imitation Learning (BC & DA, with continuous/discrete action spaces)
  4. Prioritized DQN
  5. RNN-style policy network
  6. Fix SAC with torch==1.5.0

API Change

  1. change __call__ to forward in policy
  2. Add save_fn in trainer
  3. Add __repr__ in tianshou.data, e.g. print(buffer); see the sketch below
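
A minimal sketch of save_fn and the new __repr__, assuming the trainer passes the current policy to save_fn; the file name and buffer size are placeholders:

    import torch
    from tianshou.data import ReplayBuffer

    def save_fn(policy):
        # passed to the trainer; called whenever a new best policy is found
        torch.save(policy.state_dict(), "best_policy.pth")

    buf = ReplayBuffer(size=100)
    print(buf)  # __repr__ in tianshou.data now gives a readable summary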