
Releases: thu-ml/tianshou

0.4.6.post1

25 Feb 16:08
c248b4f

This release fixes the conda package publishing, supports a wider range of gym versions instead of only the newest one, and keeps the internal API backward compatible. See #536.

0.4.6

25 Feb 02:03
97df511

Bug Fix

  1. Fix incorrect integer casts caused by to_torch_as(...) calls in policies when using discrete actions (#521)
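
The issue arises because to_torch_as matches the dtype of its reference tensor, so integer actions can silently become floats. A minimal illustrative sketch (the tensor names are placeholders, not tianshou internals):

```python
import numpy as np
import torch

from tianshou.data import to_torch_as

logits = torch.zeros(4, 3)    # float32 reference tensor, e.g. network output
act = np.array([0, 2, 1, 2])  # discrete actions stored as integers
# to_torch_as copies dtype and device from the reference, so act_t is float32
act_t = to_torch_as(act, logits)
# gather/index operations need an integer tensor, hence the explicit cast back
q = logits.gather(1, act_t.long().unsqueeze(-1))
print(q.shape)  # torch.Size([4, 1])
```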

API Change

  1. Rename the venv worker's internal API: send_action -> send, get_result -> recv (to align with EnvPool) (#517)

New Features

  1. Add Intrinsic Curiosity Module (#503)
  2. Implement CQLPolicy and offline_cql example (#506)
  3. Add PettingZoo environment support (#494)
  4. Enable concurrent execution of venvs.reset() (#517)
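
For item 4, the call itself is unchanged; with a subprocess-based vector env the sub-environment resets are now dispatched concurrently. A minimal sketch (CartPole is only a placeholder environment):

```python
import gym

from tianshou.env import SubprocVectorEnv

# 8 sub-environments, each in its own process
venv = SubprocVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(8)])
obs = venv.reset()  # the 8 resets run concurrently instead of one by one
print(obs.shape)    # (8, 4) for CartPole
venv.close()
```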

Enhancement

  1. Remove reset_buffer() from reset method (#501)
  2. Add Atari PPO example (#523, #529)
  3. Add VizDoom PPO example and results (#533)
  4. Upgrade the gym requirement to >=0.21 (#534)
  5. Switch the Atari examples to use EnvPool by default (#534)

Documentation

  1. Update the DQN tutorial and add EnvPool to the docs (#526)

0.4.5

28 Nov 15:14
3592f45

Bug Fix

  1. Fix a tqdm issue (#481)
  2. Fix the Atari wrapper to be deterministic (#467)
  3. Add writer.flush() in TensorboardLogger to ensure real-time logging results (#485)

Enhancement

  1. Implement set_env_attr and get_env_attr for vector environments (#478); see the sketch after this list
  2. Implement BCQPolicy and an offline_bcq example (#480)
  3. Enable test_collector=None in the 3 trainers to turn off testing during training (#485)
  4. Fix an inconsistency in the implementation of Discrete CRR: it now uses the Critic class for its critic, following the convention of other actor-critic policies (#485)
  5. Update several offline policies to use the ActorCritic class for their optimizers, eliminating randomness caused by parameter sharing between actor and critic (#485)
  6. Move the Atari offline RL examples to examples/offline and their tests to test/offline (#485)
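
A minimal sketch of the set_env_attr / get_env_attr calls from item 1 (the attribute name my_flag is purely illustrative; spec is a standard gym attribute):

```python
import gym

from tianshou.env import DummyVectorEnv

venv = DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])
# read an attribute from every sub-environment (returns a list of 4 values)
print(venv.get_env_attr("spec"))
# write an attribute, here only on sub-environments 0 and 2
venv.set_env_attr("my_flag", True, id=[0, 2])
print(venv.get_env_attr("my_flag", id=[0, 2]))  # [True, True]
venv.close()
```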

0.4.4

13 Oct 16:30

API Change

  1. Add a new class DataParallelNet for multi-GPU training (#461)
  2. Add ActorCritic for deterministic parameter grouping in shared-head actor-critic networks (#458); see the sketch after this list
  3. collector.collect() now returns 4 extra keys: rew/rew_std/len/len_std (previously this was done in the logger) (#459)
  4. Rename WandBLogger -> WandbLogger (#441)
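
A minimal sketch of the ActorCritic wrapper from item 2: the shared backbone's parameters are handed to the optimizer exactly once and in a deterministic order (the layer sizes are arbitrary):

```python
import torch

from tianshou.utils.net.common import ActorCritic, Net
from tianshou.utils.net.discrete import Actor, Critic

backbone = Net(state_shape=4, hidden_sizes=[64, 64])  # shared feature extractor
actor = Actor(backbone, action_shape=2)
critic = Critic(backbone)
actor_critic = ActorCritic(actor, critic)
# .parameters() yields each shared parameter once, in a deterministic order
optim = torch.optim.Adam(actor_critic.parameters(), lr=3e-4)
print(sum(p.numel() for p in actor_critic.parameters()))
```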

Bug Fix

  1. Fix logging in the Atari examples (#444)

Enhancement

  1. save_fn() is now called at the beginning of the trainer (#459)
  2. Create a new documentation page for the logger (#463)
  3. Add save_data and restore_data to WandbLogger, allow more input arguments for wandb init, and integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py (#441)

0.4.3

02 Sep 21:20
fc251ab

Bug Fix

  1. Fix an A2C/PPO optimizer bug when sharing the network head (#428)
  2. Fix the PPO dual-clip implementation (#435)

Enhancement

  1. Add Rainbow (#386)
  2. Add WandbLogger (#427)
  3. Add env_id to preprocess_fn (#391)
  4. Update the README: add a new chart and a BibTeX entry (#406)
  5. Add a Makefile; make commit-checks now runs almost all checks automatically (#432)
  6. Add isort and yapf and apply them to the existing codebase (#432)
  7. Add a spelling check via make spelling (#432)
  8. Update contributing.rst (#432)

0.4.2

26 Jun 10:24
ebaca6f

Enhancement

  1. Add model-free DQN-family algorithms: IQN (#371), FQF (#376)
  2. Add model-free on-policy algorithms: NPG (#344, #347), TRPO (#337, #340)
  3. Add offline RL algorithms: CQL (#359), CRR (#367)
  4. Support deterministic evaluation for on-policy algorithms (#354)
  5. Make the trainer resumable (#350)
  6. Support different state sizes and fix an exception in venv.__del__ (#352, #384)
  7. Add a VizDoom example (#384)
  8. Add a numerical analysis tool and interactive plots (#335, #341)

0.4.1

04 Apr 09:36
dd4a011

API Change

  1. Add observation normalization in BaseVectorEnv (norm_obs, obs_rms, update_obs_rms and RunningMeanStd) (#308)
  2. Add policy.map_action to bound raw actions (e.g., map from (-inf, inf) to [-1, 1] by clipping or tanh squashing); the mapped action is not stored in the replay buffer (#313)
  3. Add lr_scheduler to on-policy algorithms, typically for LambdaLR (#318)
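
A minimal sketch of item 3, wiring a LambdaLR schedule into PPOPolicy (the layer sizes and the 100-update decay horizon are arbitrary illustration choices):

```python
import gym
import torch
from torch.optim.lr_scheduler import LambdaLR

from tianshou.policy import PPOPolicy
from tianshou.utils.net.common import Net
from tianshou.utils.net.discrete import Actor, Critic

env = gym.make("CartPole-v1")
net = Net(env.observation_space.shape, hidden_sizes=[64, 64])
actor = Actor(net, env.action_space.n)
critic = Critic(net)
optim = torch.optim.Adam(
    set(actor.parameters()).union(critic.parameters()), lr=3e-4)
# linearly decay the learning rate over an assumed 100 policy updates
scheduler = LambdaLR(optim, lr_lambda=lambda step: max(0.0, 1.0 - step / 100))
policy = PPOPolicy(
    actor, critic, optim, torch.distributions.Categorical,
    action_space=env.action_space, lr_scheduler=scheduler)
```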

Note

To adapt to this version, change action_range=... to action_space=env.action_space in policy initialization.

Bug Fix

  1. Fix incorrect behaviors in on-policy algorithms (an error when n/ep == 0, and the reward shown in tqdm) (#306, #328)
  2. Fix the q-value mask_action error for obs_next (#310)

Enhancement

  1. Release the SOTA MuJoCo benchmark (DDPG/TD3/SAC: #305, REINFORCE: #320, A2C: #325, PPO: #330) and add corresponding notes in /examples/mujoco/README.md
  2. Fix the numpy>=1.20 typing issue (#323)
  3. Add cross-platform unit tests (#331)
  4. Add a test on how to deal with finite envs (#324)
  5. Add value normalization in on-policy algorithms (#319, #321)
  6. Separate advantage normalization and value normalization in PPO (#329)

0.4.0

02 Mar 12:40
389bdb7

This release contains several API and behavior changes.

API Change

Buffer

  1. Add ReplayBufferManager, PrioritizedReplayBufferManager, VectorReplayBuffer, PrioritizedVectorReplayBuffer, CachedReplayBuffer (#278, #280);
  2. Change the buffer.add API from buffer.add(obs, act, rew, done, obs_next, info, policy, ...) to buffer.add(batch, buffer_ids) so that data can be added more efficiently (#280); see the sketch after this list;
  3. Add a set_batch method in buffer (#278);
  4. Add a sample_index method, the same as sample but returning only the index instead of both the index and the batch data (#278);
  5. Add prev (one-step previous transition index), next (one-step next transition index) and unfinished_index (the last modified index whose done==False) (#278);
  6. Add an internal method _alloc_by_keys_diff in Batch to support keys that appear later (#280);
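
A minimal sketch of the new buffer.add(batch, buffer_ids) call from item 2, using a VectorReplayBuffer that spans 4 parallel environments (all data values are placeholders):

```python
import numpy as np

from tianshou.data import Batch, VectorReplayBuffer

# total capacity 1000, split across 4 parallel environments
buf = VectorReplayBuffer(total_size=1000, buffer_num=4)
batch = Batch(
    obs=np.zeros((4, 3)),
    act=np.zeros(4),
    rew=np.ones(4),
    done=np.zeros(4, dtype=bool),
    obs_next=np.zeros((4, 3)),
)
# one transition per environment, written in a single call
buf.add(batch, buffer_ids=[0, 1, 2, 3])
print(len(buf))  # 4
```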

Collector

  1. Rewrite the original Collector and split the async functionality into AsyncCollector: Collector only supports sync mode, while AsyncCollector supports both modes (#280);
  2. Drop collector.collect(n_episode=List[int]) because the new collector can collect episodes without bias (#280);
  3. Move reward_metric from Collector to the trainer (#280);
  4. Change the Collector.collect logic: AsyncCollector.collect keeps the previous semantics, where collect(n_step or n_episode) will not collect exactly n_step or n_episode transitions; Collector.collect(n_step or n_episode) now collects exactly n_step or n_episode transitions (#280);

Policy

  1. Add a policy.exploration_noise(action, batch) -> action method instead of implementing it inside policy.forward() (#280);
  2. Add a TimeLimit.truncated handler in compute_*_returns (#296);
  3. Remove the ignore_done flag (#296);
  4. Remove the reward_normalization option in off-policy algorithms (an Error is raised if it is set to True) (#298);

Trainer

  1. Rename collect_per_step to step_per_collect (#293); see the sketch after this list;
  2. Add update_per_step and episode_per_collect (#293);
  3. onpolicy_trainer now supports either step_per_collect or episode_per_collect (#293)
  4. Add BasicLogger and LazyLogger to log data more conveniently (#295)
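
A minimal sketch showing where the renamed step_per_collect and the new update_per_step appear in a training run, using DQN on CartPole purely as a placeholder setup (all hyperparameters are arbitrary):

```python
import gym
import torch

from tianshou.data import Collector, VectorReplayBuffer
from tianshou.env import DummyVectorEnv
from tianshou.policy import DQNPolicy
from tianshou.trainer import offpolicy_trainer
from tianshou.utils.net.common import Net

env = gym.make("CartPole-v1")
train_envs = DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(8)])
test_envs = DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])

net = Net(env.observation_space.shape, env.action_space.n, hidden_sizes=[64, 64])
optim = torch.optim.Adam(net.parameters(), lr=1e-3)
policy = DQNPolicy(net, optim, discount_factor=0.99, target_update_freq=320)

train_collector = Collector(
    policy, train_envs, VectorReplayBuffer(20000, 8), exploration_noise=True)
test_collector = Collector(policy, test_envs, exploration_noise=True)

result = offpolicy_trainer(
    policy, train_collector, test_collector,
    max_epoch=5, step_per_epoch=10000,
    step_per_collect=8,      # renamed from collect_per_step
    episode_per_test=10, batch_size=64,
    update_per_step=0.1,     # gradient updates per collected environment step
)
print(result["best_reward"])
```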

Bug Fix

  1. Fix VectorEnv action_space seeding: calling env.seed(seed) now also calls env.action_space.seed(seed); otherwise Collector.collect(..., random=True) would produce different results each time (#300, #303).

0.3.2

16 Feb 01:41
cb65b56

Bug Fix

  1. Fix networks under utils/discrete and utils/continuous that could not work well under CUDA with torch<=1.6.0 (#289)
  2. Fix 2 bugs in Batch: creating keys in Batch.__setitem__ now throws ValueError instead of KeyError; _create_value now allows a placeholder with the stack=False option (#284)

Enhancement

  1. Add the QR-DQN algorithm (#276)
  2. Small optimizations for Batch.cat and Batch.stack (#284); they are now almost as fast as in v0.2.3
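
A quick illustration of the Batch.cat and Batch.stack calls touched by item 2 (the array contents are placeholders):

```python
import numpy as np

from tianshou.data import Batch

b1 = Batch(obs=np.zeros((4, 3)), rew=np.ones(4))
b2 = Batch(obs=np.ones((4, 3)), rew=np.zeros(4))

cat = Batch.cat([b1, b2])      # concatenate along the first axis
stack = Batch.stack([b1, b2])  # stack along a new leading axis
print(cat.obs.shape, stack.obs.shape)  # (8, 3) (2, 4, 3)
```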

0.3.1

20 Jan 10:24
a511cb4

API Change

  1. Change the utils.network args to support any form of MLP by default (#275): remove layer_num and hidden_layer_size, add hidden_sizes (a list of ints specifying the network architecture); see the sketch after this list
  2. Add HDF5 save/load methods for ReplayBuffer (#261)
  3. Add offline_trainer (#263)
  4. Move the Atari-related networks to examples/atari/atari_network.py (#275)
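
A minimal sketch of the new hidden_sizes argument from item 1 and the HDF5 round-trip from item 2 (the layer sizes and file name are arbitrary; the HDF5 part needs h5py installed):

```python
import numpy as np

from tianshou.data import ReplayBuffer
from tianshou.utils.net.common import Net

# item 1: the architecture is now a plain list of hidden layer sizes
net = Net(state_shape=4, action_shape=2, hidden_sizes=[128, 128, 64])
logits, state = net(np.zeros((1, 4), dtype=np.float32))
print(logits.shape)  # torch.Size([1, 2])

# item 2: HDF5 save/load round-trip for a replay buffer
buf = ReplayBuffer(size=100)
buf.save_hdf5("buffer.hdf5")
restored = ReplayBuffer.load_hdf5("buffer.hdf5")
print(len(restored))
```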

Bug Fix

  1. Fix a potential bug in the discrete behavior cloning policy (#263)

Enhancement

  1. Update the SAC MuJoCo results (#246)
  2. Add the C51 algorithm with benchmark results (#266)
  3. Enable type checking in utils.network (#275)