Releases · thu-ml/tianshou
0.4.6.post1
This release fixes the conda package publishing, supports more gym versions instead of only the newest one, and keeps the internal API backward compatible. See #536.
0.4.6
Bug Fix
- Fix casts to int by to_torch_as(...) calls in policies when using discrete actions (#521)
API Change
- Change the venv worker's internal API names: send_action -> send, get_result -> recv (to align with EnvPool) (#517)
New Features
- Add Intrinsic Curiosity Module (#503)
- Implement CQLPolicy and offline_cql example (#506)
- PettingZoo environment support (#494); see the sketch after this list
- Enable venvs.reset() concurrent execution (#517)
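PettingZoo support is exposed through a `PettingZooEnv` wrapper. Below is a minimal sketch, assuming `pettingzoo` is installed; tic-tac-toe is used purely as an illustration.

```python
from pettingzoo.classic import tictactoe_v3
from tianshou.env import DummyVectorEnv, PettingZooEnv

# wrap a PettingZoo AEC environment so tianshou's tooling can use it
env = PettingZooEnv(tictactoe_v3.env())

# it can be vectorized like any other tianshou environment
venv = DummyVectorEnv([lambda: PettingZooEnv(tictactoe_v3.env()) for _ in range(4)])
obs = venv.reset()  # each observation describes the currently acting agent
```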
Enhancement
- Remove reset_buffer() from reset method (#501)
- Add atari ppo example (#523, #529)
- Add VizDoom PPO example and results (#533)
- Upgrade gym version to >=0.21 (#534)
- Switch the Atari examples to use EnvPool by default (#534); see the sketch below
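For the EnvPool switch, the Atari examples now build their vectorized environments roughly as follows. This is a sketch assuming `envpool` is installed; the task id and environment count are illustrative.

```python
import envpool

# EnvPool constructs and steps the environments in C++; the returned object
# already behaves like a vectorized environment, so no tianshou venv wrapper
# is needed and it can be handed to a Collector directly
train_envs = envpool.make_gym("Pong-v5", num_envs=8)
obs = train_envs.reset()
print(obs.shape)  # batched, frame-stacked Atari observations
```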
Documentation
- Update dqn tutorial and add envpool to docs (#526)
0.4.5
Bug Fix
- Fix tqdm issue (#481)
- Fix atari wrapper to be deterministic (#467)
- Add `writer.flush()` in TensorboardLogger to ensure real-time logging (#485); see the sketch below
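For reference, a minimal logger setup that benefits from the flush fix above; the log directory name is illustrative.

```python
from torch.utils.tensorboard import SummaryWriter
from tianshou.utils import TensorboardLogger

# the logger now flushes the writer itself, so scalars show up in TensorBoard
# in (near) real time during training
writer = SummaryWriter("log/dqn")
logger = TensorboardLogger(writer)
# pass logger=logger to a trainer, e.g. offpolicy_trainer(..., logger=logger)
```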
Enhancement
- Implement set_env_attr and get_env_attr for vector environments (#478); see the sketch after this list
- Implement BCQPolicy and offline_bcq example (#480)
- Enable `test_collector=None` in 3 trainers to turn off testing during training (#485)
- Fix an inconsistency in the implementation of Discrete CRR: it now uses the `Critic` class for its critic, following the convention in other actor-critic policies (#485)
- Update several offline policies to use the `ActorCritic` class for their optimizers to eliminate randomness caused by parameter sharing between actor and critic (#485)
- Move Atari offline RL examples to `examples/offline` and tests to `test/offline` (#485)
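A minimal sketch of the new vector-environment attribute accessors from #478; the attribute name `foo` is made up for illustration.

```python
import gym
from tianshou.env import DummyVectorEnv

venv = DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])

# read an attribute from every worker, or from a chosen subset via `id`
print(venv.get_env_attr("spec"))

# set an attribute on selected workers and read it back
venv.set_env_attr("foo", 42, id=[0, 2])
print(venv.get_env_attr("foo", id=[0, 2]))
```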
0.4.4
API Change
- add a new class DataParallelNet for multi-GPU training (#461)
- add ActorCritic for deterministic parameter grouping of shared-head actor-critic networks (#458); see the sketch below
- collector.collect() now returns 4 extra keys: rew/rew_std/len/len_std (previously this was done in the logger) (#459)
- rename WandBLogger -> WandbLogger (#441)
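The ActorCritic wrapper from #458 groups a shared-head actor and critic so that the optimizer sees each shared parameter exactly once, in a deterministic order. A sketch with illustrative shapes:

```python
import torch
from tianshou.utils.net.common import ActorCritic, Net
from tianshou.utils.net.discrete import Actor, Critic

# a shared-body actor-critic pair (state/action shapes are placeholders)
net = Net(state_shape=4, hidden_sizes=[64, 64])
actor = Actor(net, action_shape=2)
critic = Critic(net)

# grouping them gives a single, deterministic parameter list for the optimizer
actor_critic = ActorCritic(actor, critic)
optim = torch.optim.Adam(actor_critic.parameters(), lr=3e-4)
```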
Bug Fix
- fix logging in atari examples (#444)
Enhancement
0.4.3
Bug Fix
Enhancement
- add Rainbow (#386)
- add WandbLogger (#427)
- add env_id in preprocess_fn (#391); see the sketch after this list
- update README, add new chart and bibtex (#406)
- add Makefile; you can now use `make commit-checks` to automatically perform almost all checks (#432)
- add isort and yapf, apply to existing codebase (#432)
- add spelling check by using `make spelling` (#432)
- update contributing.rst (#432)
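With #391, a Collector's `preprocess_fn` can tell which environments produced the incoming data. A hypothetical example (the function body and the print are purely illustrative):

```python
import numpy as np
from tianshou.data import Batch

# a toy preprocess_fn for Collector(preprocess_fn=...); it is called with keyword
# arguments (obs after reset; obs_next/rew/done/info after a step) plus env_id
def preprocess_fn(obs=None, env_id=None, **kwargs):
    if obs is None:
        return Batch()  # nothing to modify at this call
    print("received reset observations from envs", env_id)
    return Batch(obs=np.asarray(obs))
```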
0.4.2
Enhancement
- Add model-free DQN family: IQN (#371), FQF (#376)
- Add model-free on-policy algorithms: NPG (#344, #347), TRPO (#337, #340)
- Add offline RL algorithms: CQL (#359), CRR (#367)
- Support deterministic evaluation for on-policy algorithms (#354)
- Make trainer resumable (#350); see the sketch after this list
- Support different state sizes and fix an exception in venv.__del__ (#352, #384)
- Add VizDoom example (#384)
- Add numerical analysis tool and interactive plot (#335, #341)
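Resuming is driven by a user-provided checkpoint hook together with a `resume_from_log` trainer flag; the sketch below assumes those keyword names and uses a stand-in model so it runs on its own.

```python
import torch
from torch import nn

# stand-ins for a real policy and optimizer, only to keep the snippet self-contained
policy = nn.Linear(4, 2)
optim = torch.optim.Adam(policy.parameters(), lr=1e-3)

def save_checkpoint_fn(epoch, env_step, gradient_step):
    # the trainer calls this periodically with its progress counters
    torch.save(
        {"model": policy.state_dict(), "optim": optim.state_dict()},
        "checkpoint.pth",
    )

# pass save_checkpoint_fn=save_checkpoint_fn to a trainer; restarting the same
# script with resume_from_log=True (and the same logger directory) continues
# from the saved epoch / env_step / gradient_step counters
```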
0.4.1
API Change
- Add observation normalization in BaseVectorEnv (`norm_obs`, `obs_rms`, `update_obs_rms` and `RunningMeanStd`) (#308)
- Add `policy.map_action` to map the raw network output to the bounded action space (e.g., from (-inf, inf) to [-1, 1] by clipping or tanh squashing); the mapped action is not stored in the replay buffer (#313)
- Add `lr_scheduler` in on-policy algorithms, typically for `LambdaLR` (#318); see the sketch below
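The `lr_scheduler` option takes a standard PyTorch scheduler built on the policy's optimizer. A minimal sketch with a toy network; the decay horizon is illustrative.

```python
import torch
from torch import nn

# toy model standing in for an actor-critic network
net = nn.Linear(4, 2)
optim = torch.optim.Adam(net.parameters(), lr=3e-4)

# decay the learning rate linearly to zero over an assumed number of updates
max_update_num = 1000
lr_scheduler = torch.optim.lr_scheduler.LambdaLR(
    optim, lr_lambda=lambda step: 1.0 - step / max_update_num
)
# then pass it to an on-policy policy, e.g. PPOPolicy(..., lr_scheduler=lr_scheduler);
# the policy steps the scheduler during its updates
```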
Note
To adapt to this version, change `action_range=...` to `action_space=env.action_space` in policy initialization; see the sketch below.
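A migration sketch for the note above, using DDPG on Pendulum; network sizes and hyper-parameters are illustrative, not the benchmark settings.

```python
import gym
import torch
from tianshou.policy import DDPGPolicy
from tianshou.utils.net.common import Net
from tianshou.utils.net.continuous import Actor, Critic

env = gym.make("Pendulum-v0")  # Pendulum-v1 in newer gym versions
state_shape = env.observation_space.shape
action_shape = env.action_space.shape

net_a = Net(state_shape, hidden_sizes=[128, 128])
actor = Actor(net_a, action_shape, max_action=env.action_space.high[0])
actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-3)

net_c = Net(state_shape, action_shape, hidden_sizes=[128, 128], concat=True)
critic = Critic(net_c)
critic_optim = torch.optim.Adam(critic.parameters(), lr=1e-3)

policy = DDPGPolicy(
    actor, actor_optim, critic, critic_optim,
    action_space=env.action_space,  # was: action_range=(-2.0, 2.0)
)
```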
Bug Fix
- Fix incorrect behaviors with on-policy algorithms (an error when `n/ep == 0`, and the reward shown in tqdm) (#306, #328)
- Fix Q-value mask_action error for obs_next (#310)
Enhancement
- Release SOTA MuJoCo benchmark results (DDPG/TD3/SAC: #305, REINFORCE: #320, A2C: #325, PPO: #330) and add corresponding notes in /examples/mujoco/README.md
- Fix `numpy>=1.20` typing issue (#323)
- Add cross-platform unit tests (#331)
- Add a test on how to deal with finite env (#324)
- Add value normalization in on-policy algorithms (#319, #321)
- Separate advantage normalization and value normalization in PPO (#329)
0.4.0
This release contains several API and behavior changes.
API Change
Buffer
- Add ReplayBufferManager, PrioritizedReplayBufferManager, VectorReplayBuffer, PrioritizedVectorReplayBuffer, CachedReplayBuffer (#278, #280)
- Change the `buffer.add` API from `buffer.add(obs, act, rew, done, obs_next, info, policy, ...)` to `buffer.add(batch, buffer_ids)` in order to add data more efficiently (#280); see the sketch after this list
- Add `set_batch` method in buffer (#278)
- Add `sample_index` method, same as `sample` but returning only the index instead of both the index and the batch data (#278)
- Add `prev` (one-step previous transition index), `next` (one-step next transition index) and `unfinished_index` (the last modified index whose `done==False`) (#278)
- Add internal method `_alloc_by_keys_diff` in batch to support any form of newly appearing keys (#280)
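A sketch of the new `buffer.add(batch, buffer_ids)` API and the index helpers, using the single-buffer form; for a `VectorReplayBuffer`, the same call takes a stacked batch plus `buffer_ids`.

```python
import gym
from tianshou.data import Batch, ReplayBuffer

env = gym.make("CartPole-v1")
buf = ReplayBuffer(size=1000)

obs = env.reset()
for _ in range(10):
    act = env.action_space.sample()
    obs_next, rew, done, info = env.step(act)
    # data now goes in as a Batch of reserved keys instead of positional arguments
    buf.add(Batch(obs=obs, act=act, rew=rew, done=done, obs_next=obs_next, info=info))
    obs = env.reset() if done else obs_next

# sample_index returns only the sampled indices; prev/next walk along an episode
idx = buf.sample_index(batch_size=4)
print(buf.prev(idx), buf.next(idx))
```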
Collector
- Rewrite the original Collector and split the async functionality into AsyncCollector: Collector only supports sync mode, AsyncCollector supports both modes (#280)
- Drop `collector.collect(n_episode=List[int])` because the new collector can collect episodes without bias (#280)
- Move `reward_metric` from Collector to trainer (#280)
- Change `Collector.collect` logic: `AsyncCollector.collect`'s semantics are the same as in the previous version, where `collect(n_step or n_episode)` will not collect exactly n_step or n_episode transitions; `Collector.collect(n_step or n_episode)` now collects exactly n_step or n_episode transitions (#280)
Policy
- Add `policy.exploration_noise(action, batch) -> action` method instead of implementing it inside `policy.forward()` (#280)
- Add `Timelimit.truncate` handler in `compute_*_returns` (#296)
- Remove `ignore_done` flag (#296)
- Remove `reward_normalization` option in off-policy algorithms (will raise an Error if set to True) (#298)
Trainer
- Change `collect_per_step` to `step_per_collect` (#293); see the sketch after this list
- Add `update_per_step` and `episode_per_collect` (#293)
- `onpolicy_trainer` now supports either step_collect or episode_collect (#293)
- Add BasicLogger and LazyLogger to log data more conveniently (#295)
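An end-to-end sketch of the renamed trainer arguments, using DQN on CartPole; hyper-parameters are placeholders rather than tuned values, and `onpolicy_trainer` analogously accepts `step_per_collect` or `episode_per_collect`.

```python
import gym
import torch
from tianshou.data import Collector, VectorReplayBuffer
from tianshou.env import DummyVectorEnv
from tianshou.policy import DQNPolicy
from tianshou.trainer import offpolicy_trainer
from tianshou.utils.net.common import Net

env = gym.make("CartPole-v1")
train_envs = DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(8)])
test_envs = DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(8)])

net = Net(env.observation_space.shape, env.action_space.n, hidden_sizes=[64, 64])
optim = torch.optim.Adam(net.parameters(), lr=1e-3)
policy = DQNPolicy(net, optim, discount_factor=0.99, estimation_step=3,
                   target_update_freq=320)

train_collector = Collector(policy, train_envs, VectorReplayBuffer(20000, 8),
                            exploration_noise=True)
test_collector = Collector(policy, test_envs, exploration_noise=True)

result = offpolicy_trainer(
    policy, train_collector, test_collector,
    max_epoch=5, step_per_epoch=10000,
    step_per_collect=8,   # was collect_per_step before this release
    update_per_step=1,    # gradient updates per collected environment step
    episode_per_test=10, batch_size=64,
    train_fn=lambda epoch, env_step: policy.set_eps(0.1),
    test_fn=lambda epoch, env_step: policy.set_eps(0.0),
)
```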
Bug Fix
0.3.2
0.3.1
API Change
- change `utils.network` args to support any form of MLP by default (#275): remove `layer_num` and `hidden_layer_size`, add `hidden_sizes` (a list of ints indicating the network architecture); see the sketch after this list
- add HDF5 save/load method for ReplayBuffer (#261)
- add offline_trainer (#263)
- move Atari-related network to `examples/atari/atari_network.py` (#275)
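A sketch of the first two items above: the list-based MLP specification and HDF5 persistence for buffers. Shapes and the file name are illustrative.

```python
from tianshou.data import ReplayBuffer
from tianshou.utils.net.common import Net

# the network architecture is now a list of hidden sizes instead of
# layer_num / hidden_layer_size
net = Net(state_shape=4, action_shape=2, hidden_sizes=[128, 128, 64])

# buffers can be written to and restored from HDF5 files
buf = ReplayBuffer(size=100)
buf.save_hdf5("buffer.hdf5")
restored = ReplayBuffer.load_hdf5("buffer.hdf5")
```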
Bug Fix
- fix a potential bug in discrete behavior cloning policy (#263)