Releases · tensorforce/tensorforce
Tensorforce 0.6.5
Agents:
- Renamed agent argument `reward_preprocessing` to `reward_processing`, and in case of Tensorforce agent moved to `reward_estimation[reward_processing]`
Distributions:
- New `categorical` distribution argument `skip_linear` to not add the implicit linear logits layer
Environments:
- Support for multi-actor parallel environments via new function `Environment.num_actors()` (see the sketch after this list)
- `Runner` uses multi-actor parallelism by default if the environment is multi-actor
- New optional `Environment` function `episode_return()` which returns the true return of the last episode, if the cumulative sum of environment rewards is not a good metric for runner display
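A minimal sketch of a multi-actor environment, assuming an interface where `num_actors()` reports the number of actors and `reset()`/`execute()` exchange values batched along a leading actor dimension; the class name, shapes and action space below are made up, and `examples/multiactor_environment.py` in the repository is the authoritative reference.

```python
import numpy as np
from tensorforce import Environment


class MyMultiActorEnvironment(Environment):
    """Hypothetical multi-actor environment with 4 actors (illustrative only)."""

    NUM_ACTORS = 4

    def states(self):
        return dict(type='float', shape=(8,))

    def actions(self):
        return dict(type='int', num_values=3)

    def num_actors(self):
        # Multiple actors signal multi-actor parallelism to Runner
        return self.NUM_ACTORS

    def reset(self):
        # One initial state per actor (assumed leading actor dimension)
        return np.random.random_sample(size=(self.NUM_ACTORS, 8))

    def execute(self, actions):
        # One (state, terminal, reward) entry per actor
        states = np.random.random_sample(size=(self.NUM_ACTORS, 8))
        terminal = np.zeros(shape=(self.NUM_ACTORS,), dtype=bool)
        reward = np.random.random_sample(size=(self.NUM_ACTORS,))
        return states, terminal, reward
```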
Examples:
- New `vectorized_environment.py` and `multiactor_environment.py` scripts to illustrate how to set up a vectorized/multi-actor environment
Tensorforce 0.6.4
Agents:
- Agent argument `update_frequency` / `update[frequency]` now supports float values > 0.0, which specify the update frequency relative to the batch size
- Changed default value for argument `update_frequency` from `1.0` to `0.25` for DQN, DoubleDQN, DuelingDQN agents
- New arguments `return_processing` and `advantage_processing` (where applicable) for all agent sub-types
- New function `Agent.get_specification()` which returns the agent specification as dictionary
- New function `Agent.get_architecture()` which returns a string representation of the network layer architecture (see the sketch below)
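For illustration, a short sketch of the new introspection functions together with a relative `update_frequency`; the environment and hyperparameter choices are placeholders, not recommendations.

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')

agent = Agent.create(
    agent='dqn', environment=environment,
    memory=10000, batch_size=32,
    update_frequency=0.25  # float > 0.0: update frequency relative to batch_size
)

print(agent.get_specification())  # agent specification as dictionary
print(agent.get_architecture())   # string summary of the network layer architecture

agent.close()
environment.close()
```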
Modules:
- Improved and simplified module specification, for instance: `network=my_module` instead of `network=my_module.TestNetwork`, or `environment=envs.custom_env` instead of `environment=envs.custom_env.CustomEnvironment` (module file needs to be in the same directory or a sub-directory); see the sketch below
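A sketch of the shortened specification, using the placeholder names from the note above (a `my_module.py` file containing a `TestNetwork` class would have to exist in the working directory or a sub-directory for this to run):

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')

# Previously: module file and class name both spelled out
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    network='my_module.TestNetwork'
)

# Now: the module file name alone is sufficient
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    network='my_module'
)
```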
Networks:
- New argument `single_output=True` for some policy types which, if `False`, allows the specification of additional network outputs for some/all actions via registered tensors
- `KerasNetwork` argument `model` now supports arbitrary functions as long as they return a `tf.keras.Model`
Layers:
- New layer type `SelfAttention` (specification key: `self_attention`)
Parameters:
- Support tracking of non-constant parameter values
Runner:
- Renamed attribute `episode_rewards` to `episode_returns`, and TQDM status `reward` to `return`
- Extended argument `agent` to support `Agent.load()` keyword arguments, to load an existing agent instead of creating a new one (see the sketch below)
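A sketch of loading an existing agent through `Runner` by passing `Agent.load()` keyword arguments; the checkpoint directory and format are placeholders and assume an agent was previously saved there.

```python
from tensorforce import Runner

# agent dict is forwarded as Agent.load() keyword arguments (assumed usage)
runner = Runner(
    agent=dict(directory='checkpoints', format='checkpoint'),
    environment=dict(environment='gym', level='CartPole-v1'),
    max_episode_timesteps=500
)
runner.run(num_episodes=100)
runner.close()
```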
Examples:
- Added `action_masking.py` example script to illustrate an environment implementation with built-in action masking
Bugfixes:
- Customized device placement was not applied to most tensors
Tensorforce 0.6.3
Agents:
- New agent argument `tracking` and corresponding function `tracked_tensors()` to track and retrieve the current value of predefined tensors, similar to `summarizer` for TensorBoard summaries
- New experimental values `trace_decay` and `gae_decay` for Tensorforce agent argument `reward_estimation`, soon for other agent types as well
- New options `"early"` and `"late"` for value `estimate_advantage` of Tensorforce agent argument `reward_estimation`
- Changed default value for `Agent.act()` argument `deterministic` from `False` to `True`
Networks:
- New network type `KerasNetwork` (specification key: `keras`) as wrapper for networks specified as Keras model
- Passing a Keras model class/object as policy/network argument is automatically interpreted as `KerasNetwork` (see the sketch below)
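A sketch of wrapping a Keras model, assuming a single flat state input so the model receives one tensor; the model class and hyperparameters are illustrative only. Either the explicit `keras` specification or passing the model class directly as network argument should be interpreted as `KerasNetwork`.

```python
import tensorflow as tf
from tensorforce import Agent, Environment


class MyKerasModel(tf.keras.Model):
    """Simple two-layer MLP used as policy network (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.dense0 = tf.keras.layers.Dense(64, activation='relu')
        self.dense1 = tf.keras.layers.Dense(64, activation='relu')

    def call(self, inputs):
        return self.dense1(self.dense0(inputs))


environment = Environment.create(environment='gym', level='CartPole-v1')

# Explicit specification via the new 'keras' key ...
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    network=dict(type='keras', model=MyKerasModel)
)
# ... or, equivalently, pass the model class directly as the network argument
```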
Distributions:
- Changed `Gaussian` distribution argument `global_stddev=False` to `stddev_mode='predicted'`
- New `Categorical` distribution argument `temperature_mode=None`
Layers:
- New option for `Function` layer argument `function` to pass a string function expression with argument "x", e.g. `"(x+1.0)/2.0"` (see the sketch below)
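For example, a layer-list network specification using such a string expression (the surrounding dense layers are arbitrary; the list would be passed as the `network` argument of `Agent.create`):

```python
network = [
    dict(type='dense', size=64, activation='tanh'),
    # string expression evaluated with the layer input bound to "x"
    dict(type='function', function='(x+1.0)/2.0'),
    dict(type='dense', size=64, activation='tanh')
]
```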
Summarizer:
- New summary `episode-length` recorded as part of summary label "reward"
Environments:
- Support for vectorized parallel environments via new function `Environment.is_vectorizable()` and new argument `num_parallel` for `Environment.reset()` (see the sketch after this list)
- See `tensorforce/environments/cartpole.py` for a vectorizable environment example
- `Runner` uses vectorized parallelism by default if `num_parallel > 1`, `remote=None` and the environment supports vectorization
- See `examples/act_observe_vectorized.py` for more details on the act-observe interaction
- New extended and vectorizable custom CartPole environment via key `custom_cartpole` (work in progress)
- New environment argument `reward_shaping` to provide a simple way to modify/shape rewards of an environment, can be specified either as callable or string function expression
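A sketch of vectorized rollout via `Runner`, using the work-in-progress `custom_cartpole` key mentioned above; the agent and episode settings are placeholders, and `examples/act_observe_vectorized.py` shows the lower-level act-observe interaction.

```python
from tensorforce import Runner

# With num_parallel > 1 and remote=None, Runner uses vectorized parallelism
# if the environment supports it
runner = Runner(
    agent=dict(agent='ppo', batch_size=10),
    environment='custom_cartpole', max_episode_timesteps=500,
    num_parallel=8
)
runner.run(num_episodes=200)
runner.close()
```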
run.py script:
- New option for command line arguments `--checkpoints` and `--summaries` to add a comma-separated checkpoint/summary filename in addition to the directory
- Added episode lengths to logging plot besides episode returns
Bugfixes:
- Temporal horizon handling of RNN layers
- Critical bugfix for late horizon value prediction (including DQN variants and DPG agent) in combination with baseline RNN
- GPU problems with scatter operations
Tensorforce 0.6.2
Bugfixes:
- Critical bugfix for DQN variants and DPG agent
Tensorforce 0.6.1
Agents:
- Removed default value `"adam"` for Tensorforce agent argument `optimizer` (since default optimizer argument `learning_rate` removed, see below)
- Removed option `"minimum"` for Tensorforce agent argument `memory`, use `None` instead
- Changed default value for `dqn`/`double_dqn`/`dueling_dqn` agent argument `huber_loss` from `0.0` to `None`
Layers:
- Removed default value `0.999` for `exponential_normalization` layer argument `decay`
- Added new layer `batch_normalization` (generally should only be used for the agent arguments `reward_processing[return_processing]` and `reward_processing[advantage_processing]`)
- Added `exponential/instance_normalization` layer argument `only_mean` with default `False`
- Added `exponential/instance_normalization` layer argument `min_variance` with default `1e-4`
Optimizers:
- Removed default value `1e-3` for optimizer argument `learning_rate`
- Changed default value for optimizer argument `gradient_norm_clipping` from `1.0` to `None` (no gradient clipping)
- Added new optimizer `doublecheck_step` and corresponding argument `doublecheck_update` for the optimizer wrapper
- Removed `linesearch_step` optimizer argument `accept_ratio`
- Removed `natural_gradient` optimizer argument `return_improvement_estimate`
Saver:
- Added option to specify agent argument `saver` as string, which is interpreted as `saver[directory]` with otherwise default values (see the sketch after this list)
- Added default value for agent argument `saver[frequency]` as `10` (save model every 10 updates by default)
- Changed default value of agent argument `saver[max_checkpoints]` from `5` to `10`
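A sketch of the string shorthand next to its (roughly) equivalent explicit form, using the default values listed above; the directory name and agent settings are placeholders.

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')

# Shorthand: interpreted as saver[directory] with otherwise default values
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    saver='model-checkpoints'
)

# Roughly equivalent explicit specification
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    saver=dict(directory='model-checkpoints', frequency=10, max_checkpoints=10)
)
```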
Summarizer:
- Added option to specify agent argument `summarizer` as string, which is interpreted as `summarizer[directory]` with otherwise default values
- Renamed option of agent argument `summarizer` from `summarizer[labels]` to `summarizer[summaries]` (use of the term "label" stems from an earlier version, outdated and confusing by now)
- Changed interpretation of agent argument `summarizer[summaries] = "all"` to include only numerical summaries, so all summaries except "graph"
- Changed default value of agent argument `summarizer[summaries]` from `["graph"]` to `"all"`
- Changed default value of agent argument `summarizer[max_summaries]` from `5` to `7` (number of different colors in TensorBoard)
- Added option `summarizer[filename]` to agent argument `summarizer`
Recorder:
- Added option to specify agent argument `recorder` as string, which is interpreted as `recorder[directory]` with otherwise default values
run.py script:
- Added `--checkpoints`/`--summaries`/`--recordings` command line arguments to enable saver/summarizer/recorder agent argument specification separate from the core agent configuration
Examples:
- Added `save_load_agent.py` example script to illustrate regular agent saving and loading
Bugfixes:
- Fixed problem with optimizer argument `gradient_norm_clipping` not being applied correctly
- Fixed problem with `exponential_normalization` layer not updating moving mean and variance correctly
- Fixed problem with `recent` memory for timestep-based updates sometimes sampling invalid memory indices
Tensorforce 0.6.0
- Removed agent arguments `execution`, `buffer_observe`, `seed`
- Renamed agent arguments `baseline_policy`/`baseline_network`/`critic_network` to `baseline`/`critic`
- Renamed agent `reward_estimation` arguments `estimate_horizon` to `predict_horizon_values`, `estimate_actions` to `predict_action_values`, `estimate_terminal` to `predict_terminal_values`
- Renamed agent argument `preprocessing` to `state_preprocessing`
- Default agent preprocessing: `linear_normalization`
- Moved agent arguments for reward/return/advantage processing from `preprocessing` to `reward_preprocessing` and `reward_estimation[return_/advantage_processing]`
- New agent argument `config` with values `buffer_observe`, `enable_int_action_masking`, `seed`
- Renamed PPO/TRPO/DPG argument `critic_network`/`_optimizer` to `baseline`/`baseline_optimizer`
- Renamed PPO argument `optimization_steps` to `multi_step`
- New TRPO argument `subsampling_fraction`
- Changed agent argument `use_beta_distribution` default to false
- Added double DQN agent (`double_dqn`)
- Removed `Agent.act()` argument `evaluation`
- Removed agent function arguments `query` (functionality removed)
- Agent saver functionality changed (Checkpoint/SavedModel instead of Saver/Protobuf): `save`/`load` functions and `saver` argument changed
- Default behavior when specifying `saver` is not to load agent, unless agent is created via `Agent.load`
- Agent summarizer functionality changed: `summarizer` argument changed, some summary labels and other options removed
- Renamed RNN layers `internal_{rnn/lstm/gru}` to `rnn/lstm/gru` and `rnn/lstm/gru` to `input_{rnn/lstm/gru}`
- Renamed `auto` network argument `internal_rnn` to `rnn`
- Renamed `(internal_)rnn/lstm/gru` layer argument `length` to `horizon`
- Renamed `update_modifier_wrapper` to `optimizer_wrapper`
- Renamed `optimizing_step` to `linesearch_step`, and `UpdateModifierWrapper` argument `optimizing_iterations` to `linesearch_iterations`
- Optimizer `subsampling_step` accepts both absolute (int) and relative (float) fractions
- Objective `policy_gradient` argument `ratio_based` renamed to `importance_sampling`
- Added objectives `state_value` and `action_value`
- Added `Gaussian` distribution arguments `global_stddev` and `bounded_transform` (for improved bounded action space handling)
- Changed default memory `device` argument to `CPU:0`
- Renamed rewards summaries
- `Agent.create()` accepts act-function as `agent` argument for recording
- Singleton states and actions are now consistently handled as singletons
- Major change to policy handling and defaults, in particular `parametrized_distributions`, new default policies `parametrized_state/action_value`
- Combined `long` and `int` type
- Always wrap environment in `EnvironmentWrapper` class
- Changed `tune.py` arguments
Tensorforce 0.5.5
- Changed independent mode of `agent.act` to use final values of dynamic hyperparameters and avoid TensorFlow conditions
- Extended `"tensorflow"` format of `agent.save` to include an optimized Protobuf model with an act-only graph as `.pb` file, and `Agent.load` format `"pb-actonly"` to load act-only agent based on Protobuf model
- Support for custom summaries via new `summarizer` argument value `custom` to specify summary type, and `Agent.summarize(...)` to record summary values
- Added min/max-bounds for dynamic hyperparameters to assert valid range and infer other arguments
- Argument `batch_size` now mandatory for all agent classes
- Removed `Estimator` argument `capacity`, now always automatically inferred
- Internal changes related to agent arguments `memory`, `update` and `reward_estimation`
- Changed the default `bias` and `activation` argument of some layers
- Fixed issues with `sequence` preprocessor
- DQN and dueling DQN properly constrained to `int` actions only
- Added `use_beta_distribution` argument with default `True` to many agents and `ParametrizedDistributions` policy, so default can be changed
Tensorforce 0.5.4
- DQN/DuelingDQN/DPG argument `memory` now required to be specified explicitly, plus `update_frequency` default changed
- Removed (temporarily) `conv1d/conv2d_transpose` layers due to TensorFlow gradient problems
- `Agent`, `Environment` and `Runner` can now be imported via `from tensorforce import ...`
- New generic reshape layer available as `reshape`
- Support for batched version of `Agent.act` and `Agent.observe`
- Support for parallelized remote environments based on Python's `multiprocessing` and `socket` (replacing `tensorforce/contrib/socket_remote_env/` and `tensorforce/environments/environment_process_wrapper.py`), available via `Environment.create(...)`, `Runner(...)` and `run.py`
- Removed `ParallelRunner` and merged functionality with `Runner`
- Changed `run.py` arguments
- Changed independent mode for `Agent.act`: additional argument `internals` and corresponding return value, initial internals via `Agent.initial_internals()`, `Agent.reset()` not required anymore
- Removed `deterministic` argument for `Agent.act` unless independent mode
- Added `format` argument to `save`/`load`/`restore` with supported formats `tensorflow`, `numpy` and `hdf5`
- Changed `save` argument `append_timestep` to `append` with default `None` (instead of `'timesteps'`)
- Added `get_variable` and `assign_variable` agent functions
Tensorforce 0.5.3
- Added optional `memory` argument to various agents
- Improved summary labels, particularly `"entropy"` and `"kl-divergence"`
- `linear` layer now accepts tensors of rank 1 to 3
- Network output / distribution input does not need to be a vector anymore
- Transposed convolution layers (`conv1d/2d_transpose`)
- Parallel execution functionality contributed by @jerabaul29, currently under `tensorforce/contrib/`
- Accept string for runner `save_best_agent` argument to specify best model directory different from `saver` configuration
- `saver` argument `steps` removed and `seconds` renamed to `frequency`
- Moved `Parallel/Runner` argument `max_episode_timesteps` from `run(...)` to constructor
- New `Environment.create(...)` argument `max_episode_timesteps`
- TensorFlow 2.0 support
- Improved TensorBoard summaries recording
- Summary labels `graph`, `variables` and `variables-histogram` temporarily not working
- TF-optimizers updated to TensorFlow 2.0 Keras optimizers
- Added TensorFlow Addons dependency, and support for TFA optimizers
- Changed unit of `target_sync_frequency` from timesteps to updates for `dqn` and `dueling_dqn` agents
Tensorforce 0.5.2
- Improved unittest performance
- Added `updates` and renamed `timesteps`/`episodes` counters for agents and runners
- Renamed `critic_{network,optimizer}` argument to `baseline_{network,optimizer}`
- Added Actor-Critic (`ac`), Advantage Actor-Critic (`a2c`) and Dueling DQN (`dueling_dqn`) agents
- Improved "same" baseline optimizer mode and added optional weight specification
- Reuse layer now global for parameter sharing across modules
- New block layer type (`block`) for easier sharing of layer blocks
- Renamed `PolicyAgent/-Model` to `TensorforceAgent/-Model`
- New `Agent.load(...)` function, saving includes agent specification
- Removed `PolicyAgent` argument `(baseline-)network`
- Added policy argument `temperature`
- Removed `"same"` and `"equal"` options for `baseline_*` arguments and changed internal baseline handling
- Combined `state/action_value` to `value` objective with argument `value` either `"state"` or `"action"`