Changes from all commits
39 commits
b87aea4
Experiment: remove inheritance of DataclassPPrintMixin
MischaPanch Aug 1, 2025
56270cd
Merge branch 'master' into benchmarking
MischaPanch Sep 6, 2025
baf4822
High level interface for launching and evaluating multiple experiments
MischaPanch Oct 13, 2025
a8e70b6
Fixed missing passing of seed
MischaPanch Oct 16, 2025
3f00231
Minor
MischaPanch Oct 16, 2025
56ef25e
Added low-level API example for multiple experiments with rliable eval
MischaPanch Oct 16, 2025
ec344f7
Removed unneeded dedicated hl_multi example
MischaPanch Oct 17, 2025
9234fcd
Modified ppo_hl example, simplifying the config options and adding mu…
MischaPanch Oct 17, 2025
82dc1ad
Simplifying the config options and adding multi-experiment option for…
MischaPanch Oct 17, 2025
ed362e8
Improved and extended rliable evaluation module
MischaPanch Oct 20, 2025
03bdfff
Updated ruff, removed black, formatted
MischaPanch Oct 20, 2025
4f6f7fb
Fix enum instantiation by name
MischaPanch Oct 20, 2025
ff27ffd
Set default num_experiments to 1 in hl scripts
MischaPanch Oct 20, 2025
91029ad
Added a script for benchmarking
MischaPanch Oct 20, 2025
7b4588d
Merge branch 'dev-v2' into benchmarking
opcode81 Oct 23, 2025
e47367d
Minor post-merge cleanup
MischaPanch Oct 23, 2025
bd2f827
Added result aggregation to benchmarking
MischaPanch Oct 23, 2025
6c10201
Extend benchmarking to run for all desired tasks
MischaPanch Oct 23, 2025
e9afaeb
Minor fixes in typing
MischaPanch Oct 24, 2025
203ea48
Removed no longer needed mujoco_ppo_multi.py
MischaPanch Oct 24, 2025
b47ca52
Refactored mujoco low-level examples to use jsonargparse
MischaPanch Oct 24, 2025
8cf6576
Refactored atari low-level examples to use jsonargparse
MischaPanch Oct 25, 2025
91f6030
Reinstating parameterization of v0.5.0 in mujoco hl scripts
MischaPanch Oct 25, 2025
457b82e
Renamed train_envs to training_envs
MischaPanch Oct 25, 2025
a07831f
More parameterization in hl scripts, used in benchmarking
MischaPanch Oct 25, 2025
6017fea
Removed obsolete result aggregation script
MischaPanch Oct 27, 2025
b7e8b93
Bumped epochs for off-policy algos and switched launcher to joblib
MischaPanch Oct 27, 2025
2bcb29f
Automatically set test_step_num_episodes to num_test_envs by default
MischaPanch Oct 27, 2025
fcd48fc
Benchmark script: minor improvement in tmux session counting
MischaPanch Oct 27, 2025
8590c7c
More renamings of type train -> training
MischaPanch Oct 29, 2025
770334d
More renamings of type train -> training
MischaPanch Oct 29, 2025
75fc43c
Bugfix: passing save_interval in create_logger
MischaPanch Oct 29, 2025
151ae1f
Minor restructuring, improved defaults and docstrings in loggers
MischaPanch Oct 29, 2025
38c4b93
Minor restructuring, improved defaults and docstrings in loggers
MischaPanch Oct 29, 2025
f368fe7
Less invasive logging on training
MischaPanch Oct 29, 2025
51493eb
Configurable experiment launcher in run_benchmark.py
MischaPanch Oct 29, 2025
ae19d79
Minor
MischaPanch Oct 29, 2025
ca9515b
Benchmark: support for include and exclude filter
MischaPanch Nov 13, 2025
315b2ba
Lowered number of epochs in mujoco off-policy examples
MischaPanch Nov 17, 2025
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -320,7 +320,7 @@ Our main test environment remains Python 3.11-based for the time being (see `poe
- `highlevel`:
  - Change the way in which seeding is handled: The mechanism introduced in v1.1.0
    was completely revised:
-    - The `train_seed` and `test_seed` attributes were removed from `SamplingConfig`.
+    - The `training_seed` and `test_seed` attributes were removed from `SamplingConfig`.
      Instead, the seeds are derived from the seed defined in `ExperimentConfig`.
    - Seed attributes of `EnvFactory` classes were removed.
      Instead, seeds are passed to methods of `EnvFactory`.
@@ -555,7 +555,7 @@ A detailed list of changes can be found below.
  #1194 #1195
- `env`:
  - `EnvFactoryRegistered`: parameter `seed` has been replaced by the pair
-    of parameters `train_seed` and `test_seed`
+    of parameters `training_seed` and `test_seed`
    Persisted instances will continue to work correctly.
    Subclasses such as `AtariEnvFactory` are also affected and require
    explicit train and test seeds. #1074
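As context for the seeding entries above: a single top-level seed now drives everything. A minimal sketch of the resulting usage — the `seed` attribute on `ExperimentConfig` is an assumption here; the `EnvFactoryRegistered` parameter names are as renamed in this diff:

```python
from tianshou.highlevel.env import EnvFactoryRegistered, VectorEnvType
from tianshou.highlevel.experiment import ExperimentConfig

# Training and test seeds are passed explicitly to the env factory
# (parameter names as renamed in this PR).
env_factory = EnvFactoryRegistered(
    task="CartPole-v1",
    venv_type=VectorEnvType.DUMMY,
    training_seed=0,
    test_seed=10,
)

# A single seed on the experiment config; per the changelog entry above,
# all derived seeds come from here rather than from SamplingConfig.
experiment_config = ExperimentConfig(seed=42)
```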
94 changes: 47 additions & 47 deletions README.md
@@ -235,53 +235,53 @@ almost exclusively concerned with configuration that controls what to do
```python
from tianshou.highlevel.config import OffPolicyTrainingConfig
from tianshou.highlevel.env import (
    EnvFactoryRegistered,
    VectorEnvType,
)
from tianshou.highlevel.experiment import DQNExperimentBuilder, ExperimentConfig
from tianshou.highlevel.params.algorithm_params import DQNParams
from tianshou.highlevel.trainer import (
    EpochStopCallbackRewardThreshold,
)

experiment = (
    DQNExperimentBuilder(
        EnvFactoryRegistered(
            task="CartPole-v1",
            venv_type=VectorEnvType.DUMMY,
-            train_seed=0,
+            training_seed=0,
            test_seed=10,
        ),
        ExperimentConfig(
            persistence_enabled=False,
            watch=True,
            watch_render=1 / 35,
            watch_num_episodes=100,
        ),
        OffPolicyTrainingConfig(
            max_epochs=10,
            epoch_num_steps=10000,
            batch_size=64,
-            num_train_envs=10,
+            num_training_envs=10,
            num_test_envs=100,
            buffer_size=20000,
            collection_step_num_env_steps=10,
            update_step_num_gradient_steps_per_sample=1 / 10,
        ),
    )
    .with_dqn_params(
        DQNParams(
            lr=1e-3,
            gamma=0.9,
            n_step_return_horizon=3,
            target_update_freq=320,
            eps_training=0.3,
            eps_inference=0.0,
        ),
    )
    .with_model_factory_default(hidden_sizes=(64, 64))
    .with_epoch_stop_callback(EpochStopCallbackRewardThreshold(195))
    .build()
)
experiment.run()
```
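Several commits in this PR ("High level interface for launching and evaluating multiple experiments", "switched launcher to joblib") extend this builder pattern to launching seeded multi-experiment runs. A sketch of that usage — `build_seeded_collection` and `RegisteredExpLauncher` are taken from the existing high-level API and are assumptions as far as this PR's final shape is concerned:

```python
from tianshou.evaluation.launcher import RegisteredExpLauncher

# experiment_builder is the DQNExperimentBuilder(...) chain from the
# example above, without the final .build() call.
collection = experiment_builder.build_seeded_collection(num_experiments=5)

# Run the experiments, which differ only in their seeds; this PR switches
# the default launcher to joblib for parallel execution.
collection.run(launcher=RegisteredExpLauncher.joblib)
```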
@@ -352,7 +352,7 @@ Define hyper-parameters:
```python
task = 'CartPole-v1'
lr, epoch, batch_size = 1e-3, 10, 64
-train_num, test_num = 10, 100
+num_training_envs, num_test_envs = 10, 100
gamma, n_step, target_freq = 0.9, 3, 320
buffer_size = 20000
eps_train, eps_test = 0.1, 0.05
@@ -369,8 +369,8 @@ Create the environments:

```python
# You can also try SubprocVectorEnv, which will use parallelization
-train_envs = ts.env.DummyVectorEnv([lambda: gym.make(task) for _ in range(train_num)])
-test_envs = ts.env.DummyVectorEnv([lambda: gym.make(task) for _ in range(test_num)])
+training_envs = ts.env.DummyVectorEnv([lambda: gym.make(task) for _ in range(num_training_envs)])
+test_envs = ts.env.DummyVectorEnv([lambda: gym.make(task) for _ in range(num_test_envs)])
```
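As the comment in the diff notes, the sequential `DummyVectorEnv` can be swapped for a subprocess-based vectorized env; a minimal sketch, using the same names as the example above:

```python
# Process-based parallelism instead of the sequential DummyVectorEnv
training_envs = ts.env.SubprocVectorEnv(
    [lambda: gym.make(task) for _ in range(num_training_envs)]
)
```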

Create the network, policy, and algorithm:
Expand Down Expand Up @@ -408,10 +408,10 @@ algorithm = DQN(
Set up the collectors:

```python
-train_collector = ts.data.Collector[CollectStats](
+training_collector = ts.data.Collector[CollectStats](
    algorithm,
-    train_envs,
-    ts.data.VectorReplayBuffer(buffer_size, num_train_envs),
+    training_envs,
+    ts.data.VectorReplayBuffer(buffer_size, num_training_envs),
    exploration_noise=True,
)
test_collector = ts.data.Collector[CollectStats](
@@ -426,7 +426,7 @@ Let's train the model using the algorithm:
```python
result = algorithm.run_training(
    OffPolicyTrainerParams(
-        train_collector=train_collector,
+        training_collector=training_collector,
        test_collector=test_collector,
        max_epochs=epoch,
        epoch_num_steps=epoch_num_steps,