Description
- I have marked all applicable categories:
- exception-raising bug
- RL algorithm bug
- documentation request (i.e. "X is missing from the documentation.")
- new feature request
- I have visited the source website
- I have searched through the issue tracker for duplicates
- I have mentioned version numbers, operating system and environment, where applicable:
```python
import tianshou, gymnasium as gym, torch, numpy, sys
print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)
```
A common task when using deep RL is tuning hyperparameters. While a lucky hand or a grid search are always possible, more structured approaches are both desirable and computationally cheaper.
The recent paper Hyperparameters in Reinforcement Learning and How To Tune Them proposes an evaluation protocol for hyperparameter optimization (HPO) in deep RL.
The results of RL experiments often depend strongly on the selected seeds, with high variance between seeds. The paper therefore proposes, as an evaluation procedure, to define and report disjoint sets of training and test seeds: each run (of plain RL or HPO+RL) is performed on the set of training seeds and evaluated on the set of test seeds.
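A minimal, library-agnostic sketch of that protocol (the seed values and the helper functions are illustrative placeholders, not Tianshou API):

```python
import numpy as np

# Disjoint seed sets: hyperparameters are tuned only on the training seeds,
# the reported score is computed only on the test seeds.
TRAIN_SEEDS = (0, 1, 2, 3, 4)
TEST_SEEDS = (100, 101, 102, 103, 104)


def train_policy(config: dict, seed: int) -> dict:
    # Placeholder for a seeded training run; returns a "policy".
    return {"config": config, "train_seed": seed}


def test_return(policy: dict, seed: int) -> float:
    # Placeholder for the mean episode return on an environment seeded with `seed`.
    rng = np.random.default_rng(seed)
    return float(rng.normal(loc=policy["config"].get("lr", 0.0)))


def evaluate_config(config: dict) -> float:
    """Train once per training seed; report the mean return over all test seeds."""
    scores = [
        test_return(train_policy(config, train_seed), test_seed)
        for train_seed in TRAIN_SEEDS
        for test_seed in TEST_SEEDS
    ]
    return float(np.mean(scores))


print(evaluate_config({"lr": 3e-4}))
```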
A possible implementation strategy is to use Hydra for configuring the search spaces (on top of the high-level interfaces, #970). This allows combining a) the Optuna Hydra sweeper as well as b) the HPO sweepers from the aforementioned paper. We will contact the authors about integrating the sweepers from their repo, which contains sweepers for the following (a Hydra/Optuna sketch follows the list):
Differential Evolution Hyperband
Standard Population Based Training (with warmstarting option)
Population Based Bandits (with Mix/Multi versions and warmstarting option)
Bayesian-Generational Population Based Training
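For the Hydra side, here is a hedged sketch of how a training entry point could expose its hyperparameters to the Optuna sweeper. It assumes hydra-core and the hydra-optuna-sweeper plugin are installed; the returned objective is a placeholder standing in for the mean return on the test seeds, not an actual Tianshou run.

```python
import hydra
from omegaconf import DictConfig


@hydra.main(version_base=None, config_path=None)
def train(cfg: DictConfig) -> float:
    # Hyperparameters come from the Hydra config / command-line overrides.
    lr = cfg.get("lr", 3e-4)
    gamma = cfg.get("gamma", 0.99)
    # Placeholder objective standing in for "mean return over the test seeds";
    # a real run would train and evaluate a Tianshou policy here.
    return -((lr - 1e-3) ** 2) - (gamma - 0.99) ** 2


if __name__ == "__main__":
    # Example sweep, selecting the Optuna sweeper and defining the search space
    # directly via overrides (keys are added with '+' since no config file is used):
    #   python train.py -m hydra/sweeper=optuna \
    #       '+lr=interval(1e-5,1e-2)' '+gamma=interval(0.9,0.999)'
    train()
```

In a fuller integration, the search space would live in the Hydra config of the high-level experiment builder rather than on the command line, so that the same config can be handed to either the Optuna sweeper or the sweepers listed above.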