Create high level interfaces for config and experiments #938

@MischaPanch

Users should be able to quickly create a trainable policy without having to go through the details of instantiating trainers, collectors, and so on. Once an env factory is written, starting the training should be easy. Loading the env and policy from a checkpoint should also be simple. Some inspiration can be taken from the way rllib or SB3 do this, though we should take only the best parts/ideas from there (if any).

This issue probably has one of the highest impact/effort ratios among the existing ones, as it will extend the target audience of tianshou. At least the following points should be fulfilled:

  • Config can be easily read and saved
  • Config can easily be manipulated from the command line
  • Config is saved with each experiment, and the agent and env are recreated from it at loading. Note that the recreation of custom envs from config is crucial. This means that custom env factories should be registered or at least passed at loading time.
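
The three points above could look roughly like the following. This is a minimal sketch, not a proposal for the actual tianshou API: `ExperimentConfig`, `register_env_factory`, `ENV_FACTORIES`, and the `toy_grid` factory are all hypothetical names, JSON stands in for whatever serialization format is eventually chosen, and the "env" is a placeholder dict rather than a real environment.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Callable, Dict

# Hypothetical registry: maps the name stored in a config to an env factory.
ENV_FACTORIES: Dict[str, Callable[..., Any]] = {}


def register_env_factory(name: str) -> Callable:
    """Register a custom env factory so it can be looked up at loading time."""
    def decorator(factory: Callable[..., Any]) -> Callable[..., Any]:
        ENV_FACTORIES[name] = factory
        return factory
    return decorator


@dataclass
class ExperimentConfig:
    """Hypothetical config object; field names are illustrative only."""
    env_name: str
    env_kwargs: Dict[str, Any] = field(default_factory=dict)
    algorithm: str = "ppo"
    lr: float = 3e-4
    seed: int = 0

    def save(self, path: Path) -> None:
        # Persist the config next to the experiment results.
        path.write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def load(cls, path: Path) -> "ExperimentConfig":
        return cls(**json.loads(path.read_text()))

    def make_env(self) -> Any:
        # Recreate the env from the registered factory named in the config.
        try:
            factory = ENV_FACTORIES[self.env_name]
        except KeyError:
            raise KeyError(
                f"No env factory registered under {self.env_name!r}"
            ) from None
        return factory(**self.env_kwargs)


@register_env_factory("toy_grid")
def make_toy_grid(size: int = 4) -> Dict[str, int]:
    # Placeholder standing in for a real environment object.
    return {"size": size}


def roundtrip_demo() -> ExperimentConfig:
    """Save a config to disk and load it back."""
    config = ExperimentConfig(env_name="toy_grid", env_kwargs={"size": 8})
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "config.json"
        config.save(path)
        return ExperimentConfig.load(path)
```

Because only the factory's name is stored in the config, the module that registers a custom factory has to be imported before `load(...).make_env()` is called; passing the factory explicitly at loading time would be the alternative mentioned above.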

The implementation will require at least

  • Factoring out a bunch of utils from the example scripts
  • Removing the whole args stuff. In my current implementation (not in this repo), config, defaults and explanations are read off from the dataclasses using jsonargparse
  • Changing at least some of the example scripts to use the new structure. In fact, I believe that after this only a few scripts might remain, each started with different yaml files.
  • Writing at least some brief documentation
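
To illustrate the idea of reading config, defaults and explanations off the dataclasses, here is a small sketch that uses the standard library's `argparse` as a stand-in for jsonargparse (the implementation referred to above is not in this repo and is not reproduced here). `TrainingConfig`, `build_parser`, and `parse_config` are hypothetical names, and the sketch only handles flat dataclasses with simple scalar fields.

```python
import argparse
from dataclasses import dataclass, field, fields
from typing import List, Optional, Type


@dataclass
class TrainingConfig:
    """Hypothetical config; help texts live in the field metadata."""
    algorithm: str = field(default="ppo", metadata={"help": "Algorithm name"})
    lr: float = field(default=3e-4, metadata={"help": "Learning rate"})
    num_epochs: int = field(default=10, metadata={"help": "Number of epochs"})
    seed: int = field(default=0, metadata={"help": "Random seed"})


def build_parser(config_cls: Type) -> argparse.ArgumentParser:
    """Derive one command line option per dataclass field."""
    parser = argparse.ArgumentParser(description=config_cls.__name__)
    for f in fields(config_cls):
        parser.add_argument(
            f"--{f.name}",
            type=f.type,  # works for plain str/int/float annotations only
            default=f.default,
            help=f.metadata.get("help", ""),
        )
    return parser


def parse_config(config_cls: Type, argv: Optional[List[str]] = None):
    """Build a config instance from command line arguments."""
    namespace = build_parser(config_cls).parse_args(argv)
    return config_cls(**vars(namespace))
```

With this in place, `parse_config(TrainingConfig, ["--lr", "0.001"])` overrides a single default from the command line and leaves the rest untouched, so no separate `args` definitions are needed in the example scripts.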

A tangential but, I believe, very important and useful feature would be compatibility with SB3 and rllib.

What I mean by that is the following: the config determines the training entirely. SB3 and rllib also have high level interfaces, with their own way to configure stuff, their own names and ways to start the training. This complicates comparisons in benchmarking, and also makes users' lives more difficult, in case they want to switch from one lib to another. Note that many "industry" users will never want to go beyond the high level interfaces.

Just by mapping names to each other we could easily implement methods like to_sb3_config or to_rllib_config and even simple functions like train_with_rllib(config) and train_with_sb3(config), in addition to the standard train. This will make reliable performance comparisons very simple, and is thus related to #935. For algorithms or options not implemented in other frameworks, one raises a NotImplementedError, but at least for the standard algorithms the overlap of what is implemented in all libraries is substantial.
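
A rough sketch of what such a name mapping could look like for `to_sb3_config`, restricted to PPO. The keys on the left are assumed names for the tianshou-side config, the values on the right are intended to match SB3's PPO keyword arguments, and neither side has been checked against a specific release of either library, so treat the table as illustrative.

```python
from typing import Any, Dict

# Illustrative mapping for PPO only: assumed tianshou-side name -> SB3 name.
_PPO_TO_SB3: Dict[str, str] = {
    "lr": "learning_rate",
    "discount_factor": "gamma",
    "gae_lambda": "gae_lambda",
    "eps_clip": "clip_range",
    "ent_coef": "ent_coef",
    "vf_coef": "vf_coef",
    "max_grad_norm": "max_grad_norm",
}

# One mapping table per algorithm that has a counterpart in the other library.
_NAME_MAPS: Dict[str, Dict[str, str]] = {"ppo": _PPO_TO_SB3}


def to_sb3_config(algorithm: str, config: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a flat config dict into SB3-style keyword arguments."""
    try:
        name_map = _NAME_MAPS[algorithm.lower()]
    except KeyError:
        raise NotImplementedError(
            f"No SB3 mapping available for algorithm {algorithm!r}"
        ) from None
    # Options without a counterpart are rejected rather than silently dropped.
    unsupported = sorted(set(config) - set(name_map))
    if unsupported:
        raise NotImplementedError(
            f"Options without an SB3 counterpart: {unsupported}"
        )
    return {name_map[key]: value for key, value in config.items()}
```

Raising on unknown options, instead of ignoring them, keeps a cross-library comparison honest: a run either uses the same settings in both libraries or does not start at all.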

@opcode81 Do you want to take a look at it? I will give you access to our fork and internal repositories where I have implemented parts of this with jsonargparse. I also included nni support for these configs, which should also make users' lives easier.

NOTE:
We don't have to fully nail it at the beginning, and it also doesn't need to be exhaustive. It would be nice to cover the most commonly used algos in the beginning, say SAC, PPO and TD3.

Once released, there should be as few breaking changes to the high level interfaces as possible, so the new features should spend quite some time in the alpha/beta stage before being fully publicized. Having just something for now would already be a great improvement. This issue can be addressed in multiple PRs building on each other. For example, the compatibility layer with sb3/rllib should be a separate PR.

Metadata

Labels

  • enhancement: Feature that is not a new algorithm or an algorithm enhancement
  • good first issue: Good for newcomers
  • major: Large changes that cannot or should not be broken down into smaller ones

Projects

Status

Done