
Suggestion: Drop the name 'vpg' and use REINFORCE instead #317

@ChenDRAG


As stated in #307, as far as I know, vpg is not a formally defined algorithm in the literature (I think the name first appears in the Spinning Up docs) and is only loosely defined. Spinning Up's implementation uses a generalized advantage estimator (GAE), which requires a neural-network value predictor (critic). This is closer to the definition of A2C (both in OpenAI Baselines and in Tianshou). In Tianshou, however, our current 'vpg' algorithm still strictly follows the REINFORCE pipeline (it uses the reward-to-go to update the actor). I therefore suggest that we avoid the name vpg to prevent confusion and use REINFORCE instead.
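To make the distinction concrete, here is a minimal NumPy sketch of the two actor weights being compared: the discounted reward-to-go used by REINFORCE, and a GAE advantage that needs critic values. This is not code from Tianshou or Spinning Up; the function names `reward_to_go` and `gae_advantages`, the single finished episode, and the zero terminal value are all assumptions made for illustration.

```python
import numpy as np


def reward_to_go(rewards, gamma=0.99):
    """Discounted return from each step to the end of the episode.

    Needs only rewards, no critic (REINFORCE-style actor weight).
    """
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """GAE over one finished episode.

    `values` are state-value predictions that would come from a learned
    critic; the value after the last step is assumed to be 0 (terminal).
    """
    values = np.append(np.asarray(values, dtype=np.float64), 0.0)
    adv = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, then an exponentially weighted sum of them.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


rewards = [1.0, 1.0, 1.0]
rtg = reward_to_go(rewards, gamma=0.5)
adv_no_critic = gae_advantages(rewards, [0.0, 0.0, 0.0], gamma=0.5, lam=1.0)
adv_critic = gae_advantages(rewards, [1.0, 1.0, 1.0], gamma=0.5, lam=1.0)
```

With `lam=1.0` and an all-zero critic, the GAE advantage reduces to the plain reward-to-go, so the practical difference between the two pipelines is whether a trained critic is involved at all.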


Labels: discussion (Discussion of a typical issue)
