Suggestion: Abandon name 'vpg' but use REINFORCE to replace it #317

As stated in #307, as far as I know, VPG is not a formally defined algorithm in the literature (I think the name first appears in SpinningUp's docs) and is only loosely defined. SpinningUp's implementation uses the generalized advantage estimator (GAE), which requires a neural-network value predictor (critic). That is closer to the definition of A2C (in both OpenAI Baselines and Tianshou). In Tianshou, however, our current 'vpg' algorithm still strictly follows the REINFORCE pipeline (it uses the reward-to-go to update the actor). I therefore suggest we drop the name 'vpg' to avoid possible confusion and use REINFORCE instead.
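
To make the distinction concrete, here is a minimal sketch of the two actor weightings. It is not taken from Tianshou or SpinningUp; the function names `reward_to_go` and `gae_advantages`, the default `gamma` and `lam`, and the assumption that the value after the last step of an episode is 0 are all mine, for illustration only. The first function needs nothing but rewards (REINFORCE), while the second cannot be computed without critic predictions.

```python
from typing import List, Sequence


def reward_to_go(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Discounted return from each step to the end of one episode.

    This is the weight REINFORCE puts on each log-probability gradient;
    no critic is involved.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Generalized advantage estimates for one episode.

    values[t] is a critic's prediction for the state at step t.
    Assumption: the episode terminates, so the value after the
    last step is taken to be 0.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(rewards) else 0.0
        # One-step TD error, which depends on the critic.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam=1` and an all-zero critic, `gae_advantages` reduces to `reward_to_go`, which is one way to see REINFORCE as the critic-free special case.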

