As stated in #307, as far as I know, VPG is not a formally defined algorithm in the literature (I think the name first appears in Spinning Up's docs) and is only loosely defined. Spinning Up's implementation uses a generalized advantage estimator (GAE), which requires a DNN value predictor (critic). That is closer to the definition of A2C (both in OpenAI Baselines and in Tianshou). However, in Tianshou, our current 'vpg' algorithm still strictly follows the REINFORCE pipeline (it uses the reward-to-go to update the actor). As a result, I suggest we avoid the name VPG to prevent confusion and use REINFORCE instead.
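
To make the difference concrete, here is a minimal sketch of the two actor signals for a single finished episode. This is not the actual code from Tianshou or Spinning Up; the function names `reward_to_go` and `gae_advantages` and their parameters are hypothetical and chosen only for illustration. The point is that the REINFORCE-style signal needs nothing but rewards, while GAE needs per-step value predictions from a critic.

```python
def reward_to_go(rewards, gamma=0.99):
    """REINFORCE-style signal: discounted sum of future rewards, no critic."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def gae_advantages(rewards, values, gamma=0.99, lam=0.95, last_value=0.0):
    """GAE signal: needs `values`, the critic's V(s_t) for every step.

    `last_value` is the (assumed) value of the state after the final step;
    0.0 corresponds to a terminal state.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # One-step TD error, then exponentially weighted sum with factor gamma * lam.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    print(reward_to_go(rewards, gamma=0.5))
    print(gae_advantages(rewards, [0.5, 0.5, 0.5], gamma=0.5, lam=0.95))
```

With an all-zero critic and `lam=1`, `gae_advantages` reduces to `reward_to_go`, which is one way to see that the two pipelines only diverge once a learned value function is involved.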