compute_episodic_return bug when v_s=None #886

@spacegoing

Description

A similar thing happens with the value functions: why are v[s] and v[s'] calculated twice?

I tried replacing the v[s'] values with v[s], as shown below, and found that convergence is significantly slower than with the original implementation.

I'm curious about the reason.

image

------------------------------------------------------ Update [potentially a bug?] ---------------------------------------------------------

Also, if I switch to using only v[s']:

image

Convergence is significantly faster and performance is much higher (on HalfCheetah-v4, the return at 100k steps goes from 4101 to 6000+):

Original implementation:
image

Using only v[s'] to calculate GAE:
image

However, I noticed that when v_s is None, compute_episodic_return falls back to:

        v_s = np.roll(v_s_, 1) if v_s is None else to_numpy(v_s.flatten())

Doesn't that mean that, for a rollout of length T=9 (indexed from 0), the delta at t=0 becomes delta_0 = r_0 + v_1 - v_9 (discount factor omitted), because np.roll wraps the last element around to the front?

Yet the faster convergence and the performance boost remain unexplained ToT
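The wrap-around asked about above can be reproduced in isolation. This is a minimal sketch, not the library's code: the array names (`rew`, `v_next`, `v_rolled`), the toy values, and `gamma = 1.0` are assumptions chosen only to make the indexing visible, and episode boundaries are ignored.

```python
import numpy as np

# Hypothetical toy rollout of length 9 (t = 0..8); all values are made up.
T = 9
rew = np.ones(T)                              # r_t = 1 for every step
v_next = np.arange(1, T + 1, dtype=float)     # v_next[t] stands in for v_{t+1}: 1..9
gamma = 1.0                                   # discount omitted, as in the formula above

# Same expression as the v_s=None fallback quoted in the issue:
# shift the next-state values right by one position.
v_rolled = np.roll(v_next, 1)

# One-step TD residual using the rolled array as the current-state values.
delta = rew + gamma * v_next - v_rolled

print(v_rolled)   # first entry is the last next-state value (v_9)
print(delta)      # delta[0] = r_0 + v_1 - v_9; later entries use v_{t+1} - v_t
```

For t >= 1 the rolled array lines up with the previous step's next-state value, so only the first entry is affected in this toy setup. Whether that single entry accounts for the training curves above is not something this sketch can show.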

Labels

RNN (Temporary label to group all things RNN), performance issues (Slow execution or poor-quality results), refactoring (No change to functionality)
