Description
A similar thing happens with the value function: why are v[s] and v[s'] calculated twice?
I tried replacing the v[s'] values with v[s] as shown below and found that convergence is significantly slower than with the original implementation.
I'm curious about the reason.
------------------------------------------------------ Update [potentially a bug?] ---------------------------------------------------------
Also, if I change the code to use v[s'] only:
Convergence is significantly faster and performance is much higher (for HalfCheetah-v4, the score at 100k steps goes from 4101 to 6000+):
Only v[s'] is used to calculate GAE:
However, I noticed that when v_s is None, the implementation in compute_episodic_return is to use
v_s = np.roll(v_s_, 1) if v_s is None else to_numpy(v_s.flatten())
Doesn't that mean that, if the rollout length is T=9 (starting from 0), then for t=0 the delta will be delta_0 = r_0 + v_1 - v_9?
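To make the wrap-around concrete, here is a minimal NumPy sketch of what the np.roll fallback does on a toy rollout. The rewards, the values, and gamma are made up for illustration, and the delta line assumes the usual one-step TD residual form (with the discount factor included); it is not copied from the library.

```python
import numpy as np

gamma = 0.99  # assumed discount factor, for illustration only

# Toy rollout with 10 transitions (t = 0..9); all numbers are made up.
rew = np.ones(10)
v_s_ = np.arange(1, 11, dtype=float)  # stand-in for v[s'] at t = 0..9

# Fallback applied when v_s is None.
v_s = np.roll(v_s_, 1)

# np.roll is circular: index 0 receives the LAST entry of v_s_,
# while every other index t receives v_s_[t - 1].
print(v_s[0], v_s_[-1])

# Assumed one-step TD residual: delta_t = r_t + gamma * v[s'_t] - v[s_t].
delta = rew + gamma * v_s_ - v_s
print(delta[0])  # uses v_s_[-1] in place of v[s_0]
print(delta[1])  # uses v_s_[0], i.e. the value of the state reached at t = 0
```

Within one contiguous trajectory, v_s_[t - 1] is the value of the state at step t, so the rolled array lines up for every t >= 1. Only the first entry (and, in a buffer holding several episodes, presumably the first step of each episode) picks up a value from an unrelated state.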
Yet the convergence and performance boost remains to be explained ToT