`compute_episodic_return` bug when `v_s=None`

A similar thing happens with the value functions: why are v[s] and v[s'] calculated twice?

I tried replacing the v[s'] values with v[s], as shown below, and found that convergence is significantly slower than with the original implementation.

I'm curious about the reason.

<img width="932" alt="image" src="https://github.com/thu-ml/tianshou/assets/5180020/79d18b1f-d6e1-476d-9e7c-45c1b2e215ab">
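
One possible reason for evaluating both, offered as an assumption rather than a statement about Tianshou's design: inside a single episode, v[s'] at step t equals v[s] at step t+1, but the two differ wherever the stored transitions cross an episode boundary. The sketch below uses a made-up `toy_critic` and made-up observations purely to illustrate that point; it is not the library's code.

```python
import numpy as np

def toy_critic(obs):
    # Hypothetical stand-in for a value network: V(s) = 0.5 * s.
    return 0.5 * np.asarray(obs, dtype=float)

# Two short episodes stored back to back (made-up 1-D observations).
obs = np.array([0.0, 1.0, 2.0, 10.0, 11.0])       # s_t
obs_next = np.array([1.0, 2.0, 3.0, 11.0, 12.0])  # s'_t
done = np.array([0, 0, 1, 0, 1])                  # episode ends at t=2 and t=4

v_s = toy_critic(obs)        # v[s]
v_s_ = toy_critic(obs_next)  # v[s']

# Compare v[s'] at step t with v[s] at step t + 1.
same = v_s_[:-1] == v_s[1:]
print(same)  # the mismatch sits at the episode boundary (t=2)
```

In this toy setup the two arrays agree everywhere except at the boundary, which is the one place where shifting one array cannot reproduce the other.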



------------------------------------------------------ Update [potentially a bug?] ---------------------------------------------------------

Also, if I change the code to use only v[s']:

<img width="951" alt="image" src="https://github.com/thu-ml/tianshou/assets/5180020/313b5d6e-befd-44a2-93a1-1ea85fa94e90">




Convergence is significantly faster and performance is much higher (on HalfCheetah-v4, the result at 100k steps goes from 4101 to 6000+):

Original implementation:
<img width="472" alt="image" src="https://github.com/thu-ml/tianshou/assets/5180020/2b62541c-893d-4bc4-a2d4-e5f138f8455e">

Using only v[s'] to calculate GAE:
<img width="1793" alt="image" src="https://github.com/thu-ml/tianshou/assets/5180020/57ed9201-fb64-4d71-8985-fc47d2551b5c">

However, I noticed that when `v_s` is `None`, `compute_episodic_return` falls back to:
``` python
        v_s = np.roll(v_s_, 1) if v_s is None else to_numpy(v_s.flatten())
```

Doesn't that mean that, if the rollout length is `T=9` (starting from 0), then for `t=0` the delta will be `delta_0 = r_0 + v_1 - v_9` (ignoring the discount factor)?

Yet the faster convergence and the performance boost remain to be explained ToT
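
The wrap-around can be checked in isolation. This is a minimal sketch with made-up numbers (value estimates chosen so that v_t = t, zero rewards, and the discount set to 1 for readability); only `np.roll` is taken from the quoted line, and the variable names `v_next` and `v_cur` are illustrative.

```python
import numpy as np

# Hypothetical value estimates for a 9-step rollout (t = 0..8).
# v_next[t] plays the role of v[s'] at step t, i.e. v_{t+1}.
v_next = np.arange(1, 10, dtype=float)  # [v_1, ..., v_9] = [1, ..., 9]
rew = np.zeros(9)                       # r_t = 0 to isolate the value terms
gamma = 1.0                             # discount dropped for readability

# Same fallback as the quoted line: shift v[s'] right by one to stand in for v[s].
v_cur = np.roll(v_next, 1)

# np.roll is circular, so slot 0 receives the last element (v_9), not v_0.
assert v_cur[0] == v_next[-1] == 9.0
# Every other slot lines up as intended: v_cur[t] == v_t for t >= 1.
assert np.array_equal(v_cur[1:], v_next[:-1])

delta = rew + gamma * v_next - v_cur
print(delta[0])  # r_0 + v_1 - v_9 = 0 + 1 - 9 = -8.0
```

With these numbers every delta is 1.0 except `delta[0]`, which picks up the last value of the batch instead of v_0.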


`compute_episodic_return` bug when `v_s=None` #886

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

compute_episodic_return bug when v_s=None #886

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`compute_episodic_return` bug when `v_s=None` #886