
A bug about dual clip PPO implementation #433

@Ending2015a

Description

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, numpy, sys
    print(tianshou.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

Hello, I have a question about the dual clipping implementation for PPO.

```python
ratio = (dist.log_prob(b.act) - b.logp_old).exp().float()
ratio = ratio.reshape(ratio.size(0), -1).transpose(0, 1)
surr1 = ratio * b.adv
surr2 = ratio.clamp(1.0 - self._eps_clip, 1.0 + self._eps_clip) * b.adv
if self._dual_clip:
    clip_loss = -torch.max(
        torch.min(surr1, surr2), self._dual_clip * b.adv
    ).mean()
else:
    clip_loss = -torch.min(surr1, surr2).mean()
```
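A minimal numeric sketch (standalone, not tianshou code; `eps_clip` and `dual_clip` are typical hypothetical values) of what the unconditional `max(..., c * adv)` does when the advantage is positive:

```python
import torch

# With a positive advantage, c * adv (c > 1) exceeds both surrogate terms,
# so max(min(surr1, surr2), c * adv) replaces the clipped surrogate entirely.
ratio = torch.tensor([1.5])      # hypothetical importance ratio
adv = torch.tensor([2.0])        # positive advantage
eps_clip, dual_clip = 0.2, 3.0   # assumed values; dual_clip is the paper's c

surr1 = ratio * adv                                     # 3.0
surr2 = ratio.clamp(1.0 - eps_clip, 1.0 + eps_clip) * adv  # 1.2 * 2.0 = 2.4
clipped = torch.min(surr1, surr2)                       # standard PPO term: 2.4
dual = torch.max(clipped, dual_clip * adv)              # 6.0 -- c * adv wins
print(clipped.item(), dual.item())
```

Here the dual-clipped objective is 6.0 instead of the standard 2.4, which is exactly the inflated policy loss described below.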

Is this implementation correct? The paper (arXiv:1912.09729) applies dual clipping, `max(min(surr1, surr2), c * adv)`, only when the advantage (\hat{A} in the paper) is negative. If \hat{A} is positive, `c * adv` can be much larger than `surr1` and `surr2` (note that c is always > 1.0), so taking `max(..., c * adv)` produces a huge policy loss. Should it be changed to something like this?

if self._dual_clip:
    clip1 = torch.min(surr1, surr2)
    clip2 = torch.max(clip1, self._dual_clip * b.adv)
    clip_loss = -torch.where(b.adv < 0, clip2, clip1).mean()
else:
    clip_loss = -torch.min(surr1, surr2).mean()
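A standalone sketch of the proposed fix (variable names and constants are assumptions, not tianshou internals), checking both advantage signs: the `torch.where` keeps the standard clipped term for positive advantages and only applies the dual-clip lower bound where the advantage is negative.

```python
import torch

adv = torch.tensor([2.0, -2.0])   # one positive, one negative advantage
ratio = torch.tensor([1.5, 0.3])  # hypothetical importance ratios
eps_clip, dual_clip = 0.2, 3.0    # assumed values; dual_clip is the paper's c

surr1 = ratio * adv
surr2 = ratio.clamp(1.0 - eps_clip, 1.0 + eps_clip) * adv
clip1 = torch.min(surr1, surr2)            # [2.4, -1.6]
clip2 = torch.max(clip1, dual_clip * adv)  # [6.0, -1.6]
# Positive adv keeps the standard clip1; negative adv uses the bounded clip2.
loss = -torch.where(adv < 0, clip2, clip1).mean()
```

For the negative-advantage entry, `max(clip1, c * adv)` bounds how negative the surrogate can get (−1.6 rather than −6.0), which is the effect dual clipping is meant to have.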

Metadata

Labels: bug (Something isn't working)