Status: Closed
Labels: performance issues (Slow execution or poor-quality results)
Description
- I have marked all applicable categories:
  - exception-raising bug
  - RL algorithm bug
  - documentation request (i.e. "X is missing from the documentation.")
  - new feature request
  - design request (i.e. "X should be changed to Y.")
- I have visited the source website
- I have searched through the issue tracker for duplicates
- I have mentioned version numbers, operating system, and environment, where applicable
I have noticed that in the implementation of PPOPolicy, the old log probabilities logp_old are computed without minibatching:
```python
with torch.no_grad():
    batch.logp_old = self(batch).dist.log_prob(batch.act)
```
This makes the algorithm unusable when the batch is too large to fit in memory, since there is no way to control the computation via batch_size. I suggest adding minibatch support:
```python
logp_old = []
with torch.no_grad():
    for minibatch in batch.split(self._batch, shuffle=False, merge_last=True):
        logp_old.append(self(minibatch).dist.log_prob(minibatch.act))
batch.logp_old = torch.cat(logp_old, dim=0).flatten()
```
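Until a fix lands upstream, the same pattern can be reproduced outside Tianshou. Below is a minimal, self-contained sketch of the minibatched log-probability computation in plain PyTorch; the toy actor, the batch sizes, and the logp_minibatched helper are hypothetical illustrations, not Tianshou API:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

# Toy discrete actor standing in for the PPO actor (hypothetical).
actor = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))

obs = torch.randn(100_000, 8)           # a large on-policy batch
act = torch.randint(0, 4, (100_000,))   # actions taken for that batch

def logp_minibatched(obs, act, minibatch_size=4096):
    """Compute log-probabilities in fixed-size chunks so peak memory
    is bounded by minibatch_size rather than the full batch size."""
    chunks = []
    with torch.no_grad():
        for start in range(0, len(obs), minibatch_size):
            end = start + minibatch_size
            dist = Categorical(logits=actor(obs[start:end]))
            chunks.append(dist.log_prob(act[start:end]))
    return torch.cat(chunks, dim=0)

logp_old = logp_minibatched(obs, act)   # shape: (100_000,)
```

Because the chunks are processed in order (no shuffling), the concatenated result lines up element-wise with act, just as shuffle=False and merge_last=True guarantee for batch.split in the proposal above.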
I am using Tianshou 1.0.0.