这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@jvasso
Copy link
Contributor

@jvasso jvasso commented Jul 6, 2024

Closes #1164

In PPOPolicy, the method process_fn() now computes logp_old in minibatch instead of all at once.

@MischaPanch
Copy link
Collaborator

This looks innocent, but tests fail. I'll have a look at what goes wrong and will finalize this.

Sorry for the late reaction, my official work has moved me in another direction for a few months, but now I'm back to devoting time to tianshou :)

@MischaPanch
Copy link
Collaborator

Looks like just a typo, _batch -> batch

@MischaPanch
Copy link
Collaborator

Ok, I had a closer look. The current handling is a bit confusing, there is self.max_batchsize which is used implicitly through a method defined in super class. So the fix is to split according to this param, pushing the fix now

@MischaPanch MischaPanch enabled auto-merge (squash) July 20, 2024 08:03
@MischaPanch MischaPanch disabled auto-merge July 20, 2024 13:48
@MischaPanch MischaPanch merged commit 324e3d2 into thu-ml:master Jul 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

No minibatch for computation of logp_old in PPOPolicy

3 participants