
add PSRL policy #202

Merged 81 commits on Sep 23, 2020

Conversation

@yaofeng1998 (Contributor) commented Sep 4, 2020

Add PSRL policy in tianshou/policy/modelbase/psrl.py.

@Trinkle23897 (Collaborator) left a comment

Again, please refer to https://tianshou.readthedocs.io/en/latest/contributing.html to correct the errors in the unit test, including code-style checking and other unit tests.

@Trinkle23897 linked an issue Sep 4, 2020 that may be closed by this pull request (8 tasks)
@Trinkle23897 changed the base branch from dev to master September 4, 2020 13:28
@thu-ml deleted a comment from codecov-commenter Sep 5, 2020
@duburcqa previously approved these changes Sep 14, 2020
self.rew_mean = (self.rew_mean * self.rew_count + rew_sum) / sum_count
self.rew_square_sum += rew_square_sum
raw_std2 = self.rew_square_sum / sum_count - self.rew_mean ** 2
self.rew_std = np.sqrt(1 / (
Collaborator

I don't understand this. Can you explain it more?

Collaborator

The line that calculates self.rew_std looks strange.
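For readers following this thread: the quoted diff is cut off in the middle of the rew_std line, so the PR's actual formula is not visible here. The sketch below is not the PR's code. It shows one standard way such an update can be written, assuming a Gaussian prior on the mean reward and treating the empirical reward variance as the observation noise (the normal-normal conjugate update, where precisions add). The function name `update_reward_posterior` and the parameters `prior_std` and `eps` are hypothetical.

```python
import numpy as np


def update_reward_posterior(mean, count, square_sum,
                            rew_sum, rew_square_sum, rew_count,
                            prior_std=1.0, eps=1e-8):
    """Hypothetical sketch: fold a batch of reward statistics into running ones.

    Returns the new running mean, count, sum of squares, and the posterior
    standard deviation of the *mean* reward (not of individual rewards).
    """
    total = count + rew_count
    # Running mean, same form as the first quoted line of the diff.
    new_mean = (mean * count + rew_sum) / total
    new_square_sum = square_sum + rew_square_sum
    # Empirical variance E[r^2] - E[r]^2, clipped against rounding error.
    sample_var = np.maximum(new_square_sum / total - new_mean ** 2, 0.0)
    # Assumption: posterior precision = data precision + prior precision.
    precision = total / (sample_var + eps) + 1.0 / prior_std ** 2
    post_std = np.sqrt(1.0 / precision)
    return new_mean, total, new_square_sum, post_std


# Two observed rewards, 1 and 3, starting from empty statistics.
rewards = np.array([1.0, 3.0])
mean, count, square_sum, post_std = update_reward_posterior(
    0.0, 0.0, 0.0, rewards.sum(), (rewards ** 2).sum(), len(rewards))
```

Under these assumptions the posterior standard deviation shrinks roughly as one over the square root of the number of observations, which is why a count appears inside the square root.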

Comment on lines +132 to +133
state: Optional[Any] = None,
info: Dict[str, Any] = {},
Collaborator

These two arguments are not used.

Collaborator

That's to be consistent with the model API.

Collaborator

What do you mean by "model API"? PSRLModel is the first model and I don't see it has any base-class.
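To illustrate the pattern under discussion: a callable can accept `state` and `info` purely so that its signature matches what callers pass, even if it ignores both. The sketch below is a hypothetical stand-in, not the PR's PSRLModel; the class name `TabularModelSketch` and its Q-table are assumptions. It also uses `None` rather than `{}` as the default for `info`, because a mutable default in Python is created once and shared across calls.

```python
from typing import Any, Dict, Optional

import numpy as np


class TabularModelSketch:
    """Hypothetical stand-in (not the PR's PSRLModel) for a tabular model."""

    def __init__(self, n_state: int, n_action: int) -> None:
        self.q = np.zeros((n_state, n_action))

    def __call__(
        self,
        obs: np.ndarray,
        state: Optional[Any] = None,
        info: Optional[Dict[str, Any]] = None,
    ) -> np.ndarray:
        # `state` and `info` are accepted only to keep the call signature
        # uniform; a tabular lookup needs neither of them.
        return self.q[obs]


model = TabularModelSketch(n_state=3, n_action=2)
q_values = model(np.array([0, 2]), state=None, info={"step": 1})
```

Whether the unused arguments are worth keeping depends on whether a shared interface actually exists, which is the question the reviewer raises above.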

@youkaichao (Collaborator) commented

I can roughly follow the main idea of PSRL, but I still cannot understand some of the details, especially how rew_std is updated. I am not an expert in PSRL (in fact, I have only skimmed a few papers). So I will not approve this PR (I don't think I'm qualified to), but I'm OK with this PR.
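As background for the "main idea" mentioned above: posterior sampling for reinforcement learning keeps a posterior over the unknown MDP, samples one MDP from it at the start of each episode, solves the sampled MDP, and acts greedily on the result. The sketch below is a minimal tabular illustration of that loop body, not the PR's implementation. The function names, the array layout `(n_action, n_state, n_state)` for transition counts, and the choice of a Dirichlet posterior over transitions with a Gaussian posterior over mean rewards are all assumptions.

```python
import numpy as np


def sample_mdp(trans_count, rew_mean, rew_std, rng):
    """Draw one MDP from the posterior (hypothetical layout).

    trans_count: (n_action, n_state, n_state) Dirichlet concentrations, all > 0.
    rew_mean, rew_std: (n_state, n_action) Gaussian posterior over mean rewards.
    """
    n_action, n_state, _ = trans_count.shape
    trans_prob = np.stack([
        np.stack([rng.dirichlet(trans_count[a, s]) for s in range(n_state)])
        for a in range(n_action)
    ])
    rew = rng.normal(rew_mean, rew_std)
    return trans_prob, rew


def value_iteration(trans_prob, rew, gamma=0.99, tol=1e-6, max_iter=10_000):
    """Solve the sampled MDP; returns a greedy policy and state values."""
    value = np.zeros(rew.shape[0])
    for _ in range(max_iter):
        # q[s, a] = rew[s, a] + gamma * sum_s' P[a, s, s'] * value[s']
        q = rew + gamma * (trans_prob @ value).T
        new_value = q.max(axis=1)
        converged = np.max(np.abs(new_value - value)) < tol
        value = new_value
        if converged:
            break
    return q.argmax(axis=1), value


rng = np.random.default_rng(0)
trans_count = np.ones((2, 3, 3))       # uniform Dirichlet prior
rew_mean = np.zeros((3, 2))
rew_mean[:, 1] = 1.0                   # action 1 is believed to pay 1 everywhere
rew_std = np.full((3, 2), 1e-6)        # nearly certain about rewards
trans_prob, rew = sample_mdp(trans_count, rew_mean, rew_std, rng)
policy, value = value_iteration(trans_prob, rew, gamma=0.9)
```

In a full agent the counts and reward statistics would be updated from collected transitions after each episode, and a new MDP would be sampled before the next one. The width of the reward posterior (rew_std in the discussion above) controls how much the sampled rewards vary and therefore how much the agent explores.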

@Trinkle23897 merged commit dcfcbb3 into thu-ml:master Sep 23, 2020
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
Add PSRL policy in tianshou/policy/modelbase/psrl.py.

Co-authored-by: n+e <trinkle23897@cmu.edu>
Development

Successfully merging this pull request may close these issues.

Model-based algorithm?
6 participants