add PSRL policy #202
Again, please refer to https://tianshou.readthedocs.io/en/latest/contributing.html to fix the failing checks, including the code-style check and the other unit tests.
self.rew_mean = (self.rew_mean * self.rew_count + rew_sum) / sum_count
self.rew_square_sum += rew_square_sum
raw_std2 = self.rew_square_sum / sum_count - self.rew_mean ** 2
self.rew_std = np.sqrt(1 / (
I don't understand this. Can you explain it more?
The line that calculates self.rew_std is strange.
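For readers following this thread, here is a minimal sketch of the kind of running statistics the snippet above appears to maintain: a mean and a variance rebuilt from accumulated sums and counts. The class name `RunningRewardStats` and the function `posterior_std` are hypothetical and are not taken from the PR. The truncated `self.rew_std` line is not reconstructed here. The last function only shows the standard normal-normal posterior formula with an assumed prior standard deviation, which may or may not be what the author intended.

```python
import numpy as np


class RunningRewardStats:
    """Hypothetical helper: running mean/variance from batched sums."""

    def __init__(self, shape):
        self.count = np.zeros(shape)
        self.mean = np.zeros(shape)
        self.square_sum = np.zeros(shape)

    def update(self, rew_sum, rew_square_sum, rew_count):
        sum_count = self.count + rew_count
        # Guard against division by zero for entries never visited.
        safe_count = np.maximum(sum_count, 1)
        # Weighted merge of the old mean with the new batch sum.
        self.mean = (self.mean * self.count + rew_sum) / safe_count
        self.square_sum = self.square_sum + rew_square_sum
        self.count = sum_count

    def variance(self):
        safe_count = np.maximum(self.count, 1)
        # Var[r] = E[r^2] - E[r]^2, clipped at zero against rounding error.
        return np.maximum(self.square_sum / safe_count - self.mean ** 2, 0.0)


def posterior_std(count, obs_var, prior_std, eps=1e-6):
    """Assumed normal-normal update: precisions add, scaled by sample count."""
    return np.sqrt(1.0 / (count / (obs_var + eps) + 1.0 / prior_std ** 2))


# Usage: two batches of rewards for a single (state, action) entry.
stats = RunningRewardStats((1,))
stats.update(np.array([6.0]), np.array([14.0]), np.array([3.0]))   # rewards 1, 2, 3
stats.update(np.array([9.0]), np.array([41.0]), np.array([2.0]))   # rewards 4, 5
std = posterior_std(stats.count, stats.variance(), prior_std=2.0)
```

Under these assumptions the posterior standard deviation equals the prior one when no samples have been seen and shrinks as the count grows, which is one possible reading of why the expression starts with `np.sqrt(1 / (`.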
state: Optional[Any] = None,
info: Dict[str, Any] = {},
These two arguments are not used.
That's to be consistent with the model API.
What do you mean by "model API"? PSRLModel is the first model, and I don't see that it has any base class.
I can roughly get the main idea of PSRL, but still cannot understand some of the details, especially how to update the …
Add PSRL policy in tianshou/policy/modelbase/psrl.py. Co-authored-by: n+e <trinkle23897@cmu.edu>