First, I assume your environment's action format is a dict, e.g., `{"a": [3, 4], "b": 5.0}`. The next step is to inherit from an existing policy class and override its `forward` function so that the return value is a `Batch` containing the desired action format, e.g.:

```python
def forward(self, ...):
    ...
    return Batch(act=Batch(a=..., b=...), ...)
```

Feel free to change how `a` and `b` are computed; for example, you can return `a` and `b` directly from your network's forward pass. And that's it.

Note: in the `forward` function, the actions `a` and `b` are batch data, i.e., `a` has a shape of …
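To make the "batch data" point concrete, here is a minimal pure-Python sketch of what a batched dict action looks like and how it maps back to one action dict per environment. It assumes the leading dimension of each component indexes the parallel environments. The helper `unbatch_actions` and the sample values are hypothetical illustrations, not part of Tianshou's API.

```python
from typing import Any, Dict, List


def unbatch_actions(batched: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Split a dict of batched action components into one dict per environment.

    Hypothetical helper: assumes every component's first axis is the batch axis.
    """
    sizes = {len(values) for values in batched.values()}
    if len(sizes) != 1:
        raise ValueError("all action components must share the same batch size")
    batch_size = sizes.pop()
    return [
        {key: values[i] for key, values in batched.items()}
        for i in range(batch_size)
    ]


# Two environments: "a" holds one 2-element action per env, "b" one scalar per env.
batched_act = {"a": [[3, 4], [1, 2]], "b": [5.0, 0.5]}
per_env = unbatch_actions(batched_act)
```

Each entry of `per_env` then has the same layout as the single-environment dict action mentioned above, which is the form a custom environment's `step` would typically expect.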
Any tutorials on wrapping a (multi-action) custom gym env for Tianshou?
Is there an official discussion group? I think having a group chat would make it easier for everyone to use the library.