Assume I have a two-level HRL algorithm, where a upper-level agent decides which sub-task to perform, and the lower-level agents (maybe one agent for one sub-task) can solve the corresponding sub-task. The reward of the upper-level agent is obtained after the sub-task is solved by the lower-level agent. Can you give some guidence about how to implement such idea?