Closed
Labels: bug (Something isn't working)
Description
- I have marked all applicable categories:
  - exception-raising bug
  - RL algorithm bug
  - documentation request (i.e. "X is missing from the documentation.")
  - new feature request
- I have visited the source website
- I have searched through the issue tracker for duplicates
- I have mentioned version numbers, operating system and environment, where applicable:
import tianshou, gym, torch, numpy, sys
print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)
Hello, I have a question about the code of rewrite_transitions in HERReplayBuffer:
def rewrite_transitions(self, indices: np.ndarray) -> None:
    """Re-write the goal of some sampled transitions' episodes according to HER.
    Currently applies only HER's 'future' strategy. The new goals will be written \
    directly to the internal batch data temporarily and will be restored right \
    before the next sampling or when using some of the buffer's method (e.g. \
    `add`, `save_hdf5`, etc.). This is to make sure that n-step returns \
    calculation etc., performs correctly without additional alteration.
    """
    if indices.size == 0:
        return
    # Sort indices keeping chronological order
    indices[indices < self._index] += self.maxsize
    indices = np.sort(indices)
    indices[indices >= self.maxsize] -= self.maxsize
    # Construct episode trajectories
    indices = [indices]
    for _ in range(self.horizon - 1):
        indices.append(self.next(indices[-1]))
    indices = np.stack(indices)
    # Calculate future timestep to use
    current = indices[0]
    terminal = indices[-1]
    future_offset = np.random.uniform(size=len(indices[0])) * (terminal - current)
    future_offset = future_offset.astype(int)
    future_t = (current + future_offset)
...
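Because ReplayBuffer is implemented as a circular queue, an index in terminal may be smaller than the corresponding index in current when an episode wraps around the end of the buffer. In that case terminal - current is negative, so some elements of future_offset are negative, and the corresponding entries of future_t point to earlier states rather than the desired future states.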
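To make the concern concrete, here is a minimal standalone sketch that reproduces the arithmetic outside of tianshou. The buffer size `maxsize = 10`, the index values, and the fixed random draws `u` are made up for illustration, and the modulo-based correction at the end is only one possible fix that I am assuming would work, not necessarily how the library should (or does) handle it.

```python
import numpy as np

maxsize = 10  # hypothetical buffer capacity, for illustration only

# Two sampled transitions. The second episode wraps around the end of the
# circular buffer: it starts at index 8 and ends at index 1 (8, 9, 0, 1).
current = np.array([2, 8])
terminal = np.array([6, 1])

# Fixed draws standing in for np.random.uniform, so the result is reproducible.
u = np.array([0.5, 0.5])

# Same arithmetic as in the quoted snippet.
naive_offset = (u * (terminal - current)).astype(int)
naive_future_t = current + naive_offset

# Assumed fix: measure the remaining episode length modulo the buffer size,
# then wrap the resulting index back into [0, maxsize).
span = (terminal - current) % maxsize
safe_offset = (u * span).astype(int)
safe_future_t = (current + safe_offset) % maxsize

print("naive offset:  ", naive_offset)
print("naive future_t:", naive_future_t)
print("safe offset:   ", safe_offset)
print("safe future_t: ", safe_future_t)
```

With these inputs the naive offset for the wrapped episode is negative, so its future_t lands before current, while the modulo version stays inside the wrapped range 8, 9, 0, 1.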
@Juno-T