Closed
Description
iSCSI session healing performs the following steps:
- detect all iSCSI sessions that are not in the "logged in" state
- wait for a timeout
- log them out and back in
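The three steps above can be sketched as a small decision function. This is only an illustrative model, not Trident's actual code: the `StaleTracker` class, its `decide` method, and the `NoAction` label are hypothetical names, and the session states are assumed to be strings such as `LOGGED_IN` or `FREE`, as reported by the kernel. Only the `LogoutLoginRescan` action name comes from this issue.

```python
from dataclasses import dataclass, field
from typing import Dict

LOGGED_IN = "LOGGED_IN"
NO_ACTION = "NoAction"                     # hypothetical label for "leave the session alone"
LOGOUT_LOGIN_RESCAN = "LogoutLoginRescan"  # action name taken from this issue


@dataclass
class StaleTracker:
    """Hypothetical model of the healing loop: remember when each session
    was first seen stale and pick an action once the timeout has elapsed."""
    timeout_s: float
    first_stale: Dict[str, float] = field(default_factory=dict)

    def decide(self, states: Dict[str, str], now: float) -> Dict[str, str]:
        actions: Dict[str, str] = {}
        for session_id, state in states.items():
            if state == LOGGED_IN:
                # Healthy session: forget any earlier stale timestamp.
                self.first_stale.pop(session_id, None)
                actions[session_id] = NO_ACTION
                continue
            # Step 1: the session is stale (anything other than LOGGED_IN).
            since = self.first_stale.setdefault(session_id, now)
            # Step 2: wait for the timeout. Step 3: log out and back in.
            if now - since >= self.timeout_s:
                actions[session_id] = LOGOUT_LOGIN_RESCAN
            else:
                actions[session_id] = NO_ACTION
        return actions
```

Note that in this model the decision depends only on the session state and elapsed time. Nothing checks whether a filesystem is mounted on top of the session's devices, which is the gap this issue describes.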
This healing sequence causes any ext4 filesystems mounted on top of devices owned by that session to go read-only, which leaves any pods consuming those PVs in an irrecoverable state.
Consider an (unfortunately) extended network outage:
- all iSCSI session states become "FREE"
- Trident detects these sessions as stale (not LOGGED IN)
- after the session recovery timeout, Trident sets the action for the sessions to LogoutLoginRescan
- Trident issues `iscsiadm -m ..... -u` on the sessions
- Upon logout of the sessions, Linux tears down each of the /dev/sdXX block devices
- Upon teardown of the last sdXX backing a given volume, multipath returns EIO to any outstanding IO on the /dev/dm-XX device
- When ext4 receives EIO for a jbd2 IO, it intentionally and irrecoverably marks the filesystem as read-only
At this point, the Pod sees an RO PV that cannot be recovered without a remount of the file system as RW, and a restart of the pod. The session healing has turned a recoverable network outage into an irrecoverable degradation of the file system.
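To check whether a node has ended up in this state, one option is to look for ext4 mounts whose options include `ro`. The sketch below parses text in the `/proc/mounts` format. The function name `read_only_ext4_mounts` is made up for illustration, and the check cannot tell an error-induced read-only remount apart from a volume that was deliberately mounted read-only.

```python
from typing import List


def read_only_ext4_mounts(mounts_text: str) -> List[str]:
    """Return mount points of ext4 filesystems that are currently read-only.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    read_only = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        # Match the exact "ro" option, not substrings like "errors=remount-ro".
        if fstype == "ext4" and "ro" in options.split(","):
            read_only.append(mountpoint)
    return read_only


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mountpoint in read_only_ext4_mounts(f.read()):
            print(mountpoint)
```

Run on the affected node, this prints one mount point per read-only ext4 filesystem; any PV mount paths in the output are candidates for the remount and pod restart described above.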