
ISCSI Session Healing can make bad situations worse #961

@speedyguy17

Description

iSCSI session healing takes the following steps:

  • detect all iSCSI sessions that are not in "logged in" state
  • wait for a timeout
  • log them out and back in
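
The three steps above can be sketched as a small decision function. This is only an illustration, not Trident's actual code: the `Session` type, the `Action` names, and the state strings `"LOGGED_IN"` and `"FREE"` are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Hypothetical action names, chosen for this sketch only.
    NONE = "none"                  # session is healthy, nothing to do
    WAIT = "wait"                  # stale, but the timeout has not elapsed
    LOGOUT_LOGIN = "logout_login"  # stale past the timeout: log out and back in


@dataclass
class Session:
    # Hypothetical, simplified view of one iSCSI session.
    state: str          # assumed to look like "LOGGED_IN" or "FREE"
    stale_since: float  # timestamp (seconds) when it was first seen not logged in


def decide(session: Session, now: float, timeout: float) -> Action:
    """Pick a healing action for one session from its state and staleness."""
    if session.state == "LOGGED_IN":
        return Action.NONE
    if now - session.stale_since < timeout:
        return Action.WAIT
    return Action.LOGOUT_LOGIN


if __name__ == "__main__":
    stale = Session(state="FREE", stale_since=100.0)
    print(decide(stale, now=130.0, timeout=60.0))  # still inside the timeout
    print(decide(stale, now=200.0, timeout=60.0))  # timeout has elapsed
```

In this sketch the decision depends only on the session state and the elapsed time. Nothing checks whether a filesystem is mounted on the session's devices.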

The logout causes any ext4 filesystem mounted on top of devices owned by that session to go read-only, leaving any pods that consume those PVs in an irrecoverable state.

Consider an (unfortunately) extended network outage:

  • All iSCSI session states become "FREE"
  • Trident detects these sessions as stale (not LOGGED IN)
  • After the session recovery timeout, Trident sets the action for the sessions to LogoutLoginRescan
  • Trident issues iscsiadm -m ..... -u on the sessions
  • Upon logout of the sessions, Linux tears down each of the /dev/sdXX block devices
  • Upon teardown of the last sdXX backing a given volume, multipath returns EIO to any outstanding IO on the /dev/dm-XX device
  • When ext4 receives EIO for a jbd2 IO, it intentionally and irrecoverably marks the filesystem as read only

At this point, the Pod sees an RO PV that cannot be recovered without a remount of the file system as RW, and a restart of the pod. The session healing has turned a recoverable network outage into an irrecoverable degradation of the file system.
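
One way to spot the resulting state on a node is to scan the mount table for ext4 filesystems mounted read-only. The sketch below parses text in the `/proc/mounts` format (device, mount point, filesystem type, options, then two numeric fields). It assumes the kernel reports the degraded filesystem with the `ro` option there; the function name and the sample device and mount paths are hypothetical.

```python
from typing import List


def read_only_ext4_mounts(mounts_text: str) -> List[str]:
    """Return mount points of ext4 filesystems whose options include 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type == "ext4" and "ro" in options.split(","):
            result.append(mount_point)
    return result


if __name__ == "__main__":
    # Hypothetical sample in /proc/mounts format; on a real node you would
    # read the file itself: open("/proc/mounts").read()
    sample = (
        "/dev/mapper/mpatha /mnt/pv-a ext4 ro,relatime 0 0\n"
        "/dev/mapper/mpathb /mnt/pv-b ext4 rw,relatime 0 0\n"
        "tmpfs /run tmpfs rw,nosuid 0 0\n"
    )
    print(read_only_ext4_mounts(sample))
```

A check like this only reports the symptom. Per the description above, recovery still requires remounting the filesystem read-write and restarting the pod.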
