
csi-rbdplugin on the node crashes on start with nil pointer #4807

@xcompass

Description

csi-rbdplugin is crashing on one of the k8s nodes. Here is the log of the csi-rbdplugin container:

Defaulted container "csi-rbdplugin" out of: csi-rbdplugin, driver-registrar, liveness-prometheus
I0826 09:34:01.412332   57346 cephcsi.go:191] Driver version: v3.11.0 and Git version: bc24b5eca87626d690a29effa9d7420cc0154a7a
I0826 09:34:01.413253   57346 cephcsi.go:268] Initial PID limit is set to 256123
I0826 09:34:01.413510   57346 cephcsi.go:274] Reconfigured PID limit to -1 (max)
I0826 09:34:01.414051   57346 cephcsi.go:223] Starting driver type: rbd with name: rbd.csi.ceph.com
I0826 09:34:01.438534   57346 mount_linux.go:282] Detected umount with safe 'not mounted' behavior
I0826 09:34:01.453157   57346 rbd_attach.go:242] nbd module loaded
I0826 09:34:01.453253   57346 rbd_attach.go:256] kernel version "6.6.43-flatcar" supports cookie feature
I0826 09:34:01.497897   57346 rbd_attach.go:272] rbd-nbd tool supports cookie feature
I0826 09:34:01.498969   57346 server.go:114] listening for CSI-Addons requests on address: &net.UnixAddr{Name:"/csi/csi-addons.sock", Net:"unix"}
I0826 09:34:01.499266   57346 server.go:117] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1bedd56]

goroutine 75 [running]:
github.com/ceph/ceph-csi/internal/rbd.RunVolumeHealer(0xc0007e7ea0, 0x3b27aa0)
        /go/src/github.com/ceph/ceph-csi/internal/rbd/rbd_healer.go:199 +0x3d6
github.com/ceph/ceph-csi/internal/rbd/driver.(*Driver).Run.func1()
        /go/src/github.com/ceph/ceph-csi/internal/rbd/driver/driver.go:191 +0x1f
created by github.com/ceph/ceph-csi/internal/rbd/driver.(*Driver).Run in goroutine 1
        /go/src/github.com/ceph/ceph-csi/internal/rbd/driver/driver.go:189 +0x749
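
The panic points at rbd_healer.go:199. A minimal sketch of the failure mode (checkPV is a hypothetical stand-in, not the actual ceph-csi code): dereferencing pv.Spec.CSI on a PV whose CSI source is nil reproduces the same SIGSEGV:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// checkPV is a hypothetical stand-in for the healer's per-PV logic.
// When pv.Spec.CSI is nil, the field access below dereferences a nil
// pointer and panics with the same SIGSEGV as in the log above.
func checkPV(pv *corev1.PersistentVolume) {
	fmt.Println(pv.Spec.CSI.Driver)
}

func main() {
	// A PV shaped like the one below: RBD source set, CSI source nil.
	pv := &corev1.PersistentVolume{
		Spec: corev1.PersistentVolumeSpec{
			PersistentVolumeSource: corev1.PersistentVolumeSource{
				RBD: &corev1.RBDPersistentVolumeSource{RBDPool: "rbd"},
				// CSI left nil, matching the dlv dump below
			},
		},
	}
	checkPV(pv) // panic: runtime error: invalid memory address or nil pointer dereference
}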

I ran a debugger in the container and traced the crash to this PV:

Name:            pvc-4a532f6e-35c3-11e7-870a-00505601176d
Labels:          <none>
Annotations:     pv.kubernetes.io/bound-by-controller: yes
                 pv.kubernetes.io/migrated-to: rbd.csi.ceph.com
                 pv.kubernetes.io/provisioned-by: kubernetes.io/rbd
Finalizers:      [kubernetes.io/pv-protection external-provisioner.volume.kubernetes.io/finalizer]
StorageClass:    fast
Status:          Bound
Claim:           default/devops-compair-deploy-staging-redis-pvc
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        2Gi
Node Affinity:   <none>
Message:
Source:
    Type:          RBD (a Rados Block Device mount on the host that shares a pod's lifetime)
    CephMonitors:  [10.93.1.100:6789]
    RBDImage:      kubernetes-dynamic-pvc-4a5f348c-35c3-11e7-a683-005056011766
    FSType:
    RBDPool:       rbd
    RadosUser:     kube
    Keyring:       /etc/ceph/keyring
    SecretRef:     &SecretReference{Name:ceph-secret-user,Namespace:default,}
    ReadOnly:      false
Events:            <none>

I also traced that pv.Spec.PersistentVolumeSource.CSI is nil at this line (rbd_healer.go:199 from the stack trace above).

Here is the value of pv.Spec.PersistentVolumeSource before the crash:

(dlv) print pv.Spec.PersistentVolumeSource
k8s.io/api/core/v1.PersistentVolumeSource {
        GCEPersistentDisk: *k8s.io/api/core/v1.GCEPersistentDiskVolumeSource nil,
        AWSElasticBlockStore: *k8s.io/api/core/v1.AWSElasticBlockStoreVolumeSource nil,
        HostPath: *k8s.io/api/core/v1.HostPathVolumeSource nil,
        Glusterfs: *k8s.io/api/core/v1.GlusterfsPersistentVolumeSource nil,
        NFS: *k8s.io/api/core/v1.NFSVolumeSource nil,
        RBD: *k8s.io/api/core/v1.RBDPersistentVolumeSource {
                CephMonitors: []string len: 1, cap: 4, [
                        "10.93.1.100:6789",
                ],
                RBDImage: "kubernetes-dynamic-pvc-4a5f348c-35c3-11e7-a683-005056011766",
                FSType: "",
                RBDPool: "rbd",
                RadosUser: "kube",
                Keyring: "/etc/ceph/keyring",
                SecretRef: *(*"k8s.io/api/core/v1.SecretReference")(0xc00061c9a0),
                ReadOnly: false,},
        ISCSI: *k8s.io/api/core/v1.ISCSIPersistentVolumeSource nil,
        Cinder: *k8s.io/api/core/v1.CinderPersistentVolumeSource nil,
        CephFS: *k8s.io/api/core/v1.CephFSPersistentVolumeSource nil,
        FC: *k8s.io/api/core/v1.FCVolumeSource nil,
        Flocker: *k8s.io/api/core/v1.FlockerVolumeSource nil,
        FlexVolume: *k8s.io/api/core/v1.FlexPersistentVolumeSource nil,
        AzureFile: *k8s.io/api/core/v1.AzureFilePersistentVolumeSource nil,
        VsphereVolume: *k8s.io/api/core/v1.VsphereVirtualDiskVolumeSource nil,
        Quobyte: *k8s.io/api/core/v1.QuobyteVolumeSource nil,
        AzureDisk: *k8s.io/api/core/v1.AzureDiskVolumeSource nil,
        PhotonPersistentDisk: *k8s.io/api/core/v1.PhotonPersistentDiskVolumeSource nil,
        PortworxVolume: *k8s.io/api/core/v1.PortworxVolumeSource nil,
        ScaleIO: *k8s.io/api/core/v1.ScaleIOPersistentVolumeSource nil,
        Local: *k8s.io/api/core/v1.LocalVolumeSource nil,
        StorageOS: *k8s.io/api/core/v1.StorageOSPersistentVolumeSource nil,
        CSI: *k8s.io/api/core/v1.CSIPersistentVolumeSource nil,}

It looks like the healer should either read the RBD source instead of CSI here, or the VolumeHealer should skip volumes that have no CSI source (in-tree RBD PVs)?
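
A minimal sketch of the skip option, assuming a hypothetical shouldHeal guard (not the actual ceph-csi code), could look like this:

package main

import corev1 "k8s.io/api/core/v1"

// shouldHeal is a hypothetical guard the healer could apply before
// touching pv.Spec.CSI. It skips PVs whose CSI source is unset, e.g.
// in-tree RBD PVs that were only annotated with
// pv.kubernetes.io/migrated-to, as well as PVs owned by a different
// CSI driver.
func shouldHeal(pv *corev1.PersistentVolume, driverName string) bool {
	csi := pv.Spec.PersistentVolumeSource.CSI
	if csi == nil {
		return false
	}
	return csi.Driver == driverName
}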

NOTE: the PV was provisioned by the in-tree kubernetes.io/rbd provisioner and migrated to CSI.
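
For anyone hitting the same crash, a small client-go sketch (illustrative only; assumes in-cluster configuration) can list PVs that carry the migrated-to annotation but still have a nil CSI source:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	pvs, err := client.CoreV1().PersistentVolumes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, pv := range pvs.Items {
		// Flag PVs annotated as migrated to the RBD CSI driver but
		// still carrying only an in-tree source (Spec.CSI == nil).
		if pv.Annotations["pv.kubernetes.io/migrated-to"] == "rbd.csi.ceph.com" && pv.Spec.CSI == nil {
			fmt.Printf("PV %s would crash the healer: migrated-to is set but Spec.CSI is nil\n", pv.Name)
		}
	}
}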

I don't have enough knowledge of either the ceph-csi code base or the volume healer to continue debugging. Any suggestions? Thanks!

Environment details

  • Image/version of Ceph CSI driver : v3.11.0 and v3.12.1
  • Helm chart version : v3.11.0 and v3.12.1
  • Kernel version : 6.1.90-flatcar
  • Mounter used for mounting PVC (for cephFS it's fuse or kernel; for rbd
    it's krbd or rbd-nbd) :
  • Kubernetes cluster version : v1.27.16
  • Ceph cluster version : v16.2.15
