The Trident operator fails to install via Helm on Rancher #839

@lindhe

Describe the bug

When installing the Trident operator from the Helm chart in a Kubernetes cluster managed by Rancher, the operator fails because it is unable to add the PSA label pod-security.kubernetes.io/enforce: privileged to its installation namespace. This is because Rancher has an admission webhook that guards PSA labels on namespaces: changing them requires a special permission, which must be granted to the operator's ServiceAccount on top of all the other RBAC rules it needs.

Environment

  • Trident version: 23.04.0
  • Trident installation flags used: helm install trident netapp-trident/trident-operator --version 23.04.0 --create-namespace --namespace trident
  • Container runtime: Containerd v1.6.19-k3s1
  • Kubernetes version: v1.25.9
  • Kubernetes orchestrator: Rancher v2.7.5
  • Kubernetes enabled feature gates: None.
  • OS: Ubuntu 22.04.2 LTS
  • NetApp backend types: n/a
  • Other: n/a

To Reproduce

  1. Have a Rancher-managed RKE2 cluster (I'm guessing this reproduces on any Rancher-managed cluster).

  2. helm repo add netapp-trident https://netapp.github.io/trident-helm-chart

  3. helm install trident netapp-trident/trident-operator --version 23.04.0 --create-namespace --namespace trident

  4. Check the status of the installed CRDs, the trident TridentOrchestrator object, and the deployed pods:

    $ kubectl get crd | grep trident
    tridentorchestrators.trident.netapp.io                            2023-06-28T14:56:46Z
    
    $ kubectl -n trident get pods
    NAME                                 READY    STATUS    RESTARTS    AGE
    trident-operator-5789cf4777-nc4vn    1/1      Running   0           7m32s
    
    $ kubectl -n trident get tridentorchestrators trident -o yaml
     […]
     status:
       message: 'Failed to install Trident; err: failed to patch Trident installation namespace
         trident; admission webhook "rancher.cattle.io.namespaces" denied the request:
         Unauthorized'
       namespace: trident
       status: Failed
       version: ""

Expected behavior

I expect it to deploy as it should and not crash. Here's an example of what it looks like when deploying successfully:

$ kubectl -n trident get pods
NAME                                  READY   STATUS    RESTARTS   AGE
trident-controller-6d7c9c5d8c-wg8zj   6/6     Running   0          4h28m
trident-node-linux-4tk6q              2/2     Running   0          4h28m
trident-node-linux-97rgx              2/2     Running   0          4h28m
trident-node-linux-9jfbh              2/2     Running   0          4h28m
trident-node-linux-btjx6              2/2     Running   0          4h28m
trident-node-linux-n5k75              2/2     Running   0          4h28m
trident-node-linux-vpcgd              2/2     Running   0          4h28m
trident-operator-5789cf4777-66mth     1/1     Running   0          4h29m

$ kubectl get crd | grep trident
tridentbackendconfigs.trident.netapp.io                           2023-07-05T08:09:56Z
tridentbackends.trident.netapp.io                                 2023-07-05T08:09:55Z
tridentmirrorrelationships.trident.netapp.io                      2023-07-05T08:10:00Z
tridentnodes.trident.netapp.io                                    2023-07-05T08:09:58Z
tridentorchestrators.trident.netapp.io                            2023-06-28T14:56:46Z
tridentsnapshotinfos.trident.netapp.io                            2023-07-05T08:09:56Z
tridentsnapshots.trident.netapp.io                                2023-07-05T08:09:59Z
tridentstorageclasses.trident.netapp.io                           2023-07-05T08:09:56Z
tridenttransactions.trident.netapp.io                             2023-07-05T08:09:59Z
tridentversions.trident.netapp.io                                 2023-07-05T08:09:55Z
tridentvolumepublications.trident.netapp.io                       2023-07-05T08:09:57Z
tridentvolumereferences.trident.netapp.io                         2023-07-05T08:10:00Z
tridentvolumes.trident.netapp.io                                  2023-07-05T08:09:57Z

Additional context

This was already reported to Rancher's GitHub page as issue #41191. People (understandably) thought that this was a bug in Rancher, while it's more of a documentation issue on their part (in my opinion).

There's also some information available in the operator's pod logs. I don't have them easily available right now, but it basically amounts to the same message as the one displayed by the TridentOrchestrator object anyway; it fails to patch the trident namespace because the Rancher admission webhook rancher.cattle.io.namespaces denied the request (Unauthorized).

Work-around

Inspired by this comment from the issue reported to Rancher's GitHub page, applying the following manifest and then restarting the operator fixes the issue:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: trident-operator-psa
rules:
- apiGroups:
  - management.cattle.io
  resources:
  - projects
  verbs:
  - updatepsa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: trident-operator-psa
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: trident-operator-psa
subjects:
- kind: ServiceAccount
  name: trident-operator
  namespace: trident
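
For reference, the "apply and restart" steps can be sketched as a small shell function. This is only a sketch under assumptions: the manifest file name trident-operator-psa.yaml and the function name are made up for illustration, and the Deployment name trident-operator is inferred from the pod name shown earlier rather than confirmed against the chart.

```shell
# Hypothetical helper: applies the RBAC manifest above and restarts the operator.
# Assumes the manifest was saved to a file (default name is an assumption) and
# that the operator runs as a Deployment named "trident-operator" in "trident".
apply_trident_psa_workaround() {
  manifest="${1:-trident-operator-psa.yaml}"

  # Grant the updatepsa permission to the operator's ServiceAccount.
  kubectl apply -f "$manifest" &&
    # Restart the operator so it retries patching the namespace.
    kubectl -n trident rollout restart deployment/trident-operator &&
    # Print the TridentOrchestrator status field seen in the failure output.
    kubectl -n trident get tridentorchestrators trident -o jsonpath='{.status.status}'
}
```

Calling apply_trident_psa_workaround (optionally with a different manifest path as its first argument) runs the three steps in order and stops at the first one that fails. The status printed right after the restart may still read Failed until the operator has had time to retry the installation.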
