Description
Describe the solution you'd like
We would like the trident operator to upgrade the Trident controller plugin without downtime.
Similar to #740, the trident operator deletes the Deployment for the Trident controller plugin when updating the Trident version. This makes all Trident functionality unavailable until the new controller pod becomes ready.
Furthermore, the Deployment for the Trident controller plugin has only one replica, and its update strategy is Recreate. So even if the trident operator stopped deleting the Deployment, the deployment controller would not create a new controller pod until the old pod is deleted; if the old pod cannot be deleted, all Trident functionality stops working.
Because pods that cannot be deleted (stuck in the Terminating state) are a common problem in Kubernetes, we would like the Trident controller plugin to run with multiple replicas and leader election.
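For reference, a minimal sketch of what such a controller Deployment spec might look like. This is illustrative only: the flag name `--enable-leader-election` and the container name `trident-main` are assumptions, not existing Trident options.

```yaml
# Hypothetical sketch of the requested change, not an actual Trident manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trident-csi
  namespace: trident
spec:
  replicas: 2                  # more than one controller pod
  strategy:
    type: RollingUpdate        # instead of the current Recreate strategy
    rollingUpdate:
      maxUnavailable: 0        # keep at least one ready pod during updates
      maxSurge: 1
  template:
    spec:
      containers:
        - name: trident-main   # illustrative container name
          args:
            - --enable-leader-election   # hypothetical flag; only the elected leader serves requests
```

With a spec like this, a pod stuck in Terminating would not block the rollout: the surge pod comes up alongside it, and leader election ensures only one replica acts at a time.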
Describe alternatives you've considered
none
Additional context
This situation can be reproduced with the following steps.
- Deploy the trident operator v22.01.1 with the TridentOrchestrator object.
- Wait until all trident pods become ready.
- Set a dummy finalizer to the Trident controller pod.
- e.g.
kubectl patch -n trident -p '{"metadata":{"finalizers": ["example.com/dummy"]}}' "$(kubectl get pods -n trident -l app=controller.csi.trident.netapp.io -o name | head -1)"
- This step simulates a controller plugin pod that cannot be deleted.
- Update the trident operator and the TridentOrchestrator object to v22.04.0.
- There will be no healthy controller pod, which means none of the Trident functionality works.
$ kubectl get pods -n trident -l app=controller.csi.trident.netapp.io
NAME READY STATUS RESTARTS AGE
trident-csi-ccc5cdd56-hkppj 0/6 Terminating 0 6m5s