Improve Kubernetes node upgrade process #680

@bderrly

Description

In the past month, node pool upgrades have disrupted users of our production Humio cluster several times. The causes appear to be that Kubernetes nodes are not removed gracefully and that nothing automates bringing new Humio nodes up to date with digest and storage data.

A good example of what we think is required for graceful Kubernetes node upgrades can be seen in the Strimzi project. They have a utility named Drain Cleaner, which uses an admission webhook to watch for Pod eviction requests. Drain Cleaner then applies an annotation to the affected Pod(s).

The Strimzi Kubernetes controller watches for this annotation and takes action to reschedule the Pod, but only if it is safe for the cluster to do so. In particular, it ensures there are enough in-sync replicas and that removing a particular broker will not reduce the ISR below the configured minimum.
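
To make the "only if it is safe" condition concrete, here is a minimal sketch of that kind of check. It is not Strimzi's code: the `Partition` type and `safe_to_remove` function are hypothetical names invented for illustration, and the data is an in-memory model rather than anything read from Kafka.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical in-memory view of one partition's replication state."""
    name: str
    in_sync_replicas: frozenset[int]  # IDs of brokers currently in sync
    min_in_sync: int                  # configured minimum in-sync replicas


def safe_to_remove(broker_id: int, partitions: list[Partition]) -> bool:
    """Return True if taking broker_id offline keeps every partition
    at or above its minimum in-sync replica count."""
    for p in partitions:
        if broker_id not in p.in_sync_replicas:
            continue  # this broker does not count toward this partition's ISR
        remaining = p.in_sync_replicas - {broker_id}
        if len(remaining) < p.min_in_sync:
            return False
    return True
```

A controller built this way would defer the eviction and retry later whenever the check returns `False`, instead of letting the drain proceed.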

In the case of Humio, the appropriate action would be to schedule a new Humio Pod on a different Kubernetes node (this shouldn't require any particular work from the operator, as the old Kubernetes node should already be cordoned as part of the upgrade process). Once the new Humio node is healthy, it should be given the same digest and storage partition assignments as the Humio node that is about to be evicted. Next, data transfer from the soon-to-be-evicted node to the new node can begin. Once the transfer is complete, the controller can gracefully remove the old node from the Humio cluster. This would prevent many of the problems we see when GKE performs (unexpected) node upgrades.
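
The ordering of the steps described above could be sketched roughly as follows. This is an in-memory model only, under stated assumptions: `HumioNode`, `replace_node`, and the `segments` field are hypothetical names, and nothing here calls the Humio API or reflects how humio-operator is implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HumioNode:
    """Hypothetical stand-in for a Humio cluster member."""
    node_id: int
    healthy: bool = False
    digest_partitions: set[int] = field(default_factory=set)
    storage_partitions: set[int] = field(default_factory=set)
    segments: set[str] = field(default_factory=set)  # data held by the node


def replace_node(cluster: dict[int, HumioNode], old_id: int,
                 new: HumioNode) -> list[str]:
    """Move assignments and data from the old node to the new one,
    then remove the old node. Returns the steps that were performed."""
    old = cluster[old_id]
    if not new.healthy:
        # Never hand partitions to a node that is not ready yet.
        raise RuntimeError("replacement node is not healthy; retry later")

    steps = []
    cluster[new.node_id] = new
    steps.append("joined")

    # Give the new node the same digest and storage partition assignments.
    new.digest_partitions |= old.digest_partitions
    new.storage_partitions |= old.storage_partitions
    steps.append("assigned")

    # Transfer data (modelled here as copying segment identifiers).
    new.segments |= old.segments
    steps.append("transferred")

    # Only remove the old node once all of its data exists on the new one.
    if not old.segments <= new.segments:
        raise RuntimeError("transfer incomplete; refusing to remove old node")
    del cluster[old_id]
    steps.append("removed")
    return steps
```

The point of the sketch is the ordering: the old node leaves the cluster only as the last step, after the health check, the assignment copy, and the data transfer have all succeeded.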
