这是indexloc提供的服务,不要输入任何密码
Skip to content

After handling status update, reset update timer with correct duration #789

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 12, 2019

Conversation

tghartland
Copy link
Contributor

If the ping timer is being used, it should be reset with the ping update
interval. If the status update interval is used then Ping stops being
called for long enough to cause kubernetes to mark the node as NotReady.

Fixes #788

Copy link
Contributor

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance you can write a test to show the bug and the fix?

node/node.go Outdated
@@ -259,10 +259,13 @@ func (n *NodeController) controlLoop(ctx context.Context) error {
return nil
case updated := <-n.chStatusUpdate:
var t *time.Timer
var resetDuration time.Duration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can set this before we start the loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made this change in the new commit, please take a look.

Personally I think it was more obvious why the reset duration had to be changed when it was bundled with this

if n.disableLease {
t = pingTimer
} else {
t = statusTimer
}

logic.

Maybe selecting the active timer could be moved out of the loop as well?

@tghartland tghartland force-pushed the fix-notify-status-788 branch from 8180607 to fb6e9c6 Compare November 11, 2019 13:07
@tghartland
Copy link
Contributor Author

@cpuguy83 I've pushed a new commit with a test that shows this.

Without the fix (ping timer is reset to status timer interval):

--- FAIL: TestPingAfterStatusUpdate (0.20s)
    node_test.go:377: assertion failed: expression is false: testP.maxPingInterval < maxAllowedInterval: maximum time between node pings (63.134795ms) was greater than the maximum expected interval (25ms)
time="2019-11-11T14:06:04+01:00" level=debug msg="got node from api server"
FAIL

With the fix:

--- PASS: TestPingAfterStatusUpdate (0.20s)
PASS

@tghartland tghartland force-pushed the fix-notify-status-788 branch from fb6e9c6 to 8e79bb5 Compare November 11, 2019 13:23
If the ping timer is being used, it should be reset with the ping update
interval. If the status update interval is used then Ping stops being
called for long enough to cause kubernetes to mark the node as NotReady.
@tghartland tghartland force-pushed the fix-notify-status-788 branch from 8e79bb5 to c258614 Compare November 11, 2019 13:30
Copy link
Contributor

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Looking at this code again, I wonder if we should be resetting the ping timer at all for a node status update.
In any case this looks like a correct fix to the current situation.

Thanks!

@cpuguy83 cpuguy83 merged commit 1a9c4bf into virtual-kubelet:master Nov 12, 2019
@cpuguy83 cpuguy83 modified the milestones: 1.2.1, 1.1.1 Nov 12, 2019
@cpuguy83 cpuguy83 added the status/cherry-pick To be backported to older releases label Nov 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/cherry-pick To be backported to older releases
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Node goes to NotReady status a short time after using NotifyNodeStatus callback
2 participants