-
Notifications
You must be signed in to change notification settings - Fork 635
Fix race between k8s and provider when deleting pod #908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is the wrong approach because it still allows for a race The deletePodHandler can read from the knownPods map, and get the pod UID / (and resource version?) from there.
In that, it can populate DeleteField's Precondition:
https://godoc.org/k8s.io/apimachinery/pkg/apis/meta/v1#DeleteOptions
In that, you can specify a UID precondition:
https://godoc.org/k8s.io/apimachinery/pkg/apis/meta/v1#Preconditions
Thanks for addressing this work!
P.P.S. There is a slight problem introduced, in that in AddFunc we get a duplicate pod in known pods potentially because a pod with the same key has been added. Perhaps we introduce another syncMap of deleting pods that things are transitioned to? Even at that, I can imagine this being confusing. |
e6db565
to
d8f51ad
Compare
Thanks for your suggestions. Done ! PTAL |
1b7cf97
to
19de33e
Compare
@cwdsuzhou I made a comment on how I think this can be simpler. |
19de33e
to
47fd811
Compare
@cpuguy83 PTAL thanks |
47fd811
to
c22f4ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, implement at least one e2e test that proves this fix.
Thanks! IMO this may be hard to add e2e tests since there are chances that the pod controller works correctly. Will have a try |
d1b7d8a
to
b4da008
Compare
Add some unit tests |
0da1d55
to
7549cb8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cd0e293
to
876a7c5
Compare
@sargun PR addressed |
I plan to review this over the weekend. |
I didn't get the chance to review, I'm sorry. Will try again during the week or weekend. |
Hi @pires could you help review this PR this week? thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @sargun initial review and I ask for the removal of code that injects/reads Pod UID into queue keys and instead simply pass the appropriate DeleteOptions
with a Precondition
that ensures the Pod UID is set.
UPDATE: this review is flawed because there's no way to actually get the Pod UID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like most of this change but it does seems to fix this particular issue, and, right now, I can't think of any solution that doesn't require some deeper thinking into how we handle Pod events. And I do think that it will take some time to do that, including coming up with a design that covers as many scenarios as we can come up with, before any execution is taken on.
897cd38
to
3c59bcc
Compare
Thanks, this is not the best solution, but this can solve the issue without changing the vk designation too much. |
3c59bcc
to
3ff1694
Compare
@sargun implicitly LGTM'ed and I have done it explicitly so I'm merging. |
Fixes #867
Fixes #907