这是indexloc提供的服务,不要输入任何密码
Skip to content

Probe properly after startup when periodSeconds greater than zero #11190

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 24, 2021

Conversation

julz
Copy link
Member

@julz julz commented Apr 15, 2021

Fixes #10764.

WIP bc letting prow run the tests for me.

Aside: the obvious question is why we initially broke this. Reading through past github history I think the original intention may have been to only use the QP probing path for the aggressive probing case, and use a direct RP on the user container otherwise (in which case it would be safe to make the QP path no-op in that circumstance). Unfortunately it looks like we made the QP path a no-op when probePeriod>0, but never actually added a probe against the user container in this case.

I think moving back the readinessProbe to be directly against the user container when probePeriod>0 is a good idea (especially it will be when we get sub-second probePeriods so both startup and readiness could be on user container..), but that's going to take a bit more work and this is a small step that makes things better. (I don't think, now that we have startup and readiness split, that even when we move the probe to user container it will make sense to use isAggressive to determine whether to cache the probe: we just won't ever probe user container via QP in this case).

Release Note

User-supplied readinessProbes with a probePeriod set greater than zero are no longer silently ignored after pod startup.

@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 15, 2021
@google-cla google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label Apr 15, 2021
@knative-prow-robot knative-prow-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/autoscale area/networking labels Apr 15, 2021
@codecov
Copy link

codecov bot commented Apr 15, 2021

Codecov Report

Merging #11190 (3dc1386) into main (7f32cde) will decrease coverage by 0.00%.
The diff coverage is n/a.

❗ Current head 3dc1386 differs from pull request most recent head e8e3baa. Consider uploading reports for the commit e8e3baa to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main   #11190      +/-   ##
==========================================
- Coverage   87.81%   87.80%   -0.01%     
==========================================
  Files         196      196              
  Lines        9393     9391       -2     
==========================================
- Hits         8248     8246       -2     
- Misses        889      890       +1     
+ Partials      256      255       -1     
Impacted Files Coverage Δ
pkg/queue/health/handler.go 100.00% <ø> (ø)
pkg/queue/health/health_state.go 100.00% <ø> (ø)
pkg/activator/net/revision_backends.go 91.51% <0.00%> (-0.90%) ⬇️
pkg/autoscaler/statforwarder/leases.go 76.97% <0.00%> (+1.43%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f32cde...e8e3baa. Read the comment docs.

@julz
Copy link
Member Author

julz commented Apr 15, 2021

hm. looks unrelated, but also new :/

/test pull-knative-serving-istio-stable-no-mesh

@julz
Copy link
Member Author

julz commented Apr 15, 2021

hmmm happened again, but this doesn't change anything for the probePeriod=0 case, which the failing test is :/

/retest

@julz
Copy link
Member Author

julz commented Apr 15, 2021

🎉

/assign @markusthoemmes @vagababov

@julz julz changed the title WIP: Probe properly after startup when periodSeconds greater than zero Probe properly after startup when periodSeconds greater than zero Apr 15, 2021
@knative-prow-robot knative-prow-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 15, 2021
@markusthoemmes
Copy link
Contributor

Looks good! I'm contemplating if we should have some E2E test for the behavior as our QP-probe-interaction seems kinda brittle. Would it make sense to write a test where a pod initially becomes ready and then subsequently stops serving and thus becomes unready? Might be useful to also make sure the changes you've planned remain sound?

@julz
Copy link
Member Author

julz commented Apr 16, 2021

Yeah good point @markusthoemmes: we're way lighter on e2e coverage for this stuff than we should be, especially given the number of moving parts involved. I actually have a test image from initially investigating this that should be fairly easy to use to improve the e2e coverage, added to my list to do

isAggressive: false,
wantStatus: http.StatusOK,
wantBody: queue.Name,
name: "alive: false, no prober, K-Probe",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can drop these K-Probe mentions anymore - since I believe those were for the aggressive state

@github-actions
Copy link

This Pull Request is stale because it has been open for 90 days with
no activity. It will automatically close after 30 more days of
inactivity. Reopen with /reopen. Mark as fresh by adding the
comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 17, 2021
@knative-prow-robot knative-prow-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/test-and-release It flags unit/e2e/conformance/perf test issues for product features approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 19, 2021
@julz
Copy link
Member Author

julz commented Aug 19, 2021

Rebased on #11814 and (since this makes it redundant) dropped periodSeconds>0 case (both cases now act the same \o/).

@julz
Copy link
Member Author

julz commented Aug 19, 2021

bump for another look @markusthoemmes @dprotaso

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 20, 2021
@dprotaso
Copy link
Member

/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 24, 2021
@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso, julz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot merged commit ac53b04 into knative:main Aug 24, 2021
nealhu pushed a commit to nealhu/serving that referenced this pull request Aug 28, 2021
…ative#11190)

* Probe properly after startup when periodSeconds greater than zero

* Update comformance test for fixed behaviour

* Remove redundant K-Probe since this is no longer aggressive-probe specific
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/autoscale area/networking area/test-and-release It flags unit/e2e/conformance/perf test issues for product features cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

User-specified readiness probes with periodSeconds>0 should not be ignored
5 participants