-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Be more discerning about which errors cause mesh fall-back #11174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/test pull-knative-serving-istio-stable-mesh |
Codecov Report
@@ Coverage Diff @@
## main #11174 +/- ##
==========================================
+ Coverage 87.73% 87.75% +0.02%
==========================================
Files 190 190
Lines 9171 9188 +17
==========================================
+ Hits 8046 8063 +17
+ Misses 871 870 -1
- Partials 254 255 +1
Continue to review full report at Codecov.
|
passing the failure through the |
0d06832
to
15649df
Compare
Previous to this commit, we would fall back to service scraping if all pods we tried to scrape errored. In many cases this heuristic is fine, but with a small number (e.g. one) of pods, it can lead to us falling back to service scraping unnecesarily. Once we've turned off direct scraping, we never turn it back on, so this is unfortunate. This commit tightens the check to ensure that all of the errors we saw were compatible with having been caused by the mesh being enabled.
15649df
to
34cec38
Compare
Should be good to go, but I force pushed so lets rerun the tests (which will fail again, unfortunately, but hopefully only in expected ways) for safety: /unhold /assign @markusthoemmes @vagababov PTAL :) |
looks like rolling deployment upgrade failure again FYI @markusthoemmes |
// TODO(julz) consider other tests, e.g. looking for upstream header in response. | ||
func isMeshError(err error) bool { | ||
var errWithStatus errorWithStatusCode | ||
if errors.As(err, &errWithStatus) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you're into code golfing
return errors.As(err, &errWithStatus) && errWithStatus.statusCode == meshErrorStatusCode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vagababov The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test pull-knative-serving-upgrade-tests |
Again 😢 /test pull-knative-serving-upgrade-tests |
@julz: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Another known flake. We really need to chase these down :(. /test pull-knative-serving-istio-stable-no-mesh |
Yay! finally |
Before this PR, we would fall back to service scraping if all pods we tried to scrape errored. In many cases this heuristic is fine, but with a small number (e.g. one) of pods, it can lead to us falling back to service scraping unnecessarily if an unrelated error occurs. Once we've turned off direct scraping, we never turn it back on, so this is unfortunate.
This PR tightens the check to ensure that all of the errors we saw were compatible with having been caused by the mesh being enabled before falling back to service scraping.
/hold for prow to tell me if this actually works with mesh, I might've guessed the wrong status code for mesh errors
Release Note