
Add endpoint call for pod freezer #11802


Merged
merged 10 commits into knative:main from psschwei:freeze-endpoint
Sep 22, 2021

Conversation

psschwei
Copy link
Contributor

@psschwei psschwei commented Aug 12, 2021

Fixes #11832

Proposed Changes

  • Add a call to an endpoint when the request count for a pod scales to/from zero.

This is part of the work to add the Pod Freezer capability to knative (see #11694 for more details)

Note: there is a skeleton of a freeze/thaw service available here, which is a refactoring of julz's freeze-proxy daemon into a standalone service.

Release Note

 Add call to `concurrencyStateEndpoint` when the request count for a pod scales to/from zero.
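At a high level, the queue-proxy wraps its handler chain with a concurrency-state handler that counts in-flight requests and calls out to the endpoint on each zero-crossing. A rough sketch of that shape (not the exact code merged here; the signature and the failure handling evolve over the review below, and the imports assumed are net/http, sync, and zap):

// Sketch only: count in-flight requests and call resume/pause on zero-crossings.
func ConcurrencyStateHandler(logger *zap.SugaredLogger, h http.Handler, pause, resume func() error) http.HandlerFunc {
	var (
		mu       sync.Mutex
		inFlight int
	)
	return func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		if inFlight == 0 {
			logger.Info("Requests increased from zero")
			if err := resume(); err != nil {
				logger.Errorf("Error handling resume request: %v", err)
				panic(err)
			}
		}
		inFlight++
		mu.Unlock()

		defer func() {
			mu.Lock()
			defer mu.Unlock()
			inFlight--
			if inFlight == 0 {
				logger.Info("Requests dropped to zero")
				if err := pause(); err != nil {
					logger.Errorf("Error handling pause request: %v", err)
					panic(err)
				}
			}
		}()

		h.ServeHTTP(w, r)
	}
}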

@knative-prow-robot knative-prow-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 12, 2021
@google-cla google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label Aug 12, 2021
@codecov

codecov bot commented Aug 12, 2021

Codecov Report

Merging #11802 (48a86d4) into main (76cb92b) will decrease coverage by 0.07%.
The diff coverage is 55.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main   #11802      +/-   ##
==========================================
- Coverage   87.54%   87.47%   -0.08%     
==========================================
  Files         196      196              
  Lines        9506     9515       +9     
==========================================
+ Hits         8322     8323       +1     
- Misses        912      919       +7     
- Partials      272      273       +1     
Impacted Files Coverage Δ
cmd/queue/main.go 0.50% <0.00%> (ø)
pkg/queue/concurrency_state.go 73.07% <60.00%> (-3.40%) ⬇️
pkg/http/handler/timeout.go 86.51% <0.00%> (-6.67%) ⬇️
pkg/activator/net/revision_backends.go 91.66% <0.00%> (-0.85%) ⬇️
pkg/apis/serving/fieldmask.go 95.00% <0.00%> (-0.06%) ⬇️
pkg/activator/net/throttler.go 88.85% <0.00%> (ø)
pkg/apis/serving/v1/revision_types.go 100.00% <0.00%> (ø)
pkg/autoscaler/metrics/stats_scraper.go 92.21% <0.00%> (ø)
pkg/reconciler/revision/revision.go 96.61% <0.00%> (+8.55%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 76cb92b...48a86d4. Read the comment docs.

// runs the `resume` function. If either of `pause` or `resume` are not passed, it runs
// the respective local function(s). The local functions are the expected behavior; the
// function parameters are enabled primarily for testing purposes.
func ConcurrencyStateHandler(endpoint string, h http.Handler, logger *zap.SugaredLogger, pause, resume func()) http.HandlerFunc {
Member

best to keep logger as first param for consistency

func ConcurrencyStateHandler(endpoint string, h http.Handler, logger *zap.SugaredLogger, pause, resume func()) http.HandlerFunc {
logger.Info("Concurrency state tracking enabled")

if pause == nil {
Member

I think I'd drop this nil check; if someone is explicitly passing in a nil function here, they deserve a panic.

Contributor Author

My thought here was to have default pause/resume functions, so a nil value here would mean to use the default. But maybe it's better to explicitly include the pause/resume functions in the call from queue/main.go ?

Member

yeah I'd pull the functions out and explicitly inject them. You're going to need to do this I think to comfortably test them anyway (which we should do in this PR).
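Concretely, injecting pause/resume makes them trivial to stub in a unit test. A rough sketch against the signature quoted above (hypothetical test; assumes net/http/httptest and zap's zaptest helpers, and that pause fires synchronously when the in-flight count returns to zero):

func TestConcurrencyStateHandler(t *testing.T) {
	var pauses, resumes int
	pause := func() { pauses++ }
	resume := func() { resumes++ }

	inner := http.HandlerFunc(func(http.ResponseWriter, *http.Request) {})
	h := ConcurrencyStateHandler("freeze-endpoint", inner, zaptest.NewLogger(t).Sugar(), pause, resume)

	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))

	// One request arriving from zero and completing should trigger one resume and one pause.
	if resumes != 1 || pauses != 1 {
		t.Errorf("got resumes=%d, pauses=%d; want 1 and 1", resumes, pauses)
	}
}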

logger.Info("Concurrency state tracking enabled")

if pause == nil {
pause = func() {
Member

I'd probably pull this out to a separate top-level function or interface implementation rather than defining the default inline here. That should also be a lot easier to test (which we should do :)).

if pause == nil {
pause = func() {
logger.Info("Requests dropped to zero, pausing pod ...")
pauseEndpoint := endpoint + ":9696/freeze"
Member

should we hard-code the port and path here or let the endpoint env var define that?

Contributor Author

Port I think is fine to include in the envvar. If we include path in the envvar, then I think we'd need to pass freeze/resume as part of the request (or use two endpoints, but I don't think that's a good idea).

Not sure which option (keeping the path hard-coded or adding the action to the request) makes more sense. In either case, I think we'll ultimately have to define what is required from any custom "pod freezer" service that gets called out to.

Member
@julz julz Aug 13, 2021

if we add the path then I think we probably want to be more generic than freeze and thaw. Another option though would be to use a single endpoint and pass a json structure to it (so it would be the same endpoint either way, but the json structure would define whether the event was to-zero or from-zero).

Contributor Author

Thanks. The more I think about it, a single endpoint with the action taken in the request itself seems like the right path to take.
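For example, the single endpoint could accept a small JSON body naming the transition, roughly (a sketch; the struct name and the action field match what this PR later introduces):

type ConcurrencyStateMessageBody struct {
	// Action is "pause" when requests drop to zero, "resume" when they rise from zero.
	Action string `json:"action"`
}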

Contributor Author

@julz Another thought -- if the path is in the envvar, then we don't necessarily need separate pause/resume functions. We could use a single function and pass the JSON as a parameter.

If that approach makes sense, then perhaps I should update the function signature in #11783 before it merges so that it stays consistent...

Member

I think we may still want it as separate functions to make it easy to test (otherwise you'd have to assert on the json being sent), but we can always refactor as we play with this more. We could also have a single function with a boolean or something. Probably adding some unit tests would be helpful at this point anyway, and then I can stop guessing about what will or won't make things more testable ;).

Contributor Author

It was writing the tests that triggered my thought (and also led me to the same conclusion you have, that it's easier to test with separate functions) :)

req.Header.Add("Token", "") // TODO: use serviceaccountToken from projected volume
resp, err := http.DefaultClient.Do(req)
if err != nil {
panic(err)
Member

I think we need to think through these failure cases, a panic doesn't seem good enough here.

Contributor Author

Thinking out loud a bit here...

If the resume call fails, in an ideal state I think I'd want to log the failure and then retry at least one more time (perhaps it was just due to a network blip?). If still unable to resume the container, then I think I'd want to launch another pod and send the request to it (though not sure how feasible that would be). The key here is to get an active pod back up.

In the case where the pause call fails, I think that logging the failure and retrying might be sufficient. A not-paused container doesn't really negatively impact the end-user (their service would've scaled down anyways). That said, an alternative might be to kill the pod and respawn it in a paused state (which might be a better choice from a provider perspective).

@julz what's your take?

Member

my instinct is let's spin out an issue for retrying, maybe another for more generally thinking through what should happen here (e.g. should we fail readiness/liveness probes maybe so that k8s routes around us/kills us and brings up another container?), and then for right now land this with a panic (since it's an experimental branch throwing our hands in the air is probably better than logging, and recovering needs more thought - but I would like an issue to exist for at least retrying).

@markusthoemmes any thoughts?
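For reference, the retry such an issue might track could be as simple as a bounded loop around the endpoint call (purely a sketch, deliberately not part of this PR):

// Sketch only: retry the pause/resume call a few times before giving up.
func withRetry(attempts int, delay time.Duration, call func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = call(); err == nil {
			return nil
		}
		time.Sleep(delay)
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}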

Contributor Author

(I've reverted to the panic, but kept the log message as I think it's still useful info to have)

// the respective local function(s). The local functions are the expected behavior; the
// function parameters are enabled primarily for testing purposes.
func ConcurrencyStateHandler(endpoint string, h http.Handler, logger *zap.SugaredLogger, pause, resume func()) http.HandlerFunc {
logger.Info("Concurrency state tracking enabled")
Member

this duplicates the log message on line 302 of main.go, I don't think we really need to print this at Info level twice.

// Resume sends to an endpoint when request counts increase from zero
func Resume(endpoint string) error {
action := ConcurrencyStateMessageBody{Action: "resume"}
bodyText := []byte(`{"action": "` + action.Action + `"}`)
Member

I don't understand why we're building this with string concatenation when we've created a struct with json tags on the line above? Should this be json.Marshal(action) instead?
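i.e. roughly (a sketch of the suggested change):

action := ConcurrencyStateMessageBody{Action: "resume"}
bodyText, err := json.Marshal(action) // yields {"action":"resume"}
if err != nil {
	return fmt.Errorf("unable to create request body: %w", err)
}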

logger.Info("Requests dropped to zero")
err := pause(endpoint)
if err != nil {
logger.Errorf("Issue handling pause request: %s", err)
Member

I'm not convinced this is better than the panic, tbh. We need to think about how we actually want to behave here. Should we retry the request? Should we exit the entire server? Panic may be all we can do, but I think we should decide explicitly if that's the case. This might be something to talk through on Wednesday / in an ad-hoc meeting, actually.

Contributor Author

Agreed. I sketched out some thoughts here. This was a start along those lines, but didn't want to go too far before we discussed.

Member

oops missed that comment, let me go read now!

logger.Info("Requests increased from zero")
err := resume(endpoint)
if err != nil {
logger.Errorf("Issue handling resume request: %s", err)
Member

ditto here, if we fail to resume and then send a request to the paused pod anyway that's not going to end well obviously. A panic is better than that, but we should be clear if we want to try retrying instead, or maybe the pod should fail a readiness/liveness check or something so k8s will replace it? (Some of this might well be follow-on PRs, just thinking out loud and making sure we don't lose this thread because I think it's important - and just logging definitely seems wrong I think).

// runs the `resume` function. If either of `pause` or `resume` are not passed, it runs
// the respective local function(s). The local functions are the expected behavior; the
// function parameters are enabled primarily for testing purposes.
func ConcurrencyStateHandler(h http.Handler, logger *zap.SugaredLogger, pause, resume func(string) error, endpoint string) http.HandlerFunc {
Contributor

any reason it is a public function?

Contributor Author

It needs to be public to be accessible to the buildServer function.

// the respective local function(s). The local functions are the expected behavior; the
// function parameters are enabled primarily for testing purposes.
func ConcurrencyStateHandler(h http.Handler, logger *zap.SugaredLogger, pause, resume func(string) error, endpoint string) http.HandlerFunc {
logger.Info("Concurrency state endpoint set, tracking request counts, using endpoint ", endpoint)
Contributor

Suggested change
logger.Info("Concurrency state endpoint set, tracking request counts, using endpoint ", endpoint)
logger.Info("Concurrency state endpoint set, tracking request counts, using endpoint: ", endpoint)

logger.Info("Requests dropped to zero")
err := pause(endpoint)
if err != nil {
logger.Errorf("Issue handling pause request: %s", err)
Contributor

Suggested change
logger.Errorf("Issue handling pause request: %s", err)
logger.Errorf("Error handling pause request: %v", err)

logger.Info("Requests increased from zero")
err := resume(endpoint)
if err != nil {
logger.Errorf("Issue handling resume request: %s", err)
Contributor

Suggested change
logger.Errorf("Issue handling resume request: %s", err)
logger.Errorf("Error handling resume request: %v", err)

timeout := time.Duration(env.RevisionTimeoutSeconds) * time.Second

// Create queue handler chain.
// Note: innermost handlers are specified first, ie. the last handler in the chain will be executed first.
var composedHandler http.Handler = httpProxy
if concurrencyStateEnabled {
Member
@jwcesign jwcesign Aug 23, 2021

So the config string doesn't need to be "freeze-proxy"? According to: https://github.com/knative/serving/pull/11699/files

Contributor Author

No, the config string should be the service endpoint of the freeze service and its port. For example, assuming there was a freeze-service running in the knative-serving namespace using port 9696, the config value could be: freeze-service.knative-serving.svc.cluster.local:9696.

Member

Got it

@knative-prow-robot knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 1, 2021
@knative-prow-robot knative-prow-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 1, 2021
@psschwei psschwei changed the title [WIP] Add endpoint call for pod freezer Add endpoint call for pod freezer Sep 1, 2021
@knative-prow-robot knative-prow-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 1, 2021
@psschwei
Contributor Author

psschwei commented Sep 1, 2021

@julz @markusthoemmes @vagababov think this one is ready to go now.

Contributor
@markusthoemmes markusthoemmes left a comment

A few comments on getting this a bit more DRY and maybe layered slightly differently.

logger.Info("Requests dropped to zero")
if err := pause(endpoint); err != nil {
logger.Errorf("Error handling pause request: %v", err)
panic(err)
Contributor

Does this panic crash the entire pod?

Contributor Author

Based on some local testing, I believe it only kills the queue-proxy container. The user-container container doesn't seem to be impacted. Here's some kubectl output when a panic occurs:

$ k get po -w
NAME                                        READY   STATUS    RESTARTS   AGE
example-00001-deployment-598f4f7b76-7rngf   2/2     Running   0          107s
example-00001-deployment-598f4f7b76-7rngf   1/2     Error     0          111s
example-00001-deployment-598f4f7b76-7rngf   1/2     Running   1          112s
example-00001-deployment-598f4f7b76-7rngf   1/2     Running   1          113s
example-00001-deployment-598f4f7b76-7rngf   2/2     Running   1          114s

Contributor

Yeah, that's fine. I guess we'll want the entire pod to be recreated if that happens?

Contributor Author
@psschwei psschwei Sep 7, 2021

Yes. Right now, the panic is just a placeholder (see @julz's comment here), but we'll be implementing something more robust shortly via #11834.

Comment on lines 94 to 98
action := ConcurrencyStateMessageBody{Action: "pause"}
bodyText, err := json.Marshal(action)
if err != nil {
return fmt.Errorf("unable to create request body: %w", err)
}
Contributor

Since this is static content anyway, can we cache it? Seems wasteful to marshal this on each request (same below)
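One way to do that (a sketch, not the code in this PR) is to build the two static bodies once and reuse the bytes for every request:

// Sketch only: marshal the static pause/resume bodies once, up front.
var (
	pauseBody  = []byte(`{"action":"pause"}`)
	resumeBody = []byte(`{"action":"resume"}`)
)

// Each request then gets a fresh reader over the shared, immutable bytes:
req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(pauseBody))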

Comment on lines 117 to 118
func Resume(endpoint string) error {
action := ConcurrencyStateMessageBody{Action: "resume"}
bodyText, err := json.Marshal(action)
if err != nil {
return fmt.Errorf("unable to create request body: %w", err)
}
body := bytes.NewBuffer(bodyText)
req, err := http.NewRequest(http.MethodPost, endpoint, body)
if err != nil {
return fmt.Errorf("unable to create request: %w", err)
}
req.Header.Add("Token", "nil") // TODO: use serviceaccountToken from projected volume
resp, err := http.DefaultClient.Do(req)
if err != nil {
return fmt.Errorf("unable to post request: %w", err)
}
if resp.StatusCode != http.StatusOK {
return fmt.Errorf("expected 200 response, got: %d: %s", resp.StatusCode, resp.Status)
}

return nil
}
Contributor

This is the exact same function as above modulo Action: "resume". Shall we deduplicate?

Contributor Author

Yes, that makes sense. I recall discussing in one of the other PRs keeping the pause/resume functions separate for testing purposes, but there's no need for identical functions. I can dedupe to a single function (say func concurrencyStateRequest(b ConcurrencyStateMessageBody) error {}, removing endpoint as per the comment below).

One question -- would it still be worthwhile to keep distinct pause/resume functions (where each would just call the concurrencyStateRequest function with the relevant parameter)? I could see it helping with readability, but I may be overthinking things a bit too...

Contributor

Yeah it's fine to have both as readability wrappers IMO.

Contributor

Though I could also see the form ultimately being:

In main.go

resume := concurrencyStateRequest(endpoint, ConcurrencyStateMessageBody{Action: "resume"})
pause := concurrencyStateRequest(endpoint, ConcurrencyStateMessageBody{Action: "pause"})
ConcurrencyStateHandler(logger, handler, resume, pause)

Contributor Author

Ahh, I misread the comment below... yes, passing the endpoint directly to the resume/pause functions (and skipping the Handler function entirely) makes total sense. Thanks for the clarification!

@@ -85,3 +88,55 @@ func ConcurrencyStateHandler(logger *zap.SugaredLogger, h http.Handler, pause, r
<-done
}
}

// Pause sends to an endpoint when request counts drop to zero
func Pause(endpoint string) error {
Contributor

The layering with passing endpoint in here seems a little odd. It feels like it should be captured in the functions passed to ConcurrencyStateHandler so it itself wouldn't care what to pass to these functions.
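In other words, main.go would bind the endpoint when constructing the two functions and the handler would stay endpoint-agnostic, roughly (a sketch; the env field name is hypothetical):

// Sketch: capture the endpoint in the closures built in main.go.
pause := func() error { return queue.Pause(env.ConcurrencyStateEndpoint) }
resume := func() error { return queue.Resume(env.ConcurrencyStateEndpoint) }
composedHandler = queue.ConcurrencyStateHandler(logger, composedHandler, pause, resume)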

bodyText, err := json.Marshal(action)
if err != nil {
return fmt.Errorf("unable to create request body: %w", err)
panic(err)
Contributor Author

Not 100% happy with a panic here, so if there's a better way to go I'm all ears

Member

I will do this: find an approach to relaunch a pod.

@knative-prow-robot knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 9, 2021
@knative-prow-robot knative-prow-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 13, 2021
Contributor
@markusthoemmes markusthoemmes left a comment

/assign

logger.Info("Requests increased from zero")
if err := resume(); err != nil {
logger.Errorf("Error handling resume request: %v", err)
panic(err)
Contributor

This panic won't actually do anything anymore as it's now situated directly in ServeHTTP where panics are usually caught by the wrapping server. We might need an explicit os.Exit(1) here.
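i.e. something along these lines (a sketch of the suggestion):

if err := resume(); err != nil {
	logger.Errorf("Error handling resume request: %v", err)
	os.Exit(1) // a panic here would be recovered by the wrapping server; exit so the container restarts
}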

Member

think you +1ed this comment but forgot to do it @psschwei :)

Contributor Author

whoops, looks like I did it on the one above, but not here 🤦

paused = false
logger.Info("From-Zero request successfully processed")
Contributor

Debug?

paused = true
logger.Info("To-Zero request successfully processed")
Contributor

Debug?

Comment on lines 303 to 310
pauseMsg, err := queue.CreateConcurrencyStateMessageBody(queue.ConcurrencyStateMessageBody{Action: "pause"})
if err != nil {
logger.Errorf("unable to create concurrency state pause message body: %s", err)
}
resumeMsg, err := queue.CreateConcurrencyStateMessageBody(queue.ConcurrencyStateMessageBody{Action: "resume"})
if err != nil {
logger.Errorf("unable to create concurrency state resume message body: %s", err)
}
Contributor

This seems quite redundant. Why not pass a ConcurrencyStateMessageBody to ConcurrencyStateRequest and marshal in the function before returning the closure? Seems better API design too, since ConcurrencyStateRequest taking []byte is a bit more generic than seems healthy.
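A sketch of that shape (mirroring the ConcurrencyStateRequest signature that appears later in this PR, with the marshalling hoisted out of the returned closure):

// Sketch only: marshal once up front, return a closure that reuses the bytes.
func ConcurrencyStateRequest(endpoint string, msg ConcurrencyStateMessageBody) (func() error, error) {
	body, err := json.Marshal(msg)
	if err != nil {
		return nil, fmt.Errorf("unable to create message body: %w", err)
	}
	return func() error {
		req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
		if err != nil {
			return fmt.Errorf("unable to create request: %w", err)
		}
		req.Header.Add("Token", "nil") // TODO: use serviceaccountToken from projected volume
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return fmt.Errorf("unable to post request: %w", err)
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("expected 200 response, got: %d: %s", resp.StatusCode, resp.Status)
		}
		return nil
	}, nil
}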

Contributor Author

I originally was passing a ConcurrencyStateMessageBody (though without a separate function for marshalling), and for some reason then the resume call would only work once (on any subsequent call, it was throwing a 400 error). Now, however, using CreateConcurrencyStateMessageBody within ConcurrencyStateRequest works fine. 🤷 Updating now.

@knative-prow-robot knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 14, 2021
@knative-prow-robot knative-prow-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 14, 2021
Comment on lines 122 to 124
func CreateConcurrencyStateMessageBody(action ConcurrencyStateMessageBody) ([]byte, error) {
return json.Marshal(action)
}
Contributor

We can drop this, I think.

Comment on lines 101 to 102
_ = fmt.Errorf("unable to create message body: %w", err)
os.Exit(1)
Contributor

I don't think this does what you think it does. It won't log the message for instance.
I think we should surface the error through the return and logger.Fatal where this function is called.

Contributor Author
@psschwei psschwei Sep 14, 2021

This may be a silly question, but given that this function returns a func() error, how would we return just the error here? edit: yes, it was a silly question 🤦

@psschwei
Contributor Author

Failing tests were from pkg/http/handler which this PR didn't touch, so assume those are flakes.

/test pull-knative-serving-unit-tests

logger.Info("Requests increased from zero")
if err := resume(); err != nil {
logger.Errorf("Error handling resume request: %v", err)
panic(err)
Member

think you +1ed this comment but forgot to do it @psschwei :)

mux.Unlock()

h.ServeHTTP(w, r)
}
}

// ConcurrencyState Request sends to the concurrency state endpoint
Member

Suggested change
// ConcurrencyState Request sends to the concurrency state endpoint
// ConcurrencyStateRequest sends a request to the concurrency state endpoint.

}, nil
}

type ConcurrencyStateMessageBody struct {
Member

total nit, but I don't think the Body suffix is useful here. There's no metadata section or anything going on, so just ConcurrencyStateMessage seems fine.

Member

Actually, see below for a suggestion involving just dropping this struct anyway.

mux.Unlock()

h.ServeHTTP(w, r)
}
}

// ConcurrencyState Request sends to the concurrency state endpoint
func ConcurrencyStateRequest(endpoint string, action ConcurrencyStateMessageBody) (func() error, error) {
Member

looking at this, I wonder if it's really worth exporting a struct for this parameter, or if we could just accept a string with the action to send? That would also let us drop the json marshalling (and therefore the error return, and the struct).

if we did this, then we could also maybe do something like:

func Pause(endpoint string) func() error {
  return concurrencyStateRequest(endpoint, "pause")
}

func Resume(endpoint string) func() error {
  return concurrencyStateRequest(endpoint, "resume")
}

func concurrencyStateRequest(endpoint, action string) (func() error, error) {
  body := fmt.Sprintf(`{ "action": %q }`, action)
  ... 

If we do want to have a struct, I think I'd put the methods on it rather than passing it in, which would let us drop the closures:

type ConcurrencyEndpoint struct { endpoint string }

func (c ConcurrencyEndpoint) Pause() error {
 ...
}

func (c ConcurrencyEndpoint) Resume() error {
 ...
}

endpoint := queue.ConcurrencyEndpoint{endpoint: address}
composedHandler = queue.ConcurrencyStateHandler(logger, composedHandler, endpoint.Pause, endpoint.Resume)

(that could then easily become a PauseResumer interface, but maybe in a future PR)

Contributor Author
@psschwei psschwei Sep 21, 2021

I think we'll probably want to send other info along with the freeze/pause message (pod name, for one; if for nothing else than debugging, it'd be good to know which pod is being paused and which is being resumed. Was thinking of that as a separate PR, since this is already quite large). For those reasons, I'd lean towards keeping the struct.

Putting the methods on the struct makes sense.

edit: Although as I'm thinking through it more, we should be able to add multiple parameters in the string too, so maybe the simpler, string-based approach is the better one.

mux.Unlock()

h.ServeHTTP(w, r)
}
}

// ConcurrencyState Request sends a request to the concurrency state endpoint.
Member

Suggested change
// ConcurrencyState Request sends a request to the concurrency state endpoint.
// concurrencyStateRequest sends a request to the concurrency state endpoint.

Member
@julz julz left a comment

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 22, 2021
@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: julz, psschwei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 22, 2021
@knative-prow-robot knative-prow-robot merged commit cc5ff9b into knative:main Sep 22, 2021
@psschwei psschwei deleted the freeze-endpoint branch October 15, 2021 15:17
@ediezh

ediezh commented Oct 21, 2021

Is there documentation or are there examples of how to use this feature?

@psschwei
Contributor Author

For now, the setup process is documented in the README for the container-freezer sandbox repo. We're still in the process of getting documentation into the official docs.

There's also a tracking doc that gives a little more detail about what the freezer is doing and why.

Also, one thing to note is that the freezing capability is still very much a work in progress at the moment (for example, the code that actually does the freezing hasn't merged to the main branch just yet).

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/autoscale area/networking cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Add call to endpoint [Pod Freezer]
7 participants