README for new MT Scheduler with pluggable policies #888
Conversation
/cc @lionelvillard
Codecov Report

```
@@           Coverage Diff           @@
##             main     #888   +/-   ##
=======================================
  Coverage   75.01%   75.01%
=======================================
  Files         152      152
  Lines        7080     7080
=======================================
  Hits         5311     5311
  Misses       1485     1485
  Partials      284      284
```

Continue to review the full report at Codecov.
@aavarghese can you fix the linter errors? thx!
Thanks, I love this document!
pkg/common/scheduler/README.md (Outdated)
1. **Pod failure**:
When a pod/replica in a StatefulSet goes down for some reason (but its node and zone are healthy), a new replica with the same pod identity is spun up by the StatefulSet almost immediately (the pod can come up on a different node).
All existing vreplica placements will still be valid and no rebalancing is needed.
There shouldn't be any degradation in Kafka message processing.
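For background, the stable identity mentioned above comes from the StatefulSet controller: replica N is always named `<statefulset-name>-N`, even after the pod is deleted and recreated. A minimal sketch (all names and the image are hypothetical, not taken from this PR):

```yaml
# Hypothetical StatefulSet illustrating stable pod identity.
# Replica N is always named dispatcher-N, even after the pod is
# deleted and recreated, so vreplica placements keyed by pod name
# remain valid across a pod restart.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dispatcher          # hypothetical adapter StatefulSet
spec:
  serviceName: dispatcher   # headless service providing stable network IDs
  replicas: 3
  selector:
    matchLabels:
      app: dispatcher
  template:
    metadata:
      labels:
        app: dispatcher
    spec:
      containers:
        - name: adapter
          image: example.com/adapter:latest  # placeholder image
```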
This is not really true: a consumer group rebalance could degrade message processing, especially when the Kafka Consumer Incremental Rebalance Protocol is not being used (which, afaik, is not implemented in Sarama).
@pierDipi the pod set being referred to here covers only the eventing scheduler adapter pods where vreplicas are placed. Since the pod will restart with the same identity, the same placements can be kept without rebalancing the vreps.
I agree with you about consumer group rebalancing and degradation, but that may or may not happen here, depending on whether the Kafka pods are affected as well.
I hope I'm not missing anything...
> @pierDipi the pod set being referred to here is only talking about the eventing scheduler adapter pods where vreplicas are placed. Since pod will restart, the same placements can be kept without a rebalancing of the vreps.

This is what "All existing vreplica placements will still be valid and no rebalancing is needed." is saying; I agree, and it's clear to me why.

I was referring to "There shouldn't be any degradation in Kafka message processing."

> I agree with you about the consumer group rebalancing and degradation but that may/may not happen here if the kafka pods are affected, as well.
> I hope I'm not missing anything...
so, are you saying that if a pod where vreplicas are placed goes down that won't trigger a consumer group rebalance that affects message processing?
In the worst-case scenario, I'd expect something like this to happen (happy to be wrong):
- Pod goes down
- A new pod comes up (same name)
- Kafka broker sees a new consumer that wants to join the group -> rebalance
- Kafka detects that the consumer that was consuming messages in the dead pod (1) is not sending heartbeats anymore -> rebalance (again)
At least one rebalance happens; two in the worst case, since terminationGracePeriodSeconds = 0 is less than the time Kafka needs to detect that a consumer is dead.
Is the above not possible? If yes, does that count as a degradation for Kafka message processing?
You're right. This is absolutely possible.
I made an assumption that the same sticky pod (when restarted) would have the same consumer member ID and, using static membership, would get the same assignment.
I don't have any numbers to quantify the extent of degradation for these recovery scenarios; we'll need to run some performance tests to measure latency. Thank you for catching this @pierDipi !!
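For context, Kafka's static group membership (KIP-345) is what would make that assumption hold: a consumer that rejoins with the same `group.instance.id` within the session timeout keeps its assignment without triggering a rebalance. A hedged sketch of the client-side configuration (the property names are standard Kafka consumer configs; the values are purely illustrative and not taken from this PR):

```properties
# Static membership: the broker identifies this consumer by a stable ID,
# so a restart that rejoins within session.timeout.ms does not trigger
# a group rebalance (requires brokers >= 2.3).
group.id=my-consumer-group       # illustrative group name
group.instance.id=dispatcher-0   # stable per-pod ID, e.g. the StatefulSet pod name
session.timeout.ms=30000         # the restart must complete within this window
```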
Signed-off-by: aavarghese <avarghese@us.ibm.com>
Force-pushed from 3c4da3d to b49ece3.
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aavarghese, lionelvillard The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Signed-off-by: aavarghese <avarghese@us.ibm.com>
Continuation of #768
Proposed Changes
Release Note
Docs