DR-002-Infra: Integration Testing in a Distributed Monolith #1689
Pull Request Overview
This PR introduces a design document for implementing integration testing in a distributed monolith architecture. The document outlines a strategy for coordinating testing across multiple repositories that ship together as a single system.
- Establishes workflows for single pull request testing, coordinated multi-repository changes, and post-merge full test suites
- Introduces manifest-based system composition using (component, commit) pairs to enable reproducible builds (a sketch of such a manifest follows this list)
- Provides GitHub Actions examples for implementing the integration testing workflows
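For illustration, a manifest built from such (component, commit) pairs could look roughly like this (file name, repository names, and commit hashes are placeholders, not taken from the actual proposal):

```yaml
# manifest.yaml -- hypothetical example kept in the integration repository.
# Each entry pins one participating component to an exact commit so that the
# composed system can be rebuilt and re-tested reproducibly.
components:
  - name: component-a
    repository: eclipse-score/component-a
    commit: 9f3c1d2e7b4a5f6c8d0e1a2b3c4d5e6f7a8b9c0d
  - name: component-b
    repository: eclipse-score/component-b
    commit: 1a2b3c4d5e6f7a8b9c0d9f3c1d2e7b4a5f6c8d0e
```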
The created documentation from the pull request is available at: docu-html
Basically agreed in the infrastructure planning session. Feel free to document your opinion here formally with a PR review!
Overall, this reads very well, and I'm in support of this approach. However, what I still don't understand is how "coordinated multi-repo" works. Yes, I understand that the individual PRs shall be tagged with a common label. But how does the integration CI know that all PRs with that label are now present? More could still be added (or removed again, e.g. if I accidentally applied the wrong label and then change it). When is the changeset considered complete, so that the integration pipeline can run? Does it need a manual trigger? Do you use a time window of x minutes? Does the pipeline simply run for each individual PR in the coordinated set and fail n-1 times until the whole set (of n PRs) is there?

It would also be nice if one could merge the entire coordinated set automatically with "one click" (provided everything is green in each component as well as in the integration). In any case, we should strive for a mechanism that ensures everything belonging to the coordinated set gets merged and nothing is forgotten, so that inconsistencies and non-integrability don't creep back in. Maybe when you merge in one repo, all PRs in the other repos that belong to the coordinated set are merged too?!
Good points. There are some ways to do this technically in GitHub. I know it works for "enterprise"; I suspect it works for "public" as well. I would use "a PR with the same head branch and base branch" as the unique identifier: it can be the same on all participating repositories and can even be created automatically. Example:
(very rough sketch, just to convey the idea)
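One hypothetical shape of such a sketch, assuming a GitHub Actions workflow in the integration repository and the `gh` CLI (the repository names, the token secret, and the branch input are placeholders):

```yaml
# Hypothetical workflow in the integration repository. It treats "same head
# branch name on the same base branch" as the identifier of a coordinated
# change set and reports which participating repositories currently have an
# open PR for it.
name: coordinated-set-status
on:
  workflow_dispatch:
    inputs:
      coordination_branch:
        description: "Shared head branch name of the coordinated PRs"
        required: true

jobs:
  collect:
    runs-on: ubuntu-latest
    steps:
      - name: Find matching PRs in all participating repositories
        env:
          GH_TOKEN: ${{ secrets.CROSS_REPO_TOKEN }}  # assumed token with read access to all repos
          BRANCH: ${{ inputs.coordination_branch }}
        run: |
          # Placeholder list of participating repositories.
          for repo in eclipse-score/component-a eclipse-score/component-b; do
            echo "--- $repo ---"
            gh pr list --repo "$repo" \
                       --head "$BRANCH" \
                       --base main \
                       --state open \
                       --json number,headRefOid \
                       --jq '.[] | "PR #\(.number) at \(.headRefOid)"'
          done
```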
This is very tricky. In fact, we are currently trying to implement something like that internally on GitHub Enterprise, using only GitHub's own mechanisms. I cannot yet say whether it works; it is like an atomic merge that involves a set of "the same series of PRs". Ideally with a merge queue, so that long-running CI is not a problem. Maybe it requires a third-party tool (some GitHub bot) or an additional constraint (e.g. you must use a single integration repository). I can report back when I have results. (What I wrote here IMHO only expands on the proposal and should in no way contradict it; if it does, that is my own lacking ability to explain my thoughts ;-))
I am in favor of this. I have added minor remarks, but these are not critical.
A technicality: for reviews and other reasons, the one-sentence-per-line rule is a very helpful one. But this is clearly only a suggestion.
---
## Executive Summary

Large systems often span multiple repositories. Each repository can look “green” on its own, yet problems only show up when everything is combined. These late surprises slow down development and make debugging painful.
Suggested change: Large systems often span multiple repositories. Each repository can look “green” on its own, yet problems only show up when everything is combined. These late surprises slow down development and make debugging painful. They can even block releases.
reproducibility) to a multi-repository boundary. The central integration repository is a neutral place to define participating components, build manifests, hold integration-specific helpers (overrides, fixtures, seam tests), and persist known-good records. It should not contain business logic; keeping it lean reduces accidental
I would argue that the integration repository can contain e.g. integration tests or other checks which make sense only in that specific integration context. Developing "integration tests" in another repository would be very confusing.
I'll rename "seam tests" to "integration tests"
So, if I understand you correctly, what you are basically saying is: it will fail n-1 times until the n-th required PR is in place. I can somehow live with that, but it appears to me to be a waste of resources. Maybe we can think of something that signals "this set of PRs is now complete and stable; now the integration pipeline will yield a sensible outcome". Ideally not (entirely) manual, because I still want as much automation as possible.
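One possible, purely illustrative shape of such a completeness signal: the coordinated set declares its expected members explicitly, and the integration workflow ends as a no-op (rather than failing) until every declared member has an open PR on the coordination branch. A sketch, assuming the same placeholder repositories and `gh` CLI as above, with `expected-members.txt` as an assumed members file:

```yaml
# Hypothetical "completeness gate" workflow. expected-members.txt is an
# assumed file in the integration repository listing the repositories that
# must contribute a PR to this coordinated set, one per line.
name: coordinated-set-gate
on:
  workflow_dispatch:
    inputs:
      coordination_branch:
        description: "Shared head branch name of the coordinated PRs"
        required: true

jobs:
  gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: End early until every declared member has an open PR
        env:
          GH_TOKEN: ${{ secrets.CROSS_REPO_TOKEN }}  # assumed cross-repo read token
          BRANCH: ${{ inputs.coordination_branch }}
        run: |
          missing=0
          while read -r repo; do
            count=$(gh pr list --repo "$repo" --head "$BRANCH" --state open \
                               --json number --jq 'length')
            if [ "$count" -eq 0 ]; then
              echo "Still waiting for an open PR in $repo"
              missing=1
            fi
          done < expected-members.txt
          if [ "$missing" -eq 1 ]; then
            echo "Coordinated set not complete yet; ending without failure."
            exit 0
          fi
          echo "Set complete; later steps would start the integration run here."
```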
Exactly. I had this term in mind but hesitated to write it down, since I don't think it's truly achievable (being truly atomic). But yes, that's probably the best term to use to convey the idea.
I think as long as the merge requires passing CI (and you cannot "forget" to request this integration run), it should be fine!
Yes, I also doubt that truly atomic is possible. The best that is possible is something like "serializing the integration of groups of PRs via some queue", with the queue halting if there is a problem integrating one of the groups, and hoping that this will not happen often. Maybe something is possible around backing out failed integrations automatically and continuing with the next group, but I can already see the complexity of that code exploding...
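For the queue part: GitHub's built-in merge queue triggers workflows via the `merge_group` event, so an integration job in a single integration repository (whose PRs each represent one coordinated group) could hook in roughly as sketched below. The test script name is a placeholder, and the queue itself is enabled via branch protection rules, not in the workflow:

```yaml
# Hypothetical workflow reacting to GitHub's merge queue. Each queued entry is
# tested against the up-to-date target branch (plus the entries ahead of it
# in the queue) before it is allowed to land.
name: integration-on-merge-queue
on:
  merge_group:

jobs:
  integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run the integration test suite for the queued group
        run: ./scripts/run-integration-tests.sh  # placeholder script name
```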
As discussed in the last TL workshop, we are fine with the concept. Let's see how it will be implemented ;)
DR-002 (Integration)
Large systems often span multiple repositories. Each repository can look “green” on its own, yet problems only show up when everything is combined. These late surprises slow down development and make debugging painful.
DR-002 turns a collection of separate repositories into a system that behaves like a single, continuously tested whole — ensuring the main line is always integrable across all components.
Proposed Approach
Benefits
Note: this concept is easily extendable to support multiple versions of S-CORE. But that's currently not required.
Rendered: https://eclipse-score.github.io/score/pr-1689/design_decisions/DR-002-infra.html