DR-002-Infra: Integration Testing in a Distributed Monolith #1689
Pull Request Overview
This PR introduces a design document for implementing integration testing in a distributed monolith architecture. The document outlines a strategy for coordinating testing across multiple repositories that ship together as a single system.
- Establishes workflows for single pull request testing, coordinated multi-repository changes, and post-merge full test suites
- Introduces manifest-based system composition using (component, commit) pairs to enable reproducible builds (a sketch of such a manifest follows this list)
- Provides GitHub Actions examples for implementing the integration testing workflows
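For illustration, a manifest built from such (component, commit) pairs could look roughly like this (file name, repository names, and commit hashes are placeholders, not taken from the actual proposal):

```yaml
# manifest.yaml -- hypothetical example kept in the integration repository.
# Each entry pins one participating component to an exact commit so that the
# composed system can be rebuilt and re-tested reproducibly.
components:
  - name: component-a
    repository: eclipse-score/component-a
    commit: 9f3c1d2e7b4a5f6c8d0e1a2b3c4d5e6f7a8b9c0d
  - name: component-b
    repository: eclipse-score/component-b
    commit: 1a2b3c4d5e6f7a8b9c0d9f3c1d2e7b4a5f6c8d0e
```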
The created documentation from the pull request is available at: docu-html
Basically agreed in the infrastructure planning session. Feel free to document your opinion here formally with a PR review!
Overall, this reads very well, and I'm in support of this approach. However, what I still don't understand is how "coordinated multi-repo" works. Yes, I understand that the individual PRs shall be tagged with a common label. But how does the integration CI know that all PRs with that label are now present? More could still be added (or removed again, e.g. if I accidentally applied the wrong label and then change it). When is the changeset considered complete, so that the integration pipeline can run? Does it need a manual trigger? Do you use a time window of x minutes? Does the pipeline simply run for each individual PR in the coordinated set and fail n-1 times until the whole set (of n PRs) is there?

It would also be nice if one could merge the entire coordinated set automatically with "one click" (provided everything is green in each component as well as in the integration). In any case, we should strive for a mechanism that ensures everything belonging to the coordinated set gets merged and nothing is forgotten, so that inconsistencies and non-integrability don't creep back in. Maybe when you merge in one repo, all PRs in the other repos that belong to the coordinated set are merged too?!
Good points. There are some ways to do this technically in GitHub. I know it works for "enterprise"; I suspect it works for "public" as well. I would use "a PR with the same head branch and base branch" as the unique identifier: it can be the same on all participating repositories and can even be created automatically. Example:
(very rough sketch, just to convey the idea)
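One hypothetical shape of such a sketch, assuming a GitHub Actions workflow in the integration repository and the `gh` CLI (the repository names, the token secret, and the branch input are placeholders):

```yaml
# Hypothetical workflow in the integration repository. It treats "same head
# branch name on the same base branch" as the identifier of a coordinated
# change set and reports which participating repositories currently have an
# open PR for it.
name: coordinated-set-status
on:
  workflow_dispatch:
    inputs:
      coordination_branch:
        description: "Shared head branch name of the coordinated PRs"
        required: true

jobs:
  collect:
    runs-on: ubuntu-latest
    steps:
      - name: Find matching PRs in all participating repositories
        env:
          GH_TOKEN: ${{ secrets.CROSS_REPO_TOKEN }}  # assumed token with read access to all repos
          BRANCH: ${{ inputs.coordination_branch }}
        run: |
          # Placeholder list of participating repositories.
          for repo in eclipse-score/component-a eclipse-score/component-b; do
            echo "--- $repo ---"
            gh pr list --repo "$repo" \
                       --head "$BRANCH" \
                       --base main \
                       --state open \
                       --json number,headRefOid \
                       --jq '.[] | "PR #\(.number) at \(.headRefOid)"'
          done
```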
This is very tricky. In fact, we are currently trying to implement something like that internally on GitHub Enterprise, using only GitHub's own mechanisms. I cannot yet say whether it works; it is like an atomic merge that involves a set of "the same series of PRs". Ideally with a merge queue, so that long-running CI is not a problem. Maybe it requires a third-party tool (some GitHub bot) or an additional constraint (e.g. you must use a single integration repository). I can report back when I have results. (What I wrote here IMHO only expands on the proposal and should in no way contradict it; if it does, that is my own lacking ability to explain my thoughts ;-))
I am in favor of this. I have added minor remarks, but these are not critical.
A technicality: for reviews and other reasons, the one-sentence-per-line rule is a very helpful one. But this is clearly only a suggestion.
---
## Executive Summary

Large systems often span multiple repositories. Each repository can look “green” on its own, yet problems only show up when everything is combined. These late surprises slow down development and make debugging painful.
Suggested change: Large systems often span multiple repositories. Each repository can look “green” on its own, yet problems only show up when everything is combined. These late surprises slow down development and make debugging painful. They can even block releases.
reproducibility) to a multi-repository boundary. The central integration repository is a neutral place to define participating components, build manifests, hold integration-specific helpers (overrides, fixtures, seam tests), and persist known-good records. It should not contain business logic; keeping it lean reduces accidental
I would argue that the integration repository can contain e.g. integration tests or other checks which make sense only in that specific integration context. Developing "integration tests" in another repository would be very confusing.
I'll rename "seam tests" to "integration tests"
So, if I understand you correctly, what you are basically saying is: it will fail n-1 times until the n-th required PR is in place. I can somehow live with that, but it appears to me to be a waste of resources. Maybe we can think of something that signals "this set of PRs is now complete and stable; now the integration pipeline will yield a sensible outcome". Ideally not (entirely) manual, because I still want as much automation as possible.
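One possible, purely illustrative shape of such a completeness signal: the coordinated set declares its expected members explicitly, and the integration workflow ends as a no-op (rather than failing) until every declared member has an open PR on the coordination branch. A sketch, assuming the same placeholder repositories and `gh` CLI as above, with `expected-members.txt` as an assumed members file:

```yaml
# Hypothetical "completeness gate" workflow. expected-members.txt is an
# assumed file in the integration repository listing the repositories that
# must contribute a PR to this coordinated set, one per line.
name: coordinated-set-gate
on:
  workflow_dispatch:
    inputs:
      coordination_branch:
        description: "Shared head branch name of the coordinated PRs"
        required: true

jobs:
  gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: End early until every declared member has an open PR
        env:
          GH_TOKEN: ${{ secrets.CROSS_REPO_TOKEN }}  # assumed cross-repo read token
          BRANCH: ${{ inputs.coordination_branch }}
        run: |
          missing=0
          while read -r repo; do
            count=$(gh pr list --repo "$repo" --head "$BRANCH" --state open \
                               --json number --jq 'length')
            if [ "$count" -eq 0 ]; then
              echo "Still waiting for an open PR in $repo"
              missing=1
            fi
          done < expected-members.txt
          if [ "$missing" -eq 1 ]; then
            echo "Coordinated set not complete yet; ending without failure."
            exit 0
          fi
          echo "Set complete; later steps would start the integration run here."
```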
Exactly. I had this term in mind but hesitated to write it down, since I don't think it's truly achievable (being truly atomic). But yes, that's probably the best term to use to convey the idea.
I think as long as the merge requires passing CI (and you cannot "forget" to request this integration run), it should be fine!
Yes, I also doubt that truly atomic is possible. The best that is possible is something like "serializing the integration of groups of PRs via some queue", with the queue halting if there is a problem integrating one of the groups, and hoping that this will not happen often. Maybe something is possible around backing out failed integrations automatically and continuing with the next group, but I can already see the complexity of that code exploding...
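For the queue part: GitHub's built-in merge queue triggers workflows via the `merge_group` event, so an integration job in a single integration repository (whose PRs each represent one coordinated group) could hook in roughly as sketched below. The test script name is a placeholder, and the queue itself is enabled via branch protection rules, not in the workflow:

```yaml
# Hypothetical workflow reacting to GitHub's merge queue. Each queued entry is
# tested against the up-to-date target branch (plus the entries ahead of it
# in the queue) before it is allowed to land.
name: integration-on-merge-queue
on:
  merge_group:

jobs:
  integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run the integration test suite for the queued group
        run: ./scripts/run-integration-tests.sh  # placeholder script name
```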
As discussed in the last TL workshop, we are fine with the concept. Let's see how it will be implemented ;)
DR-002 (Integration)
Large systems often span multiple repositories. Each repository can look “green” on its own, yet problems only show up when everything is combined. These late surprises slow down development and make debugging painful.
DR-002 turns a collection of separate repositories into a system that behaves like a single, continuously tested whole — ensuring the main line is always integrable across all components.
Proposed Approach
Benefits
Note: this concept is easily extendable to support multiple versions of S-CORE. But that's currently not required.
Rendered: https://eclipse-score.github.io/score/pr-1689/design_decisions/DR-002-infra.html