+
Skip to content

Provide CLI Parameters for jgroups.* options #40690

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 7, 2025

Conversation

ryanemerson
Copy link
Contributor

Closes #40481

@ryanemerson ryanemerson changed the title 40481/jgroups cli args Provide CLI Parameters for jgroups.* options Jun 24, 2025
@ryanemerson ryanemerson force-pushed the 40481/jgroups_cli_args branch 4 times, most recently from 9ddc2c0 to e91d056 Compare June 25, 2025 10:06
@keycloak-github-bot
Copy link

Unreported flaky test detected

If the flaky tests below are affected by the changes, please review and update the changes accordingly. Otherwise, a maintainer should report the flaky tests prior to merging the PR.

org.keycloak.testsuite.cluster.UserFederationInvalidationClusterTest#crudWithFailover

Keycloak CI - Clustering IT

java.lang.RuntimeException: java.lang.IllegalStateException: Keycloak unexpectedly died :(
	at org.keycloak.testsuite.arquillian.containers.KeycloakQuarkusServerDeployableContainer.start(KeycloakQuarkusServerDeployableContainer.java:71)
	at org.jboss.arquillian.container.impl.ContainerImpl.start(ContainerImpl.java:185)
	at org.jboss.arquillian.container.impl.client.container.ContainerLifecycleController$8.perform(ContainerLifecycleController.java:137)
	at org.jboss.arquillian.container.impl.client.container.ContainerLifecycleController$8.perform(ContainerLifecycleController.java:133)
...

Report flaky test

Copy link

@keycloak-github-bot keycloak-github-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreported flaky test detected, please review

@ryanemerson ryanemerson force-pushed the 40481/jgroups_cli_args branch from e91d056 to 0d591da Compare June 25, 2025 11:16
@ryanemerson ryanemerson marked this pull request as ready for review June 25, 2025 13:22
@ryanemerson ryanemerson requested review from a team as code owners June 25, 2025 13:22
@ryanemerson ryanemerson requested a review from pruivo June 25, 2025 13:22
Copy link
Contributor

@shawkins shawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just a couple of things.

@@ -98,6 +103,24 @@ public static PropertyMapper<?>[] getClusteringPropertyMappers() {
.isEnabled(() -> Configuration.isTrue(CachingOptions.CACHE_EMBEDDED_MTLS_ENABLED), "property '%s' is enabled".formatted(CachingOptions.CACHE_EMBEDDED_MTLS_ENABLED.getKey()))
.validator(CachingPropertyMappers::validateCertificateRotationIsPositive)
.build(),
fromOption(CachingOptions.CACHE_EMBEDDED_NETWORK_BIND_ADDRESS)
.paramLabel("address")
.to("kc.spi-cache--embedded--default--network-bind-address")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.to("kc.spi-cache--embedded--default--network-bind-address")
.to("kc.spi-cache-embedded--default--network-bind-port")

The spi is cache-embedded right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the SPI is cacheEmbedded - but note that this should still be ...network-bind-address not ...-port.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed to use cache-embedded for the various options.

boolean bindAddressRequired = Optional.ofNullable(containerBuilder.getArgs())
.orElse(List.of())
.stream()
.noneMatch(a -> a.contains("--cache-embedded-network-bind-address"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should only match if someone used the unsupported podTemplate to update the main container's args. Do you mean be checking specifically for this, or do you want to check for additionalOptions usage of cache-embedded-network-bind-address?

Copy link
Contributor Author

@ryanemerson ryanemerson Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I had forgot about additionalOptions. I have updated the logic so that we only set the bind-address explicitly if it's not already specified via additionalOptions or a custom command via unsupported podTemplate.

I have added test cases to operator/src/test/java/org/keycloak/operator/testsuite/unit/PodTemplateTest.java.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A user specified bind address doesn't make sense in the context of the Operator, as it will result in the same address being used by all pods, therefore I have reverted ^ in favour of Pedro's suggestion: #40690 (comment)

Comment on lines 165 to 166
String[] split = key.split("(?=[A-Z])", 3);
String property = "jgroups.%s.%s".formatted(split[1].toLowerCase(), split[2].toLowerCase());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this will work, but is hard to understand for me, and might fail with future changes. Could we have a switch statement instead which those values that we support? If you want to make it fancy, try using enums to make it safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, looking at this again I realise there's actually a bug as the external address properties are in the format jgroups.%s_%s not jgroups.%s.%s.

I have reworked to utilise an Enum.

Comment on lines 39 to 40
--cache-embedded-network-bind-address <address>
IP address used by clustering transport.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please extends the docs:

Suggested change
--cache-embedded-network-bind-address <address>
IP address used by clustering transport.
--cache-embedded-network-bind-address <address>
IP address to bind to for the clustering transport. By default, ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pushed a commit updating the various descriptions.

Comment on lines 41 to 42
--cache-embedded-network-bind-port <port>
Port used by clustering transport.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please extend the docs. People should understand that this is usually optional (same for --cache-embedded-network-bind-address)

Suggested change
--cache-embedded-network-bind-port <port>
Port used by clustering transport.
--cache-embedded-network-bind-port <port>
Port to bind to for the clustering transport. By default, ...

Comment on lines 43 to 44
--cache-embedded-network-external-address <address>
IP address that other instances in the cluster should use to contact this node.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
--cache-embedded-network-external-address <address>
IP address that other instances in the cluster should use to contact this node.
--cache-embedded-network-external-address <address>
IP address that other instances in the cluster should use to contact this node. Set only it is different from `cache-embedded-network-bind-address`, for example when this instance is behind a firewall.

Comment on lines 45 to 46
--cache-embedded-network-external-port <port>
Port that other instances in the cluster should use to contact this node.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
--cache-embedded-network-external-port <port>
Port that other instances in the cluster should use to contact this node.
--cache-embedded-network-external-port <port>
Port that other instances in the cluster should use to contact this node. Set only if ...

fromOption(CachingOptions.CACHE_EMBEDDED_NETWORK_BIND_ADDRESS)
.paramLabel("address")
.to("kc.spi-cache-embedded--default--network-bind-address")
.validator(CachingPropertyMappers::validateBindAddress)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add isEnabled to all the mapping
.isEnabled(CachingPropertyMappers::cacheSetToInfinispan, "Infinispan clustered embedded is enabled")

Comment on lines 283 to 300
private static void validateBindAddress(String address) {
if (InetAddressUtils.isIPv4Address(address) || InetAddressUtils.isIPv6Address(address))
return;

for (Util.AddressScope addressScope : Util.AddressScope.values())
if (addressScope.name().equals(address))
return;

String matchType = address.split(":")[0];
if (Global.MATCH_ADDR.equals(matchType) || Global.MATCH_HOST.equals(matchType) || Global.MATCH_INTF.equals(matchType))
return;

throw new PropertyException("Option '%s'. Invalid address: '%s'".formatted(CachingOptions.CACHE_EMBEDDED_NETWORK_BIND_ADDRESS.getKey(), address));
}

private static void validateExternalAddress(String address) {
if (!InetAddressUtils.isIPv4Address(address) && !InetAddressUtils.isIPv6Address(address))
throw new PropertyException("Option '%s'. Invalid address: '%s'".formatted(CachingOptions.CACHE_EMBEDDED_NETWORK_EXTERNAL_ADDRESS.getKey(), address));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't do any validation in Keycloak as it restricts what can be set. For example, you cannot put the network interface, but -Djgroups.bind.address=eth0 or -Djgroups.bind.address=localhost are both valid.

EXTERNAL_PORT(CachingOptions.CACHE_EMBEDDED_NETWORK_EXTERNAL_PORT, "jgroups.external_port");

final Option<?> option;
final String property;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be a list. jgroups.bind.address is an Infinispan property, but public static final String BIND_ADDR="jgroups.bind_addr"; needs to be set too. Same for the bind port.
Using a custom JGroups configuration file won't be able to read the Infinispan properties.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me and Pedro discussed this and the jgroups.bind_addr system property always takes precedence over the Infinispan property jgroups.bind.address and any values configured in the JGroup's stack, so it's acceptable to just set the jgroups.bind_addr property.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have also made it so that we warn if either jgroups.bind_addr or jgroups.bind.address is set with the CLI option.

Comment on lines 325 to 328
if (isBindAddressRequired(containerBuilder.getArgs(), keycloakCR.getSpec().getAdditionalOptions())) {
// Set bind address as this is required for JGroups to form a cluster in IPv6 environments
containerBuilder.addToArgs("--cache-embedded-network-bind-address=$(%s)".formatted(POD_IP));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing to me... why would anyone set the bind address in the Keycloak CR!?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this line and change the environment variable name in line:

envVars.add(new EnvVarBuilder().withName(POD_IP).withNewValueFrom().withNewFieldRef()

i.e. replace POD_ID with KC_CACHE_EMBEDDED_NETWORK_BIND_ADDRESS

@ryanemerson ryanemerson force-pushed the 40481/jgroups_cli_args branch from 551febd to 4996c08 Compare June 26, 2025 14:38
@ryanemerson ryanemerson requested a review from shawkins June 26, 2025 17:50
@@ -534,7 +532,7 @@ private List<EnvVar> getDefaultAndAdditionalEnvVars(Keycloak keycloakCR) {
}
}

envVars.add(new EnvVarBuilder().withName(POD_IP).withNewValueFrom().withNewFieldRef()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a small chance this could be seen as a breaking change - if users assume that the POD_ID environment variable is always available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue this is a notable change, as it changes an internal behavior. We have never stated that the Operator will put that to the environment variables. So as long as we document it as a notable change, we should be fine.

@@ -83,7 +83,7 @@
)
public class KeycloakDeploymentDependentResource extends CRUDKubernetesDependentResource<StatefulSet, Keycloak> {

public static final String POD_IP = "POD_IP";
public static final String BIND_ADDRESS = "KC_CACHE_EMBEDDED_NETWORK_BIND_ADDRESS";
Copy link
Contributor

@shawkins shawkins Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the expectation for users who are specifying an image - won't there be an incompatibility with existing images with the new operator as this option won't yet exist?

I know we don't go into detail about this scenario in the general docs, but for the most part changes in the operator tend to be backwards compatible at least to last supported major version, or dependent upon specific enablement.

If we don't want version dependent logic in the operator, we'll at least need instructions to update the images first, before upgrading the operator.

cc @vmuzikar @ahus1

Copy link
Contributor Author

@ryanemerson ryanemerson Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the expectation for users who are specifying an image - won't there be an incompatibility with existing images with the new operator as this option won't yet exist?

I feel like we're in very murky water here, given that the Operator doesn't officially support multiple Keycloak versions but it seems we don't want to break things when using older server versions. It would be great if this was documented somewhere 🙂

If we don't want version dependent logic in the operator, we'll at least need instructions to update the images first, before upgrading the operator.

By update the images first, I assume you mean to a version with --cache-embedded-network-bind-address?

I don't think the user should ever be baking things related to the bind address into the image. Unless the user is providing a special JGroups String, such as "SITE_LOCAL", this value will typically be different for every container, hence the use of POD_IP.

We were previously relying on -Djgroups.bind.address being set as the first container arg, even when an unsupported podTemplateSpec was provided. AFAIU this means the only way a user could override this property would be to:

  1. Explicitly provide -Djgroups.bind.address as part of JAVA_OPTS_APPEND
  2. Configure -Djgroups.bind_addr as this takes precedence over -Djgroups.bind.address at the JGroups level

If we don't want version dependent logic in the operator, we'll at least need instructions to update the images first, before upgrading the operator.

An alternative is that we fallback to using -Djgroups.bind.address in the Operator and make the transition to the CLI args as part of KC 27.0.

The -Djgroups.bind.address property will still work with my changes. The user will get a warning if they also configure --cache-embedded-network-bind-address, but this should never be necessary with the Operator anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By update the images first, I assume you mean to a version with --cache-embedded-network-bind-address?

Yes, rebuilding against Keycloak 26.4 prior to upgrading the operator to 26.4.

Otherwise KC_CACHE_EMBEDDED_NETWORK_BIND_ADDRESS won't be recognized / used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is not officially supported to run different minor releases in KC and the Operator, I then recommend to add it to notable changes. As this was added for people running in IPv6 only environments, this restricts the audience even further.

Is there a planned docs on how to upgrade Keycloak with custom images and the Operator?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is not officially supported to run different minor releases in KC and the Operator

The only mention we have of this is a note on https://www.keycloak.org/operator/customizing-keycloak "To ensure full compatibility of Operator and Operand, make sure that the version of Keycloak release used in the custom image is aligned with the version of the operator." - which could even imply that you need to update your custom images before doing a micro update of the operator.

Is there a planned docs on how to upgrade Keycloak with custom images and the Operator?

The OLM instructions were updated to warn the user to prevent automatic upgrades. As far as I know we haven't elaborated, and don't have an issue captured, on all the steps necessary to actually upgrade the operator. Database backup and restore is especially hand-wavy. Some of this should be covered under #37356

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose what you suggest here will work with the change @ryanemerson is proposing here:

  • The old config to pass the POD_IP via the Java System Property is understood by the new Keycloak version - I hope @ryanemerson can confirm that.

AFAIK the KC Operator is installed per namespace, and I would imagine people would usually not run multiple Keycloaks in a namespace. Even if they do, given the coupling between the operator image and the Keycloak image, they should be all the same version.

I give my +1 to have the procedure @shawkins described above documented in a follow-up issue in the CND team.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JGroups will always check the system properties and use its value if it is set; it has priority over every other configuration path. The changes to the operator code can be reverted without a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to @shawkins suggestion. The only issue I can see is if the server dropped some functionality that the Operator relied upon, e.g. when upgrading from 26.x to 27.x, however I think that's very much an edge case that we can take measures to prevent.

  • The old config to pass the POD_IP via the Java System Property is understood by the new Keycloak version - I hope @ryanemerson can confirm that.

Yes, that is the case. Just with the caveats I describe in #40690 (comment)

AFAIK the KC Operator is installed per namespace, and I would imagine people would usually not run multiple Keycloaks in a namespace. Even if they do, given the coupling between the operator image and the Keycloak image, they should be all the same version.

The problem with the Operator framework is that it relies on CRDs which are installed in the global namespace of K8s, so mixing and matching Operator versions doesn't work well in practice and is not recommended. These limitations were a big motivator for some of the changes in OLM V1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pushed a commit reverting the Operator changes and I have created #40884 to track modifying the Operator in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to share my 2 cents here.

Officially, we do not support using older KC image with newer Operator (or vice versa). That said, we do not have a good documented process how to upgrade when using custom images. So we're trying to avoid doing breaking changes in the Operator when not needed. And the current breaking changes policy in the server helps us with that: we're not doing any in minor releases. That enables us to use newer server version with older operator. Hence the approach @shawkins makes sense to me to document.

That said, this only works for minor releases. I think we need to make it clear that for major upgrades (where we will not even consider supporting zero-downtime upgrade), the user must scale down their deployment before upgrading the Operator.

This might change in the future if we make the operator aware of the server version to support using different server versions with different operator versions. But for now, clear docs is IMHO the way to go.

@ahus1
Copy link
Contributor

ahus1 commented Jul 7, 2025

@vmuzikar - with the Operator changes removed, I think the open questions are now resolved. Can you please confirm that adding the CLI options is ok for the CND team?

--cache-embedded-network-bind-address <address>
                     IP address used by clustering transport. By default, SITE_LOCAL is used.
                       Available only when Infinispan clustered embedded is enabled.
--cache-embedded-network-bind-port <port>
                     The Port the clustering transport will bind to. By default, port 7800 is used.
                       Available only when Infinispan clustered embedded is enabled.
--cache-embedded-network-external-address <address>
                     IP address that other instances in the cluster should use to contact this
                       node. Set only if it is different to cache-embedded-network-bind-address,
                       for example when this instance is behind a firewall. Available only when
                       Infinispan clustered embedded is enabled.
--cache-embedded-network-external-port <port>
                     Port that other instances in the cluster should use to contact this node. Set
                       only if it is different to cache-embedded-network-bind-port, for example
                       when this instance is behind a firewall Available only when Infinispan
                       clustered embedded is enabled.

Copy link
Contributor

@ahus1 ahus1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR, looks good to me.

Comment on lines 47 to 54
=== JGroups system properties replaced with CLI options

Until now it was necessary to configure JGroups network addresses and ports using the `+jgroups.bind.*+` and `+jgroups.external_*+`
system properties. In this release we have introduced the following CLI options to allow these addresses and ports to be
configured directly via {project_name}: `cache-embedded-network-bind-address`, `cache-embedded-network-bind-port`,
`cache-embedded-network-external-address`, `cache-embedded-network-external-port`. Configuring ports using the old
properties will still function as before, but we recommend to change to the CLI options as this may change in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would need to go to changes-26_4_0.adoc as 26.3 has been released by now. Please update the PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have rebased and squashed on the latest main.

@ahus1 ahus1 requested a review from vmuzikar July 7, 2025 06:58
vmuzikar
vmuzikar previously approved these changes Jul 7, 2025
Copy link
Contributor

@vmuzikar vmuzikar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahus1 Yes, the options work for me. I haven't deeply reviewed other changes, though.

Closes keycloak#40481

Signed-off-by: Ryan Emerson <remerson@redhat.com>
Copy link
Contributor

@ahus1 ahus1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving based on previous reviews.

@ahus1 ahus1 merged commit eb7ce6a into keycloak:main Jul 7, 2025
80 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide CLI Parameters for jgroups.* options
5 participants
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载