Description
Before reporting an issue
- I have read and understood the above terms for submitting issues, and I understand that my issue may be closed without action if I do not follow them.
Area
infinispan
Describe the bug
We are running Keycloak in Docker containers on Linux VMs in cluster mode.
During our switch from TCP_PING to JDBC_PING, we started experiencing intermittent issues during deployments.
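For context, this is roughly how we enable JDBC_PING; a minimal sketch assuming the built-in jdbc-ping cache stack, with the database connection coming from our usual KC_DB_* settings (not shown here):

```
# Docker environment (sketch, not our full configuration):
# enable the distributed Infinispan cache and use the jdbc-ping stack,
# so JGroups discovery entries are stored in the Keycloak database.
KC_CACHE=ispn
KC_CACHE_STACK=jdbc-ping
```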
After some troubleshooting, we discovered that Infinispan occasionally encounters problems when forming a cluster. These issues typically occur when one or more cluster members are redeployed. The symptoms are not consistent — they vary from one deployment to another.
Example scenario:
We have four Keycloak instances expected to form a cluster: keycloak-1, keycloak-2, keycloak-3, and keycloak-4.
At the time of restart, keycloak-1 is the coordinator.
If we restart keycloak-2, we observe the following log on keycloak-1:
2025-10-10 14:02:21.584 stdout 2025-10-10 14:02:21,583 TRACE [org.jgroups.blocks.cs.NioServer] (NioServer.Selector [/0.0.0.0:57800]-3,keycloak-1-56712) keycloak-1-ip-address:57800: removed connection to keycloak-2-ip-address:57800
On keycloak-2 (the restarted instance), we can see that it becomes a singleton node:
2025-10-10 14:02:53.879 stdout 2025-10-10 14:02:53,878 WARN [org.jgroups.protocols.pbcast.GMS] (main) keycloak-1-899: too many JOIN attempts (10): becoming singleton
2025-10-10 14:02:53.880 stdout 2025-10-10 14:02:53,880 DEBUG [org.jgroups.protocols.pbcast.NAKACK2] (main)
2025-10-10 14:02:53.880 stdout [keycloak-1-899 setDigest()]
2025-10-10 14:02:53.880 stdout existing digest: []
2025-10-10 14:02:53.880 stdout new digest: keycloak-1-899: [0 (0)]
2025-10-10 14:02:53.880 stdout resulting digest: keycloak-1-899: [0 (0)]
For some unknown reason, keycloak-3 now becomes the new coordinator:
2025-10-10 14:02:53.704 stdout 2025-10-10 14:02:53,704 DEBUG [org.jgroups.protocols.pbcast.GMS] (VERIFY_SUSPECT2.Runner-1) keycloak-3-26562: members are (4) keycloak-1-56712,keycloak-3,keycloak-2-43087,keycloak-1-29717, coord=keycloak-3-26562: I'm the new coordinator
A few seconds later, Infinispan fails to recover the cluster state on keycloak-3:
2025-10-10 14:02:59.782 stdout 2025-10-10 14:02:59,782 WARN [org.infinispan.topology.ClusterTopologyManagerImpl] (timeout-thread--p4-t1) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge), will retry: org.infinispan.commons.TimeoutException: ISPN000476: Timed out waiting for responses for request 96 from keycloak-1-29717 after 6 seconds
These are the last logs on the initial coordinator (keycloak-1):
2025-10-10 14:02:11.755 stdout 2025-10-10 14:02:11,755 INFO [org.infinispan.CLUSTER] () [Context=offlineClientSessions] ISPN100002: Starting rebalance with members [keycloak-1-56712, keycloak-3-26562, keycloak-4-43087], phase READ_OLD_WRITE_ALL, topology id 15
2025-10-10 14:02:11.757 stdout 2025-10-10 14:02:11,757 TRACE [org.jgroups.protocols.pbcast.NAKACK2] () keycloak-1-56712 --> [all]: #177
2025-10-10 14:02:11.757 stdout 2025-10-10 14:02:11,757 TRACE [org.jgroups.protocols.TCP] () keycloak-1-56712: sending msg to null, src=keycloak-1-56712, size=4522, headers are NAKACK2: [MSG, seqno=177], TP: [cluster=ISPN]
2025-10-10 14:02:11.757 stdout 2025-10-10 14:02:11,757 TRACE [org.jgroups.protocols.MFC] () keycloak-1-56712 used 4474 credits, 3391245 remaining
2025-10-10 14:02:21.584 stdout 2025-10-10 14:02:21,583 TRACE [org.jgroups.blocks.cs.NioServer] (NioServer.Selector [/0.0.0.0:57800]-3,keycloak-1-56712) 10.41.1.252:57800: removed connection to keycloak-2-ip-address:57800
2025-10-10 14:02:21.590 stdout 2025-10-10 14:02:21,589 TRACE [org.jgroups.protocols.FD_SOCK2] (NioServer.Selector [/0.0.0.0:57800]-3,keycloak-1-56712) keycloak-1-56712: CONNECT <-- keycloak-4-43087
2025-10-10 14:02:21.590 stdout 2025-10-10 14:02:21,590 TRACE [org.jgroups.protocols.FD_SOCK2] (NioServer.Selector [/0.0.0.0:57800]-3,keycloak-1-56712) keycloak-1-56712: CONNECT-RSP[cluster=ISPN, srv=keycloak-1-56712] --> keycloak-4-43087
End result:
Both keycloak-1 and keycloak-2 become unresponsive.
The JGROUPS_PING table shows an invalid state: for example, keycloak-1 remains listed even though it is unresponsive (see the query sketch below).
The state never changes, and the cluster remains stuck in this condition.
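For reference, this is how we inspect the discovery table when the cluster gets stuck; a minimal sketch assuming the default JGROUPS_PING table name from the jdbc-ping stack, the exact column set depends on the JDBC_PING version in use:

```sql
-- Sketch: list the discovery entries written for the ISPN cluster.
-- Table name assumes the default used by the built-in jdbc-ping stack;
-- adjust to your schema if it differs.
SELECT * FROM JGROUPS_PING;

-- After the failure, the row for the unresponsive node (keycloak-1)
-- is still present and is never cleaned up.
```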
Version
26.3.3
Regression
- The issue is a regression
Expected behavior
The cluster should recover on its own, or at least not end up in this stuck state, when a cluster member is redeployed.
Actual behavior
As described above: after the redeploy, keycloak-1 and keycloak-2 become unresponsive and the cluster remains stuck in this state.
How to Reproduce?
See the example scenario above: run four Keycloak instances clustered via JDBC_PING and redeploy one member. The issue occurs intermittently, not on every redeploy; a rough sketch of the triggering step follows.
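A sketch of the redeploy step, assuming a plain container restart; our actual pipeline redeploys the container, but the effect on the cluster is the same. Container names are placeholders for our deployment:

```sh
# With keycloak-1..keycloak-4 clustered via jdbc-ping and keycloak-1 as
# coordinator, restart a non-coordinator member and watch the coordinator's
# logs for the connection-removal and failed-JOIN messages shown above.
docker restart keycloak-2
```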
Anything else?
No response