+
Skip to content

Conversation

ahus1
Copy link
Contributor

@ahus1 ahus1 commented Jul 21, 2025

Closes #41290

@ahus1 ahus1 self-assigned this Jul 21, 2025
@ahus1 ahus1 force-pushed the is-41290-allow-concurrent-starts-postgres branch from 8784e45 to 0661802 Compare July 21, 2025 17:01
@ahus1 ahus1 marked this pull request as ready for review July 22, 2025 09:16
@ahus1 ahus1 requested review from a team as code owners July 22, 2025 09:17
@ahus1 ahus1 changed the title Ensure to find all members even on concurrent starts Reduce likelihood of multiple coordinators on concurrent startup Jul 24, 2025
while (pingData.stream().noneMatch(PingData::isCoord)) {
// Do a quick check if more nodes have arrived, to have a more complete list of nodes to start with.
List<PingData> newPingData = readFromDB(cluster_name);
if (newPingData.size() == pingData.size() || pingData.stream().anyMatch(PingData::isCoord)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to just check the size of the List or do we need to consider the content of the PingData as well. I'm just wondering if the view changed from [a,b] to [a,c]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in next commit, will check if they both contain the same addresses

* @author Bela Ban
* @since 5.4, 5.3.7
*/
public class JDBC_PING2_Test {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to format this file per Keycloak conventions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved in next commit

return modify(new JChannel(cfg).name(name));
}

protected static class Connector implements Runnable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be a record:

    protected record Connector(CountDownLatch latch, JChannel ch) implements Runnable {
        @Override
        public void run() {
            try {
                latch.await();
                ch.connect(CLUSTER);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved in next commit

.collect(Collectors.joining("\n")));
Arrays.stream(channels).filter(ch -> ch.view().getCoord() != ch.getAddress()).forEach(JChannel::close);
Arrays.stream(channels).filter(ch -> !ch.isClosed()).forEach(JChannel::close);
System.out.println("Closed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to remove the various System.out calls or replace them with debug logging before merging this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved in next commit

ahus1 added 5 commits July 24, 2025 16:58
Closes keycloak#41290

Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
@ahus1 ahus1 force-pushed the is-41290-allow-concurrent-starts-postgres branch from 7a222cf to 65fa68c Compare July 24, 2025 15:05
@ahus1 ahus1 requested a review from ryanemerson July 24, 2025 15:07
@ahus1
Copy link
Contributor Author

ahus1 commented Jul 24, 2025

Thank you for the review. The PR now contains also a change to pre-seed the addresses so they are ordered, and the first node gets a one second grace period from the other nodes starting later to complete it becoming a coordinator.

var tableName = JpaUtils.getTableNameForNativeQuery("JGROUPS_PING", cp.getEntityManager());
String statement = String.format("INSERT INTO %s values (?, ?, ?, ?, ?)", tableName);

Address address = new UUID(0, value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some Infinispan configurations require an ExtendedUUID (and the upcoming ISPN 16 too)

Copy link
Contributor Author

@ahus1 ahus1 Jul 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO for Alex: Also test with site name before finalizing this PR (use the SPI option) as that forces an Extended UUID on ISPN side of things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that JGroups can only serialize a UUID or a SiteUUID due to the code in Util.addressToString. For a plain ExtendedUUID it will return null, which leads to followup-errors, starting with a the value being rejected by the database.

I updated the code to use SiteUUID, and for that I needed to also use a site. Please have a look, maybe we could pair for any additional changes that might be necessary.

Copy link
Contributor

@ryanemerson ryanemerson Jul 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure of the repurcussions of using SiteUUID vs ExtendedUUID here, but it seems to me that ExtendedUUID should be handled by Util.addressToString as I believe the current implementation will cause issues with Infinispan 16 users relying on JDBC_PING2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 to use SiteUUID.
@belaban can this belaban/JGroups#901 be backported to 5.3.x branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that when setting spi-cache-embedded--default--site-name this already tries to use a ExtendedUUID today, but it then breaks with the NULL problem when using JDBC_PING2.

I'm reverting my previous commit. It seems we'll need to state that spi-cache-embedded--default--site-name is incompatible with JDBC_PING2?!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now mapping an ExtendedUUID to a UUID when passing it down to JDBC_PING as I don't know when the backport will land to and to see if it now breaks in a different place.

try (PreparedStatement s = con.prepareStatement(statement)) {
s.setString(1, org.jgroups.util.Util.addressToString(address)); // address
s.setString(2, "(starting)"); // name
s.setString(3, "ISPN"); // cluster name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cluster name can be changed; you need to read it from the configuration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fixed with this commit as well.

@pruivo
Copy link
Contributor

pruivo commented Jul 24, 2025

KEYCLOAK_JDBC_PING2 looks like a mix of discovery, group membership, and failure detection. Let's see what kind of issues pop up in production. 😰

String seq = storage.loadOrCreate(JGROUPS_ADDRESS_SEQUENCE, () -> "0");
long value = Long.parseLong(seq) + 1;
String newSeq = Long.toString(value);
storage.replace(JGROUPS_ADDRESS_SEQUENCE, seq, newSeq);
Copy link
Contributor

@ryanemerson ryanemerson Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to handle the case where the replace fails, otherwise we will potentially generate two addresses on different nodes with the same seq number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all handled in one transaction, and the DB transaction will be rolled back if it fails. So those addresses that pass through this one are guaranteed to be consecutive, ordered, and visible to all other nodes.

ahus1 added 3 commits July 29, 2025 10:11
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>

This reverts commit ccf38b2.
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
@ahus1 ahus1 force-pushed the is-41290-allow-concurrent-starts-postgres branch from 0cfcaf5 to c69f8e7 Compare July 29, 2025 12:04
ahus1 added 2 commits July 29, 2025 15:03
…okie

Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
@ahus1 ahus1 requested a review from pruivo July 29, 2025 15:58
@ahus1
Copy link
Contributor Author

ahus1 commented Jul 29, 2025

@pruivo - the build is green and the code blocks moved around one last time. Please review this one when you have the time. Thanks!

Copy link
Contributor

@pruivo pruivo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got JdbcSQLIntegrityConstraintViolationException locally, running testConcurrentStartupMultipleTimes test.

19:16:52,381 ERROR jgroups-6,jdbc-test,4 [org.keycloak.jgroups.protocol.KEYCLOAK_JDBC_PING2_FOR_TESTING] 4: failed writing to DB: org.h2.jdbc.JdbcSQLIntegrityConstraintViolationException: Unique index or primary key violation: "PUBLIC.PRIMARY_KEY_C ON PUBLIC.JGROUPS(ADDRESS) VALUES ( /* 751 */ 'uuid://11e06b89-e3c2-4363-9794-ffd49d283280' )"; SQL statement:

}
CountDownLatch latch=new CountDownLatch(1);
int index=1;
List<Thread> threads=new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait for the threads to finish?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in next commit

Comment on lines 94 to 95
Arrays.stream(channels).filter(ch -> ch.view().getCoord() != ch.getAddress()).forEach(JChannel::close);
Arrays.stream(channels).filter(ch -> !ch.isClosed()).forEach(JChannel::close);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably safer to put this in a finally block. If the test fails, the channels won't be closed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

part of next commit


protected static JChannel modify(JChannel ch) {
GMS gms=ch.stack().findProtocol(GMS.class);
gms.setJoinTimeout(3000).setMaxJoinAttempts(5);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not configure this in the XML?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've inherited it :D ... moved to XML in next commit

write_data_on_find="true"
/>
<!-- very aggressive merging to speed up the test -->
<MERGE3 min_interval="150"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is making the tests pass because it runs the partition handling check every 200ms. In production, it runs every 30s.

Using the production values, the test fails randomly

[ERROR]   JdbcPing2Test.testConcurrentStartupMultipleTimes:60->runSingleTest:89 » Timeout Timeout 20000 kicked in, views are:
1: [8|4] (7) [8, 1, 4, 7, 2, 5, 3]
2: [8|4] (7) [8, 1, 4, 7, 2, 5, 3]
3: [8|4] (7) [8, 1, 4, 7, 2, 5, 3]
4: [8|4] (7) [8, 1, 4, 7, 2, 5, 3]
5: [8|4] (7) [8, 1, 4, 7, 2, 5, 3]
6: [6|0] (1) [6]
7: [8|4] (7) [8, 1, 4, 7, 2, 5, 3]
8: [8|4] (7) [8, 1, 4, 7, 2, 5, 3]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The faster merge interval is there to speed up the test. In production with a longer interval, it will take longer, but it will eventually converge. IMHO any sub-second interval will be too low for production. Still we can go for 5 seconds or so?!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In production with a longer interval, it will take longer, but it will eventually converge

So you agree we don't need this PR 😄

IMHO any sub-second interval will be too low for production.

The LinkedIn blog recommended increasing it to minutes...

On a serious note, I will change my opinion regarding last night and state that the test should be removed. With MERGE3, either you use JDBC_PING2 or KEYCLOAK_JDBC_PING2, the test passes. Without it, it fails with either protocol. I can conclude the test is not testing anything useful. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you agree we don't need this PR ... the test should be removed

The test is more a stress testing harness that helped me to test KEYCLOAK_JDBC_PING2 against H2, and discover possible problems / areas to improve the converging of the cluster. With the other change of having a pre-seeded UUID which is not part of the added test, this should be necessary less often, but still when clusters split.

So I would then remove the test, and keep the (ignored) stress test that's running 100 times?

s.setString(1, org.jgroups.util.Util.addressToString(new UUID(address.getMostSignificantBits(), address.getLeastSignificantBits()))); // address
s.setString(2, "(starting)"); // name
s.setString(3, clusterName); // cluster name
s.setString(4, new IpAddress("localhost", 0).toString()); // ip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new IpAddress("localhost", 0).toString() should have always the same output. It can be cached.

ahus1 and others added 4 commits July 30, 2025 08:17
…dbcPing2Test.java

Co-authored-by: Pedro Ruivo <pruivo@users.noreply.github.com>
Signed-off-by: Alexander Schwartz <alexander.schwartz@gmx.net>
…dbcPing2Test.java

Co-authored-by: Pedro Ruivo <pruivo@users.noreply.github.com>
Signed-off-by: Alexander Schwartz <alexander.schwartz@gmx.net>
…dbcPing2Test.java

Co-authored-by: Pedro Ruivo <pruivo@users.noreply.github.com>
Signed-off-by: Alexander Schwartz <alexander.schwartz@gmx.net>
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
@ahus1 ahus1 requested a review from pruivo July 30, 2025 06:44
@ahus1
Copy link
Contributor Author

ahus1 commented Jul 30, 2025

@pruivo - updated the PR.
The exception JdbcSQLIntegrityConstraintViolationException is something that can happen when there are attempts of concurrent writes (for example the coordinator writes a node, and the node itself writes their information). That should then heal itself on the next attempt.

Ready for another round of reviews.

Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Copy link
Contributor

@pruivo pruivo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

}

@Override
protected void delete(String clustername, Address addressToDelete) throws SQLException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to overwrite callInsertStoredProcedure as this calls Util#addressToString and is called from writeToDB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't use it, so no

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I take your point as the toUUID should not be necessary in JGroups once belaban/JGroups#901 has been backported. There is a lot of flags we don't use in the Keycloak use-case that we're still including here. I assume the reason for inclusion is so that our solution here is as close to the PR that will eventually be issued to JGroups for JDBC_PING2.

@ahus1 ahus1 merged commit c9943af into keycloak:main Jul 30, 2025
127 of 130 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Concurrent starts with JDBC_PING lead to a split cluster

3 participants

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载