-
Notifications
You must be signed in to change notification settings - Fork 7.7k
Reduce likelihood of multiple coordinators on concurrent startup #41291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce likelihood of multiple coordinators on concurrent startup #41291
Conversation
8784e45
to
0661802
Compare
while (pingData.stream().noneMatch(PingData::isCoord)) { | ||
// Do a quick check if more nodes have arrived, to have a more complete list of nodes to start with. | ||
List<PingData> newPingData = readFromDB(cluster_name); | ||
if (newPingData.size() == pingData.size() || pingData.stream().anyMatch(PingData::isCoord)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe to just check the size of the List or do we need to consider the content of the PingData as well. I'm just wondering if the view changed from [a,b]
to [a,c]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved in next commit, will check if they both contain the same addresses
* @author Bela Ban | ||
* @since 5.4, 5.3.7 | ||
*/ | ||
public class JDBC_PING2_Test { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to format this file per Keycloak conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved in next commit
return modify(new JChannel(cfg).name(name)); | ||
} | ||
|
||
protected static class Connector implements Runnable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be a record:
protected record Connector(CountDownLatch latch, JChannel ch) implements Runnable {
@Override
public void run() {
try {
latch.await();
ch.connect(CLUSTER);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved in next commit
.collect(Collectors.joining("\n"))); | ||
Arrays.stream(channels).filter(ch -> ch.view().getCoord() != ch.getAddress()).forEach(JChannel::close); | ||
Arrays.stream(channels).filter(ch -> !ch.isClosed()).forEach(JChannel::close); | ||
System.out.println("Closed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to remove the various System.out
calls or replace them with debug logging before merging this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved in next commit
model/infinispan/src/main/java/org/keycloak/jgroups/protocol/KEYCLOAK_JDBC_PING2.java
Show resolved
Hide resolved
Closes keycloak#41290 Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
7a222cf
to
65fa68c
Compare
Thank you for the review. The PR now contains also a change to pre-seed the addresses so they are ordered, and the first node gets a one second grace period from the other nodes starting later to complete it becoming a coordinator. |
var tableName = JpaUtils.getTableNameForNativeQuery("JGROUPS_PING", cp.getEntityManager()); | ||
String statement = String.format("INSERT INTO %s values (?, ?, ?, ?, ?)", tableName); | ||
|
||
Address address = new UUID(0, value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some Infinispan configurations require an ExtendedUUID
(and the upcoming ISPN 16 too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO for Alex: Also test with site name before finalizing this PR (use the SPI option) as that forces an Extended UUID on ISPN side of things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that JGroups can only serialize a UUID or a SiteUUID due to the code in Util.addressToString. For a plain ExtendedUUID it will return null, which leads to followup-errors, starting with a the value being rejected by the database.
I updated the code to use SiteUUID, and for that I needed to also use a site. Please have a look, maybe we could pair for any additional changes that might be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure of the repurcussions of using SiteUUID
vs ExtendedUUID
here, but it seems to me that ExtendedUUID
should be handled by Util.addressToString
as I believe the current implementation will cause issues with Infinispan 16 users relying on JDBC_PING2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1 to use SiteUUID.
@belaban can this belaban/JGroups#901 be backported to 5.3.x branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that when setting spi-cache-embedded--default--site-name
this already tries to use a ExtendedUUID today, but it then breaks with the NULL problem when using JDBC_PING2.
I'm reverting my previous commit. It seems we'll need to state that spi-cache-embedded--default--site-name
is incompatible with JDBC_PING2?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now mapping an ExtendedUUID to a UUID when passing it down to JDBC_PING as I don't know when the backport will land to and to see if it now breaks in a different place.
try (PreparedStatement s = con.prepareStatement(statement)) { | ||
s.setString(1, org.jgroups.util.Util.addressToString(address)); // address | ||
s.setString(2, "(starting)"); // name | ||
s.setString(3, "ISPN"); // cluster name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cluster name can be changed; you need to read it from the configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be fixed with this commit as well.
|
String seq = storage.loadOrCreate(JGROUPS_ADDRESS_SEQUENCE, () -> "0"); | ||
long value = Long.parseLong(seq) + 1; | ||
String newSeq = Long.toString(value); | ||
storage.replace(JGROUPS_ADDRESS_SEQUENCE, seq, newSeq); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to handle the case where the replace fails, otherwise we will potentially generate two addresses on different nodes with the same seq number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all handled in one transaction, and the DB transaction will be rolled back if it fails. So those addresses that pass through this one are guaranteed to be consecutive, ordered, and visible to all other nodes.
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Signed-off-by: Alexander Schwartz <aschwart@redhat.com> This reverts commit ccf38b2.
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
0cfcaf5
to
c69f8e7
Compare
…okie Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
@pruivo - the build is green and the code blocks moved around one last time. Please review this one when you have the time. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got JdbcSQLIntegrityConstraintViolationException
locally, running testConcurrentStartupMultipleTimes
test.
19:16:52,381 ERROR jgroups-6,jdbc-test,4 [org.keycloak.jgroups.protocol.KEYCLOAK_JDBC_PING2_FOR_TESTING] 4: failed writing to DB: org.h2.jdbc.JdbcSQLIntegrityConstraintViolationException: Unique index or primary key violation: "PUBLIC.PRIMARY_KEY_C ON PUBLIC.JGROUPS(ADDRESS) VALUES ( /* 751 */ 'uuid://11e06b89-e3c2-4363-9794-ffd49d283280' )"; SQL statement:
model/infinispan/src/test/java/org/keycloak/jgroups/protocol/JdbcPing2Test.java
Outdated
Show resolved
Hide resolved
model/infinispan/src/test/java/org/keycloak/jgroups/protocol/JdbcPing2Test.java
Outdated
Show resolved
Hide resolved
model/infinispan/src/test/java/org/keycloak/jgroups/protocol/JdbcPing2Test.java
Outdated
Show resolved
Hide resolved
} | ||
CountDownLatch latch=new CountDownLatch(1); | ||
int index=1; | ||
List<Thread> threads=new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait for the threads to finish?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in next commit
Arrays.stream(channels).filter(ch -> ch.view().getCoord() != ch.getAddress()).forEach(JChannel::close); | ||
Arrays.stream(channels).filter(ch -> !ch.isClosed()).forEach(JChannel::close); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably safer to put this in a finally block. If the test fails, the channels won't be closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
part of next commit
|
||
protected static JChannel modify(JChannel ch) { | ||
GMS gms=ch.stack().findProtocol(GMS.class); | ||
gms.setJoinTimeout(3000).setMaxJoinAttempts(5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not configure this in the XML?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've inherited it :D ... moved to XML in next commit
write_data_on_find="true" | ||
/> | ||
<!-- very aggressive merging to speed up the test --> | ||
<MERGE3 min_interval="150" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is making the tests pass because it runs the partition handling check every 200ms. In production, it runs every 30s.
Using the production values, the test fails randomly
[ERROR] JdbcPing2Test.testConcurrentStartupMultipleTimes:60->runSingleTest:89 » Timeout Timeout 20000 kicked in, views are:
1: [8|4] (7) [8, 1, 4, 7, 2, 5, 3]
2: [8|4] (7) [8, 1, 4, 7, 2, 5, 3]
3: [8|4] (7) [8, 1, 4, 7, 2, 5, 3]
4: [8|4] (7) [8, 1, 4, 7, 2, 5, 3]
5: [8|4] (7) [8, 1, 4, 7, 2, 5, 3]
6: [6|0] (1) [6]
7: [8|4] (7) [8, 1, 4, 7, 2, 5, 3]
8: [8|4] (7) [8, 1, 4, 7, 2, 5, 3]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The faster merge interval is there to speed up the test. In production with a longer interval, it will take longer, but it will eventually converge. IMHO any sub-second interval will be too low for production. Still we can go for 5 seconds or so?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In production with a longer interval, it will take longer, but it will eventually converge
So you agree we don't need this PR 😄
IMHO any sub-second interval will be too low for production.
The LinkedIn blog recommended increasing it to minutes...
On a serious note, I will change my opinion regarding last night and state that the test should be removed. With MERGE3
, either you use JDBC_PING2
or KEYCLOAK_JDBC_PING2
, the test passes. Without it, it fails with either protocol. I can conclude the test is not testing anything useful. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you agree we don't need this PR ... the test should be removed
The test is more a stress testing harness that helped me to test KEYCLOAK_JDBC_PING2 against H2, and discover possible problems / areas to improve the converging of the cluster. With the other change of having a pre-seeded UUID which is not part of the added test, this should be necessary less often, but still when clusters split.
So I would then remove the test, and keep the (ignored) stress test that's running 100 times?
s.setString(1, org.jgroups.util.Util.addressToString(new UUID(address.getMostSignificantBits(), address.getLeastSignificantBits()))); // address | ||
s.setString(2, "(starting)"); // name | ||
s.setString(3, clusterName); // cluster name | ||
s.setString(4, new IpAddress("localhost", 0).toString()); // ip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new IpAddress("localhost", 0).toString()
should have always the same output. It can be cached.
…dbcPing2Test.java Co-authored-by: Pedro Ruivo <pruivo@users.noreply.github.com> Signed-off-by: Alexander Schwartz <alexander.schwartz@gmx.net>
…dbcPing2Test.java Co-authored-by: Pedro Ruivo <pruivo@users.noreply.github.com> Signed-off-by: Alexander Schwartz <alexander.schwartz@gmx.net>
…dbcPing2Test.java Co-authored-by: Pedro Ruivo <pruivo@users.noreply.github.com> Signed-off-by: Alexander Schwartz <alexander.schwartz@gmx.net>
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
@pruivo - updated the PR. Ready for another round of reviews. |
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
} | ||
|
||
@Override | ||
protected void delete(String clustername, Address addressToDelete) throws SQLException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also need to overwrite callInsertStoredProcedure
as this calls Util#addressToString
and is called from writeToDB
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't use it, so no
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I take your point as the toUUID
should not be necessary in JGroups once belaban/JGroups#901 has been backported. There is a lot of flags we don't use in the Keycloak use-case that we're still including here. I assume the reason for inclusion is so that our solution here is as close to the PR that will eventually be issued to JGroups for JDBC_PING2.
Closes #41290