Description
We are running nats-streaming version 0.24.4 as three pods on Kubernetes. When nats-streaming was deployed, the pods rolled in an order that did not take the nats-streaming leader into account. We have 96 channels. During startup we received ten occurrences of
[1] 2022/10/03 19:20:39.135630 [ERR] STREAM: channel "system-events.user-identity" - unable to restore messages (snapshot 75859347/75979326, store 75872433/75962095, cfs 75859347): nats: timeout
every three seconds, after which that nats-streaming pod would abort and exit. Kubernetes would then start a new instance, which hit the same issue.
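For context, the first/last sequence pairs in that log line can be compared directly. A minimal sketch (my reading of the numbers, not the server's actual code) of why the restore would need to fetch messages from the leader:

```python
# Hypothetical illustration of the sequence ranges in the error above.
# Log format appears to be: snapshot first/last, store first/last,
# cfs = channel first sequence.
snap_first, snap_last = 75859347, 75979326
store_first, store_last = 75872433, 75962095

# The snapshot claims messages up to snap_last, but the local store only
# holds messages up to store_last, so the restoring node has to request
# the missing tail from the leader over NATS. A "nats: timeout" would
# mean those requests never completed.
missing = range(store_last + 1, snap_last + 1)
print(len(missing))  # 17231 messages would need to be fetched
```

If that reading is right, the question becomes why the leader (or whichever node should serve the missing range) never answered within the timeout during the rolling restart.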
Our message store is on a RAM disk, so we eventually shut down all pods and restarted from scratch (losing all messages). This recovered nats-streaming. The nats pods were not rolled during the nats-streaming deployment.
In terms of ordering, system-events.user-identity is neither the first nor the last channel according to the channel-creation order in the nats-streaming logs.
What would cause this problem?