Open
Description
We run a 3-node nats-streaming cluster with a FILE store backend and a separate NATS pod. The entire setup is secured with TLS.
- args:
- --cluster_id=nats-streaming
- -m=8222
- -store=FILE
- --dir=/nats-datastore
- --max_age=86400s
- -sc=/nats/nats-streaming.conf
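Because the args above enable the monitoring port (`-m=8222`), one way to compare the three nodes while investigating is to query each node's `/streaming/serverz` monitoring endpoint. The following is a minimal sketch, not a verified diagnostic: the pod host names are hypothetical, the endpoint is assumed to be served over plain HTTP, and the `state`, `role`, and `total_msgs` field names are assumptions about the response, so missing fields fall back to defaults.

```python
import json
from urllib.request import urlopen

# Hypothetical pod addresses; replace with the real ones for your cluster.
NODES = [
    "nats-streaming-0:8222",
    "nats-streaming-1:8222",
    "nats-streaming-2:8222",
]


def summarize_serverz(payload: str) -> dict:
    """Extract a few fields from a /streaming/serverz JSON response.

    The field names are assumptions; absent fields get a default value.
    """
    data = json.loads(payload)
    return {
        "server_id": data.get("server_id", "unknown"),
        "state": data.get("state", "unknown"),
        "role": data.get("role", "unknown"),
        "total_msgs": data.get("total_msgs", 0),
    }


def fetch_serverz(host: str, timeout: float = 5.0) -> str:
    """Fetch the raw serverz response from one node (plain HTTP assumed)."""
    with urlopen(f"http://{host}/streaming/serverz", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def main() -> None:
    """Print one summary line per node, or the error if a node is unreachable."""
    for host in NODES:
        try:
            print(host, summarize_serverz(fetch_serverz(host)))
        except (OSError, ValueError) as exc:
            print(host, "failed:", exc)
```

Calling `main()` from a shell that can reach the pods prints one line per node. A node that reports a different state or message count from the other two would be the first place to look in the attached logs.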
We have hit a failure scenario in which one of the cluster nodes fails to restore messages from the snapshot. I have attached logs from all 3 pods. This has happened only once across the many setups we have tried to date in our staging environment.
We do not know how the cluster got into this situation. The next steps I am looking for are, in this order:
- An explanation of how the cluster could have ended up in this situation, and how to interpret the attached logs.
- How can we successfully recover from here?
Please advise. I can provide as much information about the environment as you need to help investigate.