Troubleshoot RabbitMQ Cluster Failure
Problem
The RabbitMQ cluster fails as a result of a power outage.
Solution
If
the
RabbitMQ
cluster
fails as a result of a power outage, the RabbitMQ message bus may not restart properly. To
identify
the status of the RabbitMQ message bus, run the kubectl get po -n
northstar -l app=rabbitmq
command. This command should show three pods with their
status as Running
. For
example:
$ kubectl get po -n northstar -l app=rabbitmq NAME READY STATUS RESTARTS AGE rabbitmq-0 1/1 Running 0 10m rabbitmq-1 1/1 Running 0 10m rabbitmq-2 1/1 Running 0 9m37s
If the status of one or more pods is Error
, use the following recovery
procedure:
-
Delete RabbitMQ.
kubectl delete po -n northstar -l app=rabbitmq
-
Check the status of the pods.
kubectl get po -n northstar -l app=rabbitmq
.Repeat
kubectl delete po -n northstar -l app=rabbitmq
until the status of all pods isRunning
. -
Restart the Paragon Pathfinder application.
kubectl rollout restart deploy -n northstar