RabbitMQ Clustering and High Availability

Overview

RabbitMQ can be configured for multi-node clustering, in which messages are replicated from a master node to slave nodes.
Failover can be provided by adding a round-robin load balancer in front of the cluster nodes.

Although clustering and HA are possible, they are complex to set up.
Our experiments showed that a single node, coupled with a solution like Pacemaker, is generally enough.

This documentation explains how to cluster RabbitMQ on two nodes, then add an HAProxy load balancer in front of them.
For detailed information about RabbitMQ clustering, see RabbitMQ’s documentation.

Note that there are also highly available queues, which are an alternative to clustering.
They are not documented here, but you can refer to the official RabbitMQ documentation.

You may also find this blog entry useful.

Node rabbit1

Install RabbitMQ:

rabbit1$ apt-get install rabbitmq-server

Start RabbitMQ and check the cluster status:

rabbit1$ rabbitmq-server -detached

rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1]}]},{running_nodes,[rabbit@rabbit1]}]
...done.

Create a roboconf user (on rabbit1 only; it will be available to the whole cluster):

rabbit1$ rabbitmqctl add_user roboconf roboconf
rabbit1$ rabbitmqctl set_permissions roboconf ".*" ".*" ".*"

Make sure the HA sync mode on the cluster is automatic (optional):

rabbit1$ rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'

Node rabbit2

First, install RabbitMQ:

rabbit2$ apt-get install rabbitmq-server

Start RabbitMQ and join the cluster:

rabbit2$ rabbitmq-server -detached

rabbit2$ rabbitmqctl stop_app
Stopping node rabbit@rabbit2 ...done.

rabbit2$ rabbitmqctl join_cluster rabbit@rabbit1
Clustering node rabbit@rabbit2 with [rabbit@rabbit1] ...done.

rabbit2$ rabbitmqctl start_app
Starting node rabbit@rabbit2 ...done.
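
Once rabbit2 has joined, both nodes should be listed under running_nodes in the output of rabbitmqctl cluster_status. The following is a minimal sketch of a check, not part of RabbitMQ itself. The function names running_nodes and node_is_running are hypothetical helpers. It assumes the single-line Erlang-term output format shown for rabbit1 above; other RabbitMQ versions may print the status differently or wrap it across lines, in which case the pattern would need adapting.

```shell
# Hypothetical helpers (not RabbitMQ commands).
# Assumption: cluster_status prints {running_nodes,[...]} on a single line,
# as in the rabbit1 output shown earlier on this page.

# Read cluster_status output on stdin, print one running node per line.
running_nodes() {
  grep -o '{running_nodes,\[[^]]*\]}' \
    | sed -e 's/^{running_nodes,\[//' -e 's/\]}$//' \
    | tr ',' '\n'
}

# Read cluster_status output on stdin, succeed if node $1 is running.
node_is_running() {
  running_nodes | grep -qx -F -- "$1"
}

# Possible usage on either node:
#   rabbitmqctl cluster_status | node_is_running rabbit@rabbit2 && echo "rabbit2 is up"
```

The node name is matched as a whole line, so rabbit@rabbit1 is not confused with a longer name that merely starts with it.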

Node haproxy

Install HAProxy:

haproxy$ apt-get install haproxy

Edit /etc/haproxy/haproxy.cfg and add the following content:

listen rabbitcluster 0.0.0.0:5672
         mode tcp
         option tcplog
         timeout client  3h
         timeout server  3h
         server rabbit-01 <IP-of-node-rabbit1>:5672 check fall 3 rise 2
         server rabbit-02 <IP-of-node-rabbit2>:5672 check fall 3 rise 2

Note that client and server timeouts must be set to a long period.
Otherwise idle connections may get closed by HAProxy, causing errors.
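
The placeholders in the configuration above have to be replaced with the real node addresses. As a minimal sketch, the block could be generated by a small shell function. The name render_rabbit_listen is a hypothetical helper, and the addresses in the usage comment are documentation examples rather than values from this setup.

```shell
# Hypothetical helper: print the listen block shown above with both IPs filled in.
#   $1 = IP of node rabbit1
#   $2 = IP of node rabbit2
render_rabbit_listen() {
  cat <<EOF
listen rabbitcluster 0.0.0.0:5672
         mode tcp
         option tcplog
         timeout client  3h
         timeout server  3h
         server rabbit-01 $1:5672 check fall 3 rise 2
         server rabbit-02 $2:5672 check fall 3 rise 2
EOF
}

# Possible usage (example addresses, to be replaced with your own):
#   render_rabbit_listen 192.0.2.11 192.0.2.12 | sudo tee -a /etc/haproxy/haproxy.cfg
#   haproxy -c -f /etc/haproxy/haproxy.cfg   # -c only checks the configuration file
```

Running haproxy with -c before starting it is a cheap way to catch a typo in the file.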

Start HAProxy:

haproxy$ sudo /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid

Testing

Start Roboconf’s DM with the messaging server’s location set to the address of the HAProxy node.
Upload an application with dependencies. For example, with the Apache / Tomcat sample, start Apache, then alternately start and stop Tomcat and check that the Apache status changes.

Then stop and restart RabbitMQ on one of the cluster nodes:

rabbitmqctl stop
rabbitmq-server -detached
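
A restarted node may take a few seconds before it answers again, so a test script may want to wait for it before checking Roboconf. The following is a minimal sketch of a generic polling helper. The function name retry and the attempt and delay values are illustrative assumptions, not part of RabbitMQ or Roboconf.

```shell
# Hypothetical helper: run a command until it succeeds.
#   retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the command succeeds, 1 after MAX_ATTEMPTS failures.
retry() {
  max=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$max" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Possible usage after restarting a node (values are arbitrary examples):
#   retry 10 3 rabbitmqctl status && echo "node is back"
```

The helper uses plain POSIX shell variables, so max, delay and n are global. Rename them if they clash with names in a larger script.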