Co-authored with Dan Buchko.
Our team recently worked on a project that required synchronizing audio and video across potentially thousands of mobile phones. The cornerstone of our solution was AMQP, with RabbitMQ as the broker.
Why AMQP and RabbitMQ?
All the synchronized phones were to be connected to a single WiFi network (i.e. on the same VLAN). To minimize any latency or connection issues, the controlling synchronization servers were also located on the same VLAN.
The objective was to have the phones display specific colors on their screens and play pre-loaded audio tracks at precisely the same times. The audio tracks were to be installed on the phones beforehand, through an application.
This led to a requirement to push commands from the server to all phones concurrently. The application on each phone would receive, interpret, and execute each of these commands, such that when a message was pushed from the server, all the phones would receive it within a very short period of time (tens to hundreds of milliseconds). This communication had to happen over the WiFi network with thousands of concurrently connected clients.
The combination of thousands of clients and a need for low-latency message delivery made AMQP and the RabbitMQ broker an ideal fit for the task. By using AMQP’s publish/subscribe features and RabbitMQ’s easy-to-use, configurable node clustering, we were able to deliver messages reliably to all clients in a timely manner.
How did we do it?
We set up a cluster of 4 RabbitMQ nodes, where each node was running on a separate virtual machine (VM) hosted on a VBlock. (VBlock is a server rack solution from EMC.) This VBlock was located on premises and on the same VLAN as the WiFi network. Another VM hosted a TCP load balancer (HAProxy in our case) to direct traffic to the RabbitMQ cluster nodes. All connections from the phone clients to the RabbitMQ cluster went through the load balancer.
The audience in our test (each person with a phone) was divided into 3 groups, where each group of phones was tasked with performing a specific visual and auditory effect. In the RabbitMQ cluster, we created 3 fanout exchanges, one per phone group. If we needed one of the groups to perform a specific action, we simply published a predefined message to the corresponding exchange. Each message (we used JSON format) contained a specific instruction to play a sound or to change the color of the smartphone screen. We also had to account for the possibility that a packet could be dropped when transmitted wirelessly in such a congested environment. As a result, more steps were required to improve the reliability of the system.
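The per-group publishing described above can be sketched roughly as follows. This is a minimal illustration, not our production code: the exchange names, the `action`/`value` message fields, and the helper functions are all hypothetical, and `channel` is assumed to be an object that behaves like a pika channel, whose `basic_publish` method takes `exchange`, `routing_key`, and `body`.

```python
import json

# Hypothetical exchange names, one fanout exchange per audience group.
GROUP_EXCHANGES = {1: "group.1", 2: "group.2", 3: "group.3"}


def build_command(action, value):
    """Serialize one instruction as compact JSON bytes (field names are assumed)."""
    payload = {"action": action, "value": value}
    # Compact separators drop optional whitespace to keep the message small.
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


def publish_to_group(channel, group, action, value):
    """Publish one command to the fanout exchange that serves `group`.

    A fanout exchange ignores the routing key, so it is left empty.
    """
    body = build_command(action, value)
    channel.basic_publish(
        exchange=GROUP_EXCHANGES[group],
        routing_key="",
        body=body,
    )
    return body
```

With a fanout exchange, every queue bound to the exchange receives a copy of the message, which is why a single publish per group is enough to reach all phones in that group.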
While we could publish messages to the phones over WiFi and get a high delivery percentage with relatively low latencies, this was not sufficient. Any message that was delivered with a large latency, or not delivered at all, would make the sound synchronization effect between phones a very unpleasant experience, because the human ear readily picks up even slight timing differences between sounds. To address this problem, we split it into 2 parts:
- How to solve latency, and
- How to increase the delivery percentage of messages over the WiFi network.
To solve the latency problem, we needed to take into consideration the fact that we had little control over how long a message took to be delivered to the phones, especially on a congested WiFi network. The steps we took included minimizing the message size, using a lightweight protocol such as AMQP for delivery, and using a robust broker that could handle the delivery of thousands of messages per second without any hiccups. RabbitMQ fit the bill nicely. But at the end of the day, we could not control the latency of every single message, so each message was received by the phones at slightly different times. This meant the phones did not play the audio at precisely the same time.
Our solution for this issue was to include a common timestamp in every message sent out by the server. This timestamp indicated the exact time at which the phone should execute the instruction. All phones subscribed to that message would receive it and schedule the task to execute at the specified timestamp. We set the execution timestamp for each message a few seconds into the future, which gave each phone enough time to receive the message before having to execute the task. That meant that even if the phones received the message at different times, they could still schedule the task to be executed at the same time on all phones.
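The scheduling idea above can be sketched as follows. This is an illustration under assumptions rather than the code that ran on the phones: the function names are hypothetical, timestamps are assumed to be in milliseconds, and Python's `threading.Timer` stands in for whatever scheduler the mobile platform provides.

```python
import threading


def delay_until(execute_at_ms, now_ms):
    """Seconds to wait before running the task; never negative."""
    # A message that arrives after its execution time runs immediately.
    return max(0, execute_at_ms - now_ms) / 1000.0


def schedule_command(execute_at_ms, now_ms, task):
    """Start a timer that calls `task` at the shared execution time."""
    timer = threading.Timer(delay_until(execute_at_ms, now_ms), task)
    timer.daemon = True  # do not keep the process alive just for the timer
    timer.start()
    return timer
```

Because every phone computes its own delay from the same execution timestamp, a phone that receives the message late simply waits for a shorter time. A general-purpose timer like this one is not accurate to the millisecond, so a real client would likely need a more precise scheduling mechanism.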
But this now presented another problem: the phone clocks themselves were not synchronized, and we did not have control over their clocks. That meant if an instruction was supposed to execute at precisely 2PM, 2PM might be a slightly different time on each phone. The solution to this problem was to create a custom clock embedded in the custom phone application. The clock, using Network Time Protocol (NTP), would synchronize with a cluster of NTP servers co-located on the same VBlock as the RabbitMQ servers. This ensured that all the phones would have a custom clock synchronized down to the millisecond. So if a message was sent with an execution timestamp of “x,” the application on each phone would have the same time “x” as all the others.
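A custom clock of this kind could look roughly like the sketch below. It is not our implementation: it uses the standard NTP offset formula computed from the four timestamps of one request/response exchange, and the `ntp_offset` function and `SyncedClock` class are hypothetical names. The clock is assumed to be built on the device's monotonic timer so that changes to the phone's own wall clock do not affect it.

```python
import time


def ntp_offset(t0, t1, t2, t3):
    """Estimate server-minus-client clock offset from one NTP exchange.

    t0: client send time, t1: server receive time,
    t2: server send time, t3: client receive time (all in the same units).
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


class SyncedClock:
    """Application-level clock anchored to a server time estimate."""

    def __init__(self, server_now_ms, monotonic=time.monotonic):
        self._monotonic = monotonic
        # Pair the server time estimate with the local monotonic reading.
        self._base_server_ms = server_now_ms
        self._base_local_s = monotonic()

    def now_ms(self):
        """Current server-aligned time in milliseconds."""
        elapsed_ms = (self._monotonic() - self._base_local_s) * 1000.0
        return self._base_server_ms + elapsed_ms
```

For example, if the client sends at 1000 ms, the server receives at 1510 ms and replies at 1512 ms, and the client receives at 1022 ms, the estimated offset is 500 ms, so the server time at receipt is about 1522 ms. That value would seed `SyncedClock`, and the application would compare execution timestamps against `now_ms()` instead of the phone's system clock.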
The AMQP/RabbitMQ combination proved successful for our project. By using AMQP’s queuing, routing, and lightweight features in combination with RabbitMQ’s ease of use, we were able to publish thousands of messages to wireless clients while maintaining a very high delivery percentage with low latencies. One further option that we are currently investigating is the MQTT protocol, which was designed specifically for devices where network bandwidth is at a premium. We will be looking to use the RabbitMQ MQTT plugin included with the server.