The “BBC of Norway” is called NRK. It is Norway’s largest media organization, and 90% of Norway’s 5 million residents consume one of NRK’s media services every day. Often referred to as the Norwegian Broadcasting Corporation, NRK is a government-owned radio and television broadcasting company headquartered in Oslo with 3500 employees. Its websites, radio.nrk.no and tv.nrk.no, recently began offering radio and television services online, and RabbitMQ helped connect the “new” system with the “old.” The Pivotal POV Blog team recently spoke with Erlend Wiig, Software Engineer at NRK, about their use of RabbitMQ, and Mr. Wiig penned the following article.
Media consumption is growing online everywhere, including Norway. Three to four years ago, we began a project that would provide a better way for people to consume radio and TV on the web. To do so, we needed to push data from legacy radio and TV systems to our new website on an ongoing basis. We had four separate legacy media systems, and these stored information on over one million programs. One system stored TV titles, descriptions, times, dates, and airing rights. Another provided TV index points, like the time an interview starts in a show, as well as host, place, contributors, and more. A third stored TV subtitles, and a fourth provided similar information for radio shows. Thousands of content updates take place each day to potentially all of the 30,000 TV and 82,000 radio programs on the website. Of course, keeping this amount of information in sync manually was not possible.
To connect a group of older systems with a new database serving the web, messaging was clearly the answer. Open source middleware provided a great solution from a budget perspective, but we also needed a tool we could trust and get support for when needed.
We began to prioritize our needs. There wasn’t a high-volume processing requirement or a demanding throughput target. Our needs were more about architecture. We knew a loosely coupled system with asynchronous processing would provide a way to adapt in the future, and we didn’t have to reinvent the wheel when open source alternatives were available. Additionally, transmitted information needed a persistent store because source systems didn’t keep track of the history of changes. We also needed a means to track failures and reprocess them. Of course, the solution couldn’t create complexity, be difficult to use, or add risk—a proven solution was needed.
RabbitMQ was one of several options, including MSMQ and Windows Azure Service Bus. We had heard a number of great things about RabbitMQ from other developers, and RabbitMQ offered a solid open source community as well as a way to get commercial support. It had a proven track record and was built on a language designed for robust, distributed systems, Erlang. These reasons drove our technology team’s decision. Adding messaging to the architecture was, perhaps, beyond the concept of a minimum viable product, but we knew the approach would be easier to develop, deploy, manage, and operate than a tightly coupled set of code between new and old systems. Once we had committed to messaging in the architecture, RabbitMQ was the option we trusted.
In our architecture, source systems trigger messages when content is created or updated. Each system calls a specific service endpoint within our C# .NET-based message system, ODA. The endpoints pass a standard XML format called “Gluon” via an older set of messaging services based on SOAP or newer services based on REST. Each service has a simple job—receive the message, acknowledge the reception back to the sender, test that the content is valid XML, and write the received message to a RabbitMQ queue. RabbitMQ transports the message, and adapters read the messages off the queues, process them into standard business objects, and persist the information to an Oracle database (Fig. 2). The database is then used to serve content to website consumers.
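The job of each receiving service is simple enough to sketch. The following is a minimal, illustrative Python version (the actual system, ODA, is C#/.NET, and the function and field names here are assumptions, not NRK's code); the publisher is injected as a callable so a RabbitMQ client, or an in-memory fake, can stand behind it:

```python
import xml.etree.ElementTree as ET


def handle_gluon_message(raw_xml: str, publish) -> dict:
    """Receive a Gluon payload, validate that it is well-formed XML,
    and hand it to a queue publisher. `publish` is any callable that
    accepts the message body, e.g. a RabbitMQ channel's publish method."""
    try:
        ET.fromstring(raw_xml)            # reject malformed XML up front
    except ET.ParseError as err:
        return {"accepted": False, "error": str(err)}
    publish(raw_xml)                      # enqueue for the adapters
    return {"accepted": True}             # acknowledgement for the sender
```

Validating before publishing means the sender gets an immediate, synchronous acknowledgement, while all further processing happens asynchronously behind the queue.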
By having a loosely coupled, asynchronous architecture, we gain additional benefits. Importantly, the system stores messages in a queue if downstream consumers are offline or the database is taken down for an upgrade—no message is lost. In addition, we persist each message to document storage as if it were an event. Not only can we replay an entire history of events to rebuild the target database, we can use the message data to generate other consumption formats in batch. This recently happened when we needed to produce media subtitles for iOS. We wrote new code in our adapter (queue consumer) and replayed the messages to the queue, pumping out the newly formatted iOS subtitles.
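The replay mechanism described above can be sketched in a few lines. This is a hedged illustration, not NRK's implementation; the adapter and message fields are hypothetical, but the shape is the same: iterate the persisted history and feed each message to whichever adapter you need, whether a database writer or a new format generator:

```python
def replay(event_store, adapter):
    """Re-run the full history of persisted messages through an adapter,
    e.g. to rebuild the target database or to batch-generate a new
    consumption format from the same message data."""
    for message in event_store:
        adapter(message)


# Hypothetical adapter: reformat stored subtitle messages for iOS.
def make_ios_subtitle_adapter(output):
    def adapter(message):
        output.append({"format": "ios", "text": message["text"]})
    return adapter
```

Because the event store is the source of truth, adding a new output format means writing a new adapter and replaying, with no changes to the source systems.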
We are fans of distributed architectures, and loose coupling between components provides significant gains when systems evolve and change. This is why placing a message queue like RabbitMQ at the heart of our architecture made sense.
The use of a queue also helps us operationalize messages and related issues. Since we require downstream systems to have the most recent information, like the latest correction to a subtitle, messages that fail in the consumer are put back in the queue, and other messages will not be processed until the prior issue is fixed. We maintain the message order, including failed messages, with one producer and one consumer in each service. Our approach keeps messages in order; it also causes backups, which surface issues and give us a clear signal to take action. To gain visibility into these backups, we “hand rolled” our own dashboard (Fig. 3) based on the RabbitMQ Management plugin API to see where messages are backing up.
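The single-consumer, block-on-failure behavior can be illustrated with a small Python sketch (an assumption-laden stand-in for the real .NET consumer, using an in-memory deque in place of a RabbitMQ queue):

```python
from collections import deque


def drain(queue: deque, process):
    """Process messages strictly in order with a single consumer.
    A failed message is put back at the head of the queue and
    processing stops, so later updates never overtake it."""
    processed = []
    while queue:
        message = queue.popleft()
        try:
            process(message)
        except Exception:
            queue.appendleft(message)   # the failed message stays put
            break                       # later messages wait behind it
        processed.append(message)
    return processed
```

The growing queue depth that results from a stuck message is exactly what a dashboard built on the Management plugin's HTTP API (its `/api/queues` endpoint reports per-queue message counts) makes visible.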
From an operations perspective, we automatically retry failed messages after a wait period, and an intact copy of each failing message waits in its queue. Since the message is sitting in the queue, we don’t have to dig through logs to resolve issues. Instead, we can view the message within the queue to troubleshoot the problem, write new tests, or correct the problem. RabbitMQ lets us manage message failures in a much easier way than before. We can see problems more easily and respond more quickly.
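A retry-after-wait loop of this kind might look like the following minimal sketch (names and the fixed-delay policy are assumptions; the original system's retry logic is not shown in the article):

```python
import time


def retry_message(process, message, attempts=3, wait_seconds=0.0):
    """Try to process a message, waiting between attempts. If every
    attempt fails, re-raise the last error; the intact message remains
    in the queue for inspection and later reprocessing."""
    last_error = None
    for _ in range(attempts):
        try:
            return process(message)
        except Exception as err:
            last_error = err
            time.sleep(wait_seconds)    # wait period before retrying
    raise last_error
```

Transient failures, such as a database briefly going away, resolve themselves this way; persistent failures leave the message waiting in the queue where it can be examined directly.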
To ensure a highly reliable service, we set up RabbitMQ with two clustered disc nodes and mirrored queues (Fig. 4). A load balancer sits in front of the nodes and provides failover if one node fails. We don’t want to lose any messages, because it is not straightforward to reproduce them from the production systems. The RabbitMQ website provides excellent guides on failover setup; they are easy to follow and make this type of configuration pretty straightforward.
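For readers who want to try a similar setup, clustering and queue mirroring are both configured with `rabbitmqctl`. The commands below are a generic sketch based on RabbitMQ's own clustering and high-availability guides, not NRK's actual configuration; the node name `rabbit@node1` and the policy name `ha-all` are placeholders:

```shell
# On the second node, join the existing cluster as a disc node
# (join_cluster creates a disc node by default):
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@node1
rabbitmqctl start_app

# Mirror every queue across all nodes in the cluster; the empty
# pattern "" matches all queue names:
rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
```

With queues mirrored on both disc nodes, a load balancer in front of the cluster can redirect clients to the surviving node without message loss.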
With RabbitMQ, we achieve a very cost-effective, loosely coupled, flexible architecture and an operational mechanism for managing message failures. The message queue saves us a lot of time on bug investigation and maintenance. It’s simple to set up and provides powerful concepts, many of which we haven’t used yet but feel can solve future needs. RabbitMQ brings a robust solution we’re comfortable going forward with.
About the Author: Erlend Wiig graduated with a bachelor’s degree from the Norwegian School of Information Technology (NITH) and has a master’s degree in Distributed Systems from Brunel University. He knew from the age of 7 that he wanted to work with computers, and, after spending his early professional life on the operational side of the industry, he has spent the last 4 years writing software. Most recently, his work includes backend metadata services at NRK in Oslo, Norway. Outside of work, he spends his time with his family and interests such as cycling, hiking, concerts, and politics.