As the big data and cloud eras merge, we continue to build more apps, create more data, and perform more analytics. Business groups and software teams are also using the cloud more, and they face more scenarios with geographically separated components, microservices, applications, and data stores.
This poses a new challenge—the cloud often breaks the link between applications and their location.
As a familiar example, companies run global applications in different countries and spend a lot of energy shipping data to a separate, centralized warehouse. Our cloud future is likely to increase this type of geographic separation because we are using multiple cloud vendors, adding data centers, developing hybrid clouds, scaling components in the cloud, using software-as-a-service APIs, and building big data analytics with applications on top. In each of these cases, systems are more separated, often across data centers and WANs.
In software parlance, this problem is solved with a cloud bus or messaging broker like Pivotal RabbitMQ.
Recently, RabbitMQ Developer Advocate Alvaro Videla gave a talk at Code Mesh 2013 on this topic. Whether you call it the data plane, integration as a service, a cloud queue service, or message-oriented middleware, Alvaro explains how RabbitMQ can be used to build a distributed data ingestion system based on federation.
Alvaro’s scenario is based on similar customer inquiries from the RabbitMQ mailing list: four applications run in different AWS regions for performance reasons (Eastern U.S., Ireland, Singapore, and Brazil) and need to send data to the U.S. region to be analyzed in some version of a data warehouse or centralized data lake. This example would also allow you to place data from any application onto Pivotal One for real-time or big data analytics.
Approaching the Architecture with Federation and Sharding
One way of dealing with this problem is to use local application logic to send data over the wire to a remote RabbitMQ server, but that would involve a lot of coding and introduce new distributed systems problems. For example, the application would need logic to handle the case where the server is offline or the network is down. Similarly, one might think a RabbitMQ cluster would work, but clusters are sensitive to network problems and are built to run within a single reliable network rather than across WANs.
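To make the cost of the hand-rolled approach concrete, here is a minimal Python sketch of the buffering an application would need just to survive a broker or network outage. The `BufferedPublisher` class and the `send` callable are hypothetical stand-ins and not part of any RabbitMQ client library. A real version would also need durable storage, retry backoff, and duplicate handling, which is exactly the work federation takes off the application.

```python
from collections import deque


class BufferedPublisher:
    """Hypothetical store-and-forward wrapper around a remote publish call."""

    def __init__(self, send, max_buffered=1000):
        # `send` is any callable that raises ConnectionError when the
        # remote broker cannot be reached (an assumption of this sketch).
        self._send = send
        # Bounded in-memory backlog; the oldest message is dropped when full.
        self._pending = deque(maxlen=max_buffered)

    def publish(self, message):
        """Queue the message, then try to drain everything that is pending."""
        self._pending.append(message)
        return self.flush()

    def flush(self):
        """Send pending messages in order; stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # remote side unreachable; keep the backlog for later
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def backlog(self):
        """Number of messages still waiting to be delivered."""
        return len(self._pending)
```

Even this small sketch loses data if the process restarts or the backlog fills up, which hints at how quickly the do-it-yourself route grows.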
With RabbitMQ’s federation plugin, we can tell a broker to transmit messages to another broker (even across a WAN) without clustering. This means brokers can be in different data centers, be part of different applications, have different users or virtual hosts, and even run on different versions of RabbitMQ and Erlang. Messages that are published on an “upstream” exchange will be forwarded to a “downstream” exchange as if the messages were originally published downstream. The downstream federated exchange doesn’t know the message is coming from a remote upstream server; it just knows a message was published. In Alvaro’s global regions example, he configures the downstream exchange to subscribe to the regional upstream exchanges associated with each of the four apps and regions, and to feed the messages it receives into the data warehouse or lake.
This type of federation architecture can be set up so that multiple upstream brokers forward to one downstream broker (many to one) or one upstream broker forwards to many downstream brokers (one to many). These links can also be chained across multiple levels or even arranged in loops. So, a similar approach could be applied to any situation where we deal with geographic distribution of data: RabbitMQ could aggregate or distribute any set of events, such as those arriving from JMS, across a hierarchy of federated RabbitMQ nodes or disparate applications.
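The many-to-one shape described above can be illustrated with a toy in-memory model. This is only a sketch: the `Broker` class, its methods, and the `hops_left` counter are invented for illustration and are not the RabbitMQ API. The hop counter is a simplified stand-in for the kind of limit that keeps messages from circulating forever when links form a loop.

```python
class Broker:
    """Toy stand-in for a broker with a single exchange (not the RabbitMQ API)."""

    def __init__(self, name):
        self.name = name
        self.downstreams = []  # brokers that federate from this one
        self.received = []     # messages seen by the local exchange

    def federate_from(self, upstream):
        """Register this broker as a downstream of `upstream`."""
        upstream.downstreams.append(self)

    def publish(self, message, hops_left=1):
        """Deliver locally, then forward downstream while hops remain."""
        self.received.append(message)
        if hops_left > 0:
            for broker in self.downstreams:
                broker.publish(message, hops_left - 1)


# Many to one: four regional brokers feed one central broker.
regions = [Broker(n) for n in ("us-east", "ireland", "singapore", "brazil")]
warehouse = Broker("warehouse")
for region in regions:
    warehouse.federate_from(region)

for region in regions:
    region.publish(f"event from {region.name}")

print(warehouse.received)
```

Reversing the direction of `federate_from` gives the one-to-many case, and raising `hops_left` lets a message travel through a chain of brokers.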
To set up federation, the RabbitMQ federation plugin is used. A downstream broker with the proper credentials is set up to connect or subscribe to an upstream broker, and RabbitMQ uses AMQP in the background to ensure the messages are passed. We can also configure which exchanges or queues to federate within a broker, so we don’t have to receive messages from all queues or exchanges. Instead, we can filter messages. Note that this use of federation, which federates exchanges, is related to but different from federated queues.
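As a rough sketch of what that setup involves on the downstream broker, the Python below builds the two management HTTP API requests typically used: one that defines an upstream and one that applies a policy selecting which exchanges to federate. The endpoint paths and JSON keys follow the federation plugin documentation as I understand it, so check them against your RabbitMQ version. The upstream name, hostname, policy name, and pattern are hypothetical, and the code only builds the requests without sending them.

```python
import json
from urllib.parse import quote


def upstream_request(vhost, name, uri):
    """Build the request that defines a federation upstream on the downstream broker."""
    path = (
        "/api/parameters/federation-upstream/"
        f"{quote(vhost, safe='')}/{quote(name, safe='')}"
    )
    body = {"value": {"uri": uri}}
    return "PUT", path, json.dumps(body)


def policy_request(vhost, name, pattern, upstream_set="all"):
    """Build the request for a policy that federates exchanges matching `pattern`."""
    path = f"/api/policies/{quote(vhost, safe='')}/{quote(name, safe='')}"
    body = {
        "pattern": pattern,  # regex matched against exchange names
        "definition": {"federation-upstream-set": upstream_set},
        "apply-to": "exchanges",
    }
    return "PUT", path, json.dumps(body)


# Hypothetical values for one of the regional brokers.
for request in (
    upstream_request("/", "ireland", "amqp://ireland.example.com"),
    policy_request("/", "federate-ingest", "^ingest\\."),
):
    print(*request)
```

The policy pattern is what provides the filtering: only exchanges whose names match it are federated, and everything else on the broker stays local.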
In his 45-minute talk, Alvaro covers additional information on configuration, parameters, and policies, setting up acknowledgements, reconnects, expirations, and more. He also provides a more detailed explanation in this article with code and command examples.
Since RabbitMQ is multi-protocol, polyglot, open source, and has a plugin architecture, it fits many cloud scenarios. It is used in architectures to connect components within a single application, across applications, and across clouds. RabbitMQ is bundled with cloud technologies like Chef, Puppet, and OpenStack as well as popular, open source cloud operating systems like Fedora, Debian, and Ubuntu. RabbitMQ runs on public clouds like Amazon EC2, Heroku, Joyent, EngineYard, and others. It also runs on private clouds and as part of applications at Google, Mozilla, NASA, Telefonica, and the NHS. Of course, it also runs on Pivotal CF and the recently announced foundation for Cloud Foundry.