It’s pretty much all we are talking about in tech circles these days. That is because the data we have access to, and can actually use, is growing at an exponential rate. This year alone, we’ve already collected and stored more data than we did in every year up until 2011 combined. Right now, we are on track to double the entire volume of the world’s stored data in just 18 months.
Late last year, Richard McDougall, VMware’s storage and apps CTO, predicted this would be the year that ‘Delete’ would become a forbidden word. This is a seismic shift from previous data management strategies, where data was archived or purged after 3 to 7 years, depending on the company policy. Part of the reason is that storage has become cheap, particularly when we look at the economies of scale of putting data into the cloud.
But a bigger reason for this shift is that all data has become useful.
Thanks to new data strategies like MapReduce and Scatter-Gather, we can now federate data across as many cheap commodity servers as we like, and actually get massively better performance than with traditional monolithic data stores. This complements the fact that disk has become cheap enough to justify mass data storage, but more importantly, it means that we can now efficiently query massive amounts of data. It means that all of that data has finally become useful.
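To make the scatter-gather idea a little more concrete, here is a minimal sketch in Python. It is an illustration only, not how Hadoop or any particular product implements it: the `SHARDS` data, `query_shard` and `scatter_gather` are hypothetical names, and threads on one machine stand in for separate commodity servers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the slice of data
# held by one cheap commodity server.
SHARDS = [
    ["error", "ok", "ok"],
    ["ok", "error", "timeout"],
    ["ok", "ok", "error"],
]


def query_shard(shard):
    """Map step: count events locally, using only this shard's data."""
    return Counter(shard)


def scatter_gather(shards):
    """Scatter the query to every shard in parallel, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(query_shard, shards))

    total = Counter()
    for partial in partials:
        total.update(partial)  # reduce step: sum the counts per key
    return total


if __name__ == "__main__":
    print(dict(scatter_gather(SHARDS)))
```

The point of the pattern is that each shard only scans its own slice, so adding servers adds query capacity, and only the small partial results travel back to be merged.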
In a nutshell, big data is here to stay. Now, what kinds of apps are we building on it? Most people think big data apps are just for social and search. They are not far off, since Hadoop was born out of internet giants like Yahoo! and Facebook. However, now that the economics and the technology have lowered the barrier to entry, I believe we will see several new classes of applications emerging.
- Serialized apps. With data volumes exploding, the process of keeping data in sync and the cost associated with replicating terabytes or petabytes of data will become cumbersome. To avoid it, companies will look to new ways to have multiple apps run against a single data store. Historically, this has been a challenge due to dependencies across codebases and data access layers. However, new in-memory data stores, like GemFire, support data serialization and will allow this to happen. So the idea of a single data master is finally not only feasible, it’s scalable for real-time applications.
- Real-time global transactional applications. Money will likely drive this sector of apps. The idea that banking, purchases or stock trades will be instant and worldwide is critical. Recently I talked to a consultant who worked at a large bank in Australia. He explained how being global meant that people could trade in the morning in Europe, then move to the US and then on to Asia. In order to complete a trade, banks need to perform a credit check in under 10 or 15ms. With latency between Asia and the UK being close to 70ms, local masters of the data must exist just to fight latency. It is imperative to their business to prevent a customer who just drained their account in UK trades from making new transactions in Asia. In essence, transaction data will need to follow the sun in real time. (Editor’s note: I’ll publish this story in more detail, since this customer used SQLFire and GemFire for fast ingest and replication.)
- Extending legacy data systems. As with the mainframes of yesteryear, smart businesses will spend the next several years looking to preserve some of their investments in data. These legacy investments will be built on traditional relational database systems, and they will stop scaling to the demand of modern apps. As with the mainframe, we will look to wrap them in services. In this case, that means data services that pull part of the data out to in-memory caches where reads can happen, saving database I/O for create, update and delete and adding years to your database’s life.
- Video. With over 100 hours of video being uploaded to YouTube every minute, a significant portion of the data mountain we have created is video. The volume alone makes it a target for a variety of apps, such as large-scale, real-time video analytics that can make real-time facial recognition possible, just like you already see on your favorite TV shows. It will also make traffic pattern analysis possible, with traffic lights being cued to alleviate congestion based on patterns emerging from the video data. Given that video has a consumer focus as well, I am also expecting more rad inventions, such as the image recognition software that creates a real-time augmented reality as seen in this TED Talk.
- Massive Analytics. Scientific research will likely lead this category. One company focused on cancer research told me that science now studies over 60 million data points for the average human across genes, proteins and antibodies. Studies can vary from hundreds to many thousands of participants. That is a lot of data points to organize and sift through. Then there are companies like Aridhia, which take into account not only your biological makeup but also real-time updates on clinical trials and patient outcomes to continuously improve targeted therapies. These apps will have to deal with massively complex data sets, and will also have to be flexible and open enough to create any kind of query on the fly, so curious scientists have room to explore ideas they haven’t had yet.
- Predictive Analytics. GE tagged these as ‘Predictivity’ applications last week in their big announcement that launched 14 new solutions in this class. These are essentially massive analytics that work on a set of algorithms that predict instead of deduce, but will also be built into the Internet of Things. These are jet engines that will tell flight engineers they need servicing while in flight, so the engineers are prepared as soon as the plane lands. They’re applications that will electronically track and schedule medical assets so more patients can be served with the same CT scanner. This is a wind turbine that knows what maintenance it needs, and shows a new technician the video of what needs to be done. These are apps that will tell us how to maintain, manage or schedule resources more efficiently, boosting GDP by a conservative $10-15 trillion over the next 20 years.
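The legacy-extension pattern in the list above, where reads are served from an in-memory cache and the database I/O is saved for create, update and delete, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how GemFire or SQLFire work internally: the `CachedDataService` class, the `accounts` table and the `make_demo_service` helper are all hypothetical, a plain dict stands in for the in-memory data store, and SQLite stands in for the legacy relational database.

```python
import sqlite3


class CachedDataService:
    """Hypothetical data service: reads hit an in-memory cache, writes go to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}    # stand-in for an in-memory data store
        self.db_reads = 0  # how many reads fell through to the database

    def get_balance(self, account_id):
        # Serve from memory when possible, so the database is not touched.
        if account_id in self.cache:
            return self.cache[account_id]

        self.db_reads += 1
        row = self.conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        balance = row[0] if row else None
        if balance is not None:
            self.cache[account_id] = balance
        return balance

    def set_balance(self, account_id, balance):
        # Writes always go to the system of record.
        self.conn.execute(
            "UPDATE accounts SET balance = ? WHERE id = ?", (balance, account_id)
        )
        self.conn.commit()
        # Invalidate the cached copy so the next read reloads the new value.
        self.cache.pop(account_id, None)


def make_demo_service():
    """Build a service over a throwaway in-memory SQLite database (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
    conn.execute("INSERT INTO accounts VALUES (1, 100.0)")
    conn.commit()
    return CachedDataService(conn)
```

Calling `get_balance(1)` twice on the demo service touches the database only once; after `set_balance(1, 40.0)` the next read goes back to the database and picks up the new value. A production setup would also need an eviction and consistency strategy, which this sketch leaves out.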
Overall, big data will be the focus for the next wave of investment in applications. And it is happening now, with IDC reporting that the big data technology and services market is growing at a whopping 31.7%, or about 7x faster than all of IT.
So the only question now is: what’s your next big data initiative going to be?
For more on Pivotal’s Data Fabric solutions see:
- Check out Pivotal’s set of enterprise data products including MPP & Column store database, in-memory data processing and Hadoop.
- Check out Pivotal’s resources, including whitepapers and case studies, on big data, data science and more
- See our other blogs on big data