Why The Open Data Platform Is Such A Big Deal For Big Data

Today, fifteen industry leaders in the big data space announced their intent to create a new industry initiative, the Open Data Platform (“ODP”), to promote open source-based big data technologies and standards for enterprises building data-driven applications. The initial group of member companies includes Platinum members GE, Hortonworks, IBM, Infosys, Pivotal, SAS, and a large international telecommunications firm, and Gold members AltiScale, Capgemini, CenturyLink, EMC, Splunk, Verizon Enterprise Solutions, Teradata, and VMware.

Born from the playbook Pivotal used just a year ago to leverage open source and open collaboration to accelerate Cloud Foundry into becoming the biggest open source success in recent years, Open Data Platform promises to do the same for the Apache Hadoop® ecosystem and big data, and do it quickly.

Everything Starts With Open Source

Last year, Pivotal published its open source manifesto, detailing why open source is pivotal to the success of any technology. From recruiting top talent to accelerating adoption, feedback and innovation, open source has long since proven that no proprietary technology can compete with a viable open source alternative.

However, while single technologies have thrived with open source, ecosystems naturally lag in development without an organizing force. By openly joining forces with the leading vendors, service providers and users of Apache Hadoop® to focus specifically on the needs of the enterprise, the Open Data Platform aims to reduce fragmentation and accelerate developments and innovation across the Hadoop ecosystem.

Open Collaboration: A Rising Tide That Lifts All Boats

A thriving ecosystem is key to the viability of any technology. With lots of eyes on the prize, the technology becomes more stable, offers more capabilities, and, importantly, supports greater interoperability across technologies, making it easier and faster to adopt and use. By creating a formal organization, the Open Data Platform will act as a forcing function to accelerate the maturation of an ecosystem around Big Data.

Of course, the caliber of the organization's members is also very important. The members have to have relevant expertise and investment in the area. They should also look at the challenges from a variety of angles, balancing the views of consumers of the technologies with those of providers. This is why, when we set out to recruit for the Cloud Foundry Foundation, we recruited a variety of tech-savvy companies, from software giants like IBM, EMC and SAP to service providers like Savvis, Rackspace and NTT and industry-leading consumers of PaaS like Monsanto, eBay, and BNY Mellon.

For the Open Data Platform, the first wave of members combines heavyweight brands across Hadoop software providers including EMC, Hortonworks, IBM, Pivotal, Teradata, Splunk and VMware; service providers like AltiScale, CenturyLink, and Verizon Enterprise Solutions; advanced ISVs like Capgemini, Infosys, and SAS; and, finally, leading Hadoop consumers like General Electric and another large international telco. This is just the first wave, and as an open foundation, we expect to expand the ranks quickly.

Once working under the foundation's framework, each of these companies will pool resources and efforts in cooperation, eliminating redundancies and establishing a clear and agreed way for us all to work. Simply put, this creates operational efficiencies across an entire ecosystem. More investment will flow into the standardized open source, and more innovation and interoperability will flow out of the vendors in the ecosystem, accelerating benefits for all.

First Goals for the Open Data Platform Initiative

Translating this into real tactics and benefits, look for significant progress on three milestones toward a successful ecosystem in the Open Data Platform’s first year:

  • An industry standard and open data management core. Initially focused on Apache Hadoop®, the Open Data Platform will develop and promote a set of open, enterprise-focused Hadoop® standards and technologies. This translates to immediate benefits that will increase stability, capabilities, and compatibility among Hadoop® distributions.
  • Certifying a common reference core. The Open Data Platform will deliver a certified, packaged, and tested reference core, giving the industry a coveted “test once, use everywhere” solution. With the entire industry able to create big data offerings using this consistent reference implementation, software applications will be more likely to run on any distribution based on the Open Data Platform’s Hadoop® core, reducing risk and vendor lock-in while focusing vendor resources on more innovation.
  • More support and contributions for the Apache Software Foundation. The Open Data Platform is expected to be complementary and beneficial to the efforts and stewardship of the Apache Software Foundation (ASF), using the existing ASF processes to contribute code, perform testing and integration, and provide infrastructure support, as well as increasing participation in events and collaboration with the developer community.

The Future Is Near

Today’s announcement is about an organization that will be created in the near future. However, progress is not waiting for the Open Data Platform to stand itself up. It brings together many partners who are already working together on big data initiatives. GE helped get Pivotal started specifically to tackle modern challenges of combining big data and the Internet of Things (IoT), with results stacking up to save trillions in the next few years. Hortonworks and Pivotal announced today that they will be combining efforts to support Hadoop distributions and partner on data lake technologies. Real code contributions are also prepared, with Pivotal open sourcing HAWQ, our SQL-on-Hadoop engine, allowing it to run across any distribution of Hadoop based on the Open Data Platform Core.

If the Cloud Foundry Foundation is any indicator, with the intent to form the Foundation announced last February, the Foundation stood up in November, and record-breaking first-year open source sales posted by January of this year, everyone should expect the Open Data Platform to usher in big advances for big data sooner rather than later.

Editor’s Note: Apache, Apache Hadoop, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.

  1. While it’s great to see so many different groups excited about Apache(R) Hadoop(R) software, including the many vendors who contribute employee time and IP to the Apache Hadoop project, it’s a little troubling to see so little mention of the underlying Apache communities that build this great software, and not properly using the full name of the Apache Hadoop project or software product.

    As a brief reminder, Apache, Hadoop, and Apache Hadoop are either trademarks or registered trademarks of The Apache Software Foundation in the US and/or other countries. Our formal trademark policy – which requires that the full name of the project “Apache Hadoop” be used in both the first and most prominent references to our project/product – is posted publicly:

    • Stacey Schneider says:

      Thanks for your comments, Shane. The link in the first paragraph got accidentally deleted in edit cycles, and we have restored it now.

      The details for how the ODP will work with the individual underlying communities will be discussed more in the following days and months, for sure! Stay tuned.

  2. The link to in the headline paragraph is incorrect; the sub-domain www does not resolve – a direct link to does work.
