Pivotal is proud to announce a new addition to our Hadoop leadership team, Apache Bigtop founder Roman Shaposhnik. A self-described “Sun guy” who spent over 11 years establishing himself in Sun Microsystems’ Linux and open source culture, Roman is best known in the industry for his later work on the Apache Hadoop project, first at Hadoop founder Yahoo! and then at Hadoop start-up Cloudera, where he notably started the Bigtop project, which is credited with improving the packaging and testing of the entire Hadoop ecosystem. He is also a Hadoop committer and a member of several Apache Software Foundation projects, including the Apache Incubator.
In this Q&A, Shaposhnik shares that he decided to join Pivotal because of our open source DNA, our vision for an open source cloud platform, and an ambition bigger than that of other Hadoop vendors. In his words, “Pivotal is an ideal sponsor for an effort to bring us closer to the fully integrated, easy to use Hadoop platform.”
Q1 | Tell us about you growing up. Where you lived, siblings, school, activities, interests, etc.
A1 | There are two ways to answer that question. Here is an autobiographical version: I was born in Leningrad, USSR, raised by a single mom, and supported (although at times equally annoyed) by a full set of grandparents. My childhood was a pretty typical one for a Soviet geek: math, model airplanes and sci-fi for fun, music lessons and chess for getting ahead in life. At a time when every Soviet teenager could build an AM radio receiver out of objects ordinarily found in said teenager’s pockets, personal computers were rare. The homebrew computer club revolution was delayed by the lack of quality electronic components. So, mass-produced programmable calculators were the only game in town. My mom got me the MK-61. It sported a single-row LED display and was meant for calculating the trajectories of ballistic missiles. It was a capable, somewhat boring device, and it was sure to turn me into a meticulous software developer. Then I came across an article in a hobbyist journal outlining all of the undocumented features it had and how to exploit those for creating adventure games (a single-row LED display, remember?). The article completely blew my mind and turned a wannabe software developer into an aspiring hacker. In an extremely serendipitous turn of events, my first real computer was the Soviet clone of the PDP-11. For me, this little device instilled a big love for cleanly designed instruction sets, so it was no surprise that I later gravitated toward the ultimate hacker space: the UNIX culture.
Q2 | You said, “there are two ways to answer the first question.” What is the other answer?
A2 | These days, people ask me, “Where are you from?” I always say, “I am from Sun Microsystems.” This answer describes me better than any kind of geographic or ethnic affiliation. I am a Sun guy, a UNIX guy, and an open source guy.
Q3 | Tell us about your work background and how Apache Bigtop came to life?
A3 | My first full-time employment was with Sun Microsystems writing a C++ compiler. By joining one of the few fully vertically integrated technology companies, I had a chance to work on projects ranging from CPU instruction scheduling all the way to IDEs and UIs. In my 11 years at Sun, I had the privilege of working with the smartest software hackers on the planet. At the same time, these engineers had a full appreciation for the craft of building enterprise-grade solutions. Yes, they were hackers at heart, but, first and foremost, they were engineers. They were the folks who understood and anticipated all the complexities of big system integration. This wisdom ultimately led me to Bigtop. For my last gig at Sun, I worked on Sun’s vision for the compute and storage cloud. At the suggestion of a good friend of mine, Andy Bartlett, I started playing with Hadoop and its ecosystem projects. I was trying to figure out how to leverage those pieces of technology in a cloud. Once Sun was acquired by Oracle in 2009, a group of us continued working on a cloud datacenter project for Huawei. It was fun, but the yellow elephant beckoned. Yahoo! and later Cloudera were the places to be, and now Pivotal is.
Let’s get back to the Bigtop question. If you remember the point I made earlier about the importance of system integration, the Hadoop ecosystem lacked it. To me, this was the single biggest obstacle to world domination. This was the idea behind Apache Bigtop: to be to the Hadoop community what Debian is to Linux, a 100% open, community-driven big data management distribution built on Apache Hadoop. Bigtop is a place where we, as a community, get to define the future of the Hadoop platform as a whole.
If Hadoop is to win big, we have to be very careful not to repeat the sad history of the UNIX wars in the late 80s and early 90s. We have to prevent the fracturing of the platform by various vendors and leave enough elbow room for them to meaningfully innovate around it.
The vision of Bigtop borrows heavily from an early vision of Debian and, more recently, the Linux Foundation. In the ideal world, Bigtop will be a vendor-neutral place to work on all the common aspects of the platform across different Hadoop vendors. Today, this happens under the umbrella of the Apache Software Foundation.
Q4 | How close are you today to realizing this vision?
A4 | We definitely have a lot of ground to cover. However, a simple fact exists today: the majority of commercial Hadoop distribution offerings (Cloudera, Hortonworks, Intel, Pivotal, WANDisco, etc.) are derived from Bigtop. So, I believe we are on the right track.
Q5 | What is your definition of success for Bigtop?
A5 | If I ever see Bigtop growing into a type of foundation that employs Doug Cutting the same way that the Linux Foundation employs Linus Torvalds, I’d say my work is done. Why? Because it would make no sense if Linus worked for an individual Linux vendor. The Linux community, including commercial vendors, was mature enough to realize that. So, I really hope the Hadoop community will see the light.
Q6 | What is your role in this new Pivotal organization? What are you most focused on?
A6 | At Pivotal, I am managing an Open Source Hadoop platform team. We are just getting started, but our focus is squarely on advancing the state of the entire Hadoop ecosystem and providing a seamless integration between Hadoop APIs/services and the rest of our ultimate offering, the Pivotal One platform. If you are one of those folks who gets ‘the Apache way’ and wants to contribute to Hadoop across various projects, send me your resume. If you also get the mission of Apache Bigtop, make sure to include the name of your favorite drink. I’ll buy you a case.
Q7 | What are you most excited about being able to do in your new role at Pivotal?
A7 | In my mind, Pivotal is an ideal sponsor for an effort to bring us closer to the fully integrated, easy to use Hadoop platform. First of all, Pivotal is an extremely open source oriented company. Projects like Spring Framework, Groovy, and Grails, just to name a few, are the building blocks of Pivotal One. Pivotal has had the open source gene baked into its DNA from day one. Even more importantly, the vision of Pivotal One is that of an application development platform offering developers a variety of application and data services available as part of a single comprehensive offering. At the end of the day, Pivotal has a much more ambitious goal compared to all the Hadoop vendors out there. The company puts a huge emphasis on platform integration and views Hadoop APIs in a much wider context of application and data services. The Pivotal One vision is way bigger than any of its parts (even if that part is as big as an entire Hadoop ecosystem).
Of course, while the company history and culture are exciting, possibly the best part is that I get to work with the amazing Hadoop engineers Pivotal already has on staff, including people like Milind Bhandarkar and Apurva Desai, and their respective teams. The work can be rewarding, but it is often the people you work with who make the process memorable and fun. I am looking forward to what we can achieve together.
Q8 | Scott Yara said that one of the strategic things we are doing for Pivotal One is to address the developer. What does that mean to you?
A8 | It means the Hadoop platform has to integrate well with the rest of Pivotal One. Developers have to be able to develop Pivotal One applications and use whatever platform services are appropriate for the job at hand. Unlike all the other Hadoop distribution vendors, we can’t stop at delivering a rock solid Hadoop platform. Here at Pivotal, we have to make sure that it makes sense as part of Pivotal One. This is how we are going to create real value.
Q9 | Much of this will be delivered as a service through Cloud Foundry. What does that mean for you?
A9 | First and foremost, it means addressing some of the areas where Hadoop as a platform has been traditionally weak: ease of provisioning, configuration and monitoring. The way Hadoop gets delivered has to evolve beyond traditional enterprise delivery methods of operating system packaging and static per-node configuration. These days, developers expect an ‘always on’ service that is secure, reliable, and grows elastically with the needs of their applications. In other words, developers have grown accustomed to consuming technology from a PaaS. We have to improve the Hadoop platform and make it more PaaS friendly in areas like multitenancy, resource management, and isolation. Not only would these improvements make it easier for Pivotal to operate Pivotal One as a public cloud, they would also make it more compelling for private datacenter deployments. Finally, we have to integrate well with DataFabric APIs and services. This will unlock the power of Hadoop batch and real-time analytics on any conceivable dataset.
Q10 | What is a good example of a project that would help developers use Hadoop APIs on top of Pivotal One?
A10 | My current favorite example here would be something along the lines of Spring XD or Kite SDK. These projects make it easier to build systems on top of Hadoop ecosystem APIs. Today, developers are still too bogged down thinking about plumbing or the low level infrastructure of Hadoop instead of focusing on the business logic of their application.
Q11 | What do you like to do in your personal time when you aren’t living and breathing Pivotal products?
A11 | Euro board games are my vice (a love sparked by a dear friend of mine from Sun, Robert Corbett of GNU Bison fame). I tend to go for variety rather than mastering any particular one of them, and my collection no longer fits into one half of a walk-in closet. The other half of my walk-in closet is filled with weird species of computing equipment like a Soviet PDP-11 clone stolen by the KGB in the 70s, a first prototype of the Java Workstation, and a military grade SPARC laptop, stuff like that.
Q12 | What is your favorite developer tool and why?
A12 | UNIX itself is my favorite development environment. I am also a vi guy and lately an IntelliJ IDEA convert. Sometimes, I dream about what would’ve happened if the world had been smart enough to go along with the Plan 9 vision. What a beautiful development environment all of us would have had.
Q13 | What is top on your bucket list of things to do while still on this little rock we call earth?
A13 | There’s the whole tree planting thing. But, as for the things I can only dream of, I’d have to say—if I could contribute to unlocking the origin of human consciousness, I’d die a happy man. In fact, at that point, I’m not even sure I’d die – I’d probably upload myself to a sufficiently big Pivotal One cluster and start earning my keep as a Consciousness-as-a-Service.