Jeff Hammerbacher: "Hadoop Operations", Velocity 2009 Day One

Jeff is Chief Scientist at Cloudera, which helps enterprises with Hadoop implementations.

Hadoop consists of three modules, which are apparently in the process of being split out into separate Apache projects:

  • Hadoop Distributed File System (HDFS)
  • MapReduce (see the word-count sketch after this list)
  • Common (aka Hadoop Core)
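
Since the rest of the notes assume you know what a MapReduce job looks like, here is a minimal sketch of the canonical word count against the 0.20-era org.apache.hadoop.mapreduce API; the WordCount class name and argument handling are mine, not from the talk:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map phase: emit (word, 1) for each token in the input split.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce phase: sum the counts emitted for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }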

I’ll just mention some of the interesting little tidbits from the presentation:

  • Standard box spec: 1U, 2×4-core CPUs, 8 GB RAM, 4×1 TB 7200 RPM SATA drives

HDFS:

  • Stores data in 128 MB blocks and replicates each block (three replicas by default)
  • Good for large files written once and read many times (see the sketch after this list)
  • Throughput scales nearly linearly as nodes are added
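
As a sketch of that write-once/read-many pattern, here is roughly what the Java FileSystem API looks like; the class name and the /tmp/example.txt path are hypothetical, and the cluster address comes from the local core-site.xml:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteOnceReadMany {
      public static void main(String[] args) throws Exception {
        // Picks up fs.default.name from core-site.xml
        // (e.g. hdfs://namenode:8020).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/tmp/example.txt"); // hypothetical path

        // Write once: an HDFS file is written sequentially and closed,
        // after which it is effectively immutable.
        FSDataOutputStream out = fs.create(file, true); // overwrite if present
        out.writeBytes("hello hdfs\n");
        out.close();

        // Read many: any number of readers can stream the replicated blocks.
        BufferedReader in =
            new BufferedReader(new InputStreamReader(fs.open(file)));
        System.out.println(in.readLine());
        in.close();
      }
    }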

Some examples of Hadoop-based projects:

  • Avro – cross-language data serialization
  • HBase – a distributed, BigTable-like table store
  • Hive – a SQL-like interface; an interesting open-source data warehouse solution
  • Zookeeper – coordination service for distributed applications

Hadoop @ Yahoo: 16 clusters; each cluster is 2.5 PB and 1,400 nodes

Cloudera maintains convenient, stable Hadoop packages – it’s all open-source – so you don’t have to figure out for yourself which version of which subproject works best with the others.

Testing: Hadoop has a standalone (local) mode, which runs the whole job, including its single reducer, in one JVM against the local filesystem (a sketch follows).
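
A minimal sketch of what that looks like in practice, assuming the WordCount mapper and reducer from the earlier sketch; the config keys are the 0.20-era names, and the input/output paths are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LocalModeTest {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // These are the out-of-the-box defaults; setting them explicitly
        // makes the test independent of whatever is in the *-site.xml files.
        conf.set("fs.default.name", "file:///");  // local filesystem, no HDFS
        conf.set("mapred.job.tracker", "local");  // whole job in this one JVM

        Job job = new Job(conf, "standalone smoke test");
        job.setJarByClass(LocalModeTest.class);
        // Reuses the classes from the WordCount sketch above.
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(org.apache.hadoop.io.Text.class);
        job.setOutputValueClass(org.apache.hadoop.io.IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("input"));     // local dir
        FileOutputFormat.setOutputPath(job, new Path("output"));  // must not exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Because everything runs in one JVM, you can step through mapper and reducer code in a debugger before pushing the job to a real cluster.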

Jeff mentioned that they use Facebook’s Scribe for distributed logging.

And last but not least, Cloudera has a GetSatisfaction page.
