Pivotal Labs

Main menu

Skip to primary content
Skip to secondary content
  • About
  • Case Studies
  • Team
    • Executives
    • Locations
      • San Francisco (HQ)
      • Boston
      • Boulder
      • Denver
      • London
      • Los Angeles
      • New York
  • Community
    • Blogs
    • Tech Talks
    • Events
  • Careers
    • Lifestyle
    • Principles & Practices
    • Benefits
    • FAQ
    • Apply
  • Contact
    • Press Room
    • Press Releases
    • In The News
    • Press Kit
  • All
  • Labs
  • Standup
  • Tracker

Jeff Hammerbacher: "Hadoop Operations", Velocity 2009 Day One

Pivotal Labs
Monday, June 22, 2009

Jeff is Chief Scientist at Cloudera, which helps enterprises with Hadoop implementations.

Hadoop consists of three separate modules, which are apparently in the process of being split into separate Apache projects:

  • Hadoop Distributed File System (HDFS)
  • MapReduce
  • Common (aka Hadoop Core)

I’ll just mention some of the interesting little tidbits from the presentation:

  • Standard box spec is 1U 2x4core, 8gb ram, 4x1TB SATA 7200rpm.

HDFS:

  • Stores 128mb blocks, replicates the block
  • Good for large files written once and read many times
  • Throughput scales nearly linearly

Some examples of Hadoop-based projects:

  • Avro – cross-language data serialization
  • HBase – like BigTable
  • Hive – SQL interface, an interesting open-source data warehouse solution
  • Zookeeper – coordination service for distributed applications

Hadoop @ Yahoo: 16 clusters, each cluster is 2.5PB and 1400 nodes

Cloudera maintains convenient, stable Hadoop packages – it’s all open-source – so you don’t have to go around figuring out what version of what subproject works best with others.

Testing: Hadoop has a standalone mode, which uses a single reducer in one JVM.

Jeff mentioned that they use Facebook’s Scribe for distributed logging.

And last but not least, Cloudera has a GetSatisfaction page.

  • 0 Shares
  • Share on Facebook
  • Share on Twitter

Add New Comment Cancel reply

Your email address will not be published.

Pivotal Labs

Pivotal Labs

Recent Posts

  • Does the set of all sets contain itself?
  • Standup 3/8/2012
  • Standup 3/7/2012
Subscribe to Pivotal's Feed

Author Topics

riddles (1)
agile (167)
capistrano (2)
rails (26)
movember (1)
git (10)
railsdoc (1)
object-design (1)
bdd (3)
cucumber (3)
linkedin (1)
oauth (1)
ruby (17)
tdd (2)
lvh.me (1)
rails 3.1.1 (1)
selenium (6)
homebrew (1)
mysql (5)
rvm (1)
sproutcore (1)
paperclip (2)
pry (1)
amazon (1)
heroku (1)
rails3 (2)
jasmine (3)
design (3)
process (12)
productivity (8)
learning (1)
olin (1)
migrations (2)
mongodb (2)
devise (2)
javascript (13)
rubymine (4)
ipad (1)
whurl (1)
head.js (1)
pairing (2)
tools (4)
pair programming (1)
rspec (10)
rspec2 (1)
ruby19 (1)
incubation (3)
startup (5)
api (1)
presenter (1)
vanna (1)
pivotal tracker (5)
capybara (1)
fakeweb (1)
webmock (1)
intern (1)
ruby on rails (25)
meetup (1)
textmate (1)
testing (20)
solr (4)
nyc-standup (11)
community (1)
opensource (3)
activerecord (4)
chrome (1)
mp4 (1)
activeresource (1)
flash (3)
neo4j (1)
nginx (1)
rsoc (1)
meta programming (1)
agile standup (7)
government (3)
webos (4)
xss (1)
jquery (1)
bundler (2)
ci (3)
gems (5)
postgresql (1)
geminstaller (1)
gemcutter (1)
cloud (2)
rack (2)
refraction (1)
gem (5)
refactoring (1)
validations (1)
webrat (1)
engine-yard (1)
firefox (2)
jsunit (1)
mongrel (2)
thin (1)
unicorn (1)
facebook (1)
rubygems (5)
jruby (1)
actioncontroller (1)
rails 2.3 (1)
palmpre (1)
autotest (1)
mac (2)
hosting (1)
goruco (11)
database (3)
railsconf (11)
gogaruco (4)
deployment (4)
github (1)
ie (1)
ajax (1)
intellij (1)
json (1)
asset packaging (1)
polonium (1)
character encoding (1)
utf-8 (1)
test (3)
civics (1)
hpricot (1)
rake (3)
sms (1)
unicode (1)
iphone (1)
java (1)
safari (1)
memory leaks (1)
rr (3)
editor (1)
css (1)
nyc (3)
performance (5)
fun (5)
enterprise rails (1)
health (1)
new and cool (1)
general (2)
treetop (1)
errors (1)
stack (1)
trace (1)
cache (1)
cookies (1)
freesoftware (1)
conferences (1)
development (1)
driven (1)
proxy (1)
caching (1)
peertopatent (1)
languages (1)
rest (2)
rubyforge (1)
sake (1)
file (1)
upload (1)
constants (1)
osx (1)
terminal (1)
pairprogramming (2)
  • About
  • Case Studies
  • Team
  • Community
  • Careers
  • Contact
  • Labs
  • Events

Contact Us

contact@pivotallabs.com
+1 415-77-PIVOT
TwitterLinkedInFacebook

Pivotal Tracker

Tracker is the award-winning agile project management tool that enables real-time collaboration around a shared, prioritized backlog.
Visit pivotaltracker.com >