Steve Conover's blog



Steve Conover
Chef-solo is great. You might not need client/server.
Posted by Steve Conover on Wednesday June 16, 2010 at 10:00PM

You should be doing automated configuration, period. Chef is a great automated configuration tool.

It has to be said, however, that Chef has lots of parts, arguably too many. If you google around for Chef intros, you'll see chef-solo described as a simple first step toward "full" or "real" Chef: chef client/server.

On our project we've built a mature web application and have been using Chef for over a year. We have never once felt the need for the client/server model, and we have no reason to expect that we will.

Here's how we run chef manually:

cd ~/projectroot
git pull
chef/run.sh

(that's it)

run.sh contains:

sudo sh -c "RAILS_ENV=$RAILS_ENV chef-solo -c chef/config/solo.rb -j chef/config/$RAILS_ENV/`hostname -s`.json"
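
As a rough sketch of what that one-liner resolves to, here is the same command string assembled in Ruby. The helper name `chef_solo_command` is made up for illustration; the paths mirror run.sh: one shared solo.rb, plus one JSON node file per environment and short hostname.

```ruby
require "socket"

# Hypothetical helper (not part of Chef): builds the same chef-solo
# invocation that run.sh passes to `sudo sh -c`.
def chef_solo_command(rails_env, short_hostname)
  node_json = "chef/config/#{rails_env}/#{short_hostname}.json"
  "RAILS_ENV=#{rails_env} chef-solo -c chef/config/solo.rb -j #{node_json}"
end

# Equivalent of `hostname -s`: keep everything before the first dot.
short_hostname = Socket.gethostname.split(".").first

# Falls back to "development" when RAILS_ENV is unset (an assumption;
# run.sh itself has no fallback).
puts chef_solo_command(ENV.fetch("RAILS_ENV", "development"), short_hostname)
```

So a box named web1 in production would pick up chef/config/production/web1.json, which is all the "server" we need: the node definitions live in git next to the code.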

We have capistrano (multi-server ssh tool) do the equivalent on deploy:

sudo [
  "cd #{app_root}",
  "export RAILS_ENV=#{self.variables[:rails_env]}",
  "chef/run.sh"
].join(" && ")

We deploy our code and update system config at the same time.

And that's all we need or want.

"Pivotal News Network" Highlights from May
Posted by Steve Conover on Wednesday June 02, 2010 at 07:10AM

The Pivotal News Network has been going strong for six months (Pivots: talk to me if you'd like to share into the feed). Here are some highlights from May:

When starting any software project, there’s an age old argument: should we build something simple that solves our current problem or should we use an existing product that’s more complex, but more feature rich, since we know that’s where we’re going to end up in the future?

...

an oft neglected repercussion of building too much too quickly is that the extra functionality can calcify your product and make it very rigid. Releases become more complex, new features take longer to implement and bugs take longer to fix. You can find yourself a prisoner of your product, maintaining functionality and features that no one (or very few people) use. It can demoralize an engineering team, making them more and more susceptible to the nuclear option: the big rewrite.

I think the tendency to lean towards a more exhaustive solution upfront comes from a time when the effort required to change software was much higher than it is today. When systems were written in C, C++, Perl or even Java, making changes was a large undertaking. The thought of possibly throwing away chunks of code was nerve-racking. It represented a huge investment in time and money. However, with today's rapid development languages and frameworks like Ruby/Rails & Python/Django, the investment required to create something, both in time and money, is rapidly shrinking.

Jeff [Patton]’s reply shocked me:

“The Ruby community cares about building high-quality apps, but doesn’t necessarily care about shipping high-value apps.”

Jeff went on to say that the Ruby community is obsessive about craftsmanship. This is a good thing, of course. We test. We write clean code. We take the time and care to build applications that are beautiful and do what our customers ask for.

Therein lies the rub: what customers ask for is rarely what they want, and almost never what they need. As Henry Ford put it, “If I had asked what people wanted, they would have said faster horses.” Or as I put it, your customer may pay you $1000 to deliver him a knuckle sandwich, but no amount of precision or strength training is going to leave you with a happy customer.

It turns out that constructing a high-quality application is not enough – you have to conceptualize and design an application that users will actually find useful. Doing this is every bit as difficult as constructing the software, if not harder. It requires a combination of research – generating new ideas from asking questions & identifying problems – and feedback – testing out ideas you’ve created. The Ruby & Agile worlds have been primarily focused on getting user feedback, without doing the all-important research.

Weeks ago, some people in the Ubuntu community got a bit disappointed with the distribution’s core team:

We are supposed to be a community, we all use Ubuntu and contribute to it, and we deserve some respect regarding these kind of decisions. We all make Ubuntu together, or is it a big lie?

We all make Ubuntu, but we do not all make all of it. In other words, we delegate well. We have a kernel team, and they make kernel decisions. You don’t get to make kernel decisions unless you’re in that kernel team. You can file bugs and comment, and engage, but you don’t get to second-guess their decisions. We have a security team. They get to make decisions about security. You don’t get to see a lot of what they see unless you’re on that team. We have processes to help make sure we’re doing a good job of delegation, but being an open community is not the same as saying everybody has a say in everything.

  • from Velocity as a Goal

    From my experience having velocity as a goal doesn't make any difference to the motivation of the team which is often cited as the reason for referring to it as a target. In all the teams I've worked on people are giving their best effort anyway so they can only really have an impact on the velocity by doing one of the following:

    • Working longer hours
    • Cutting corners on quality (by less testing perhaps)
    • Finding a smarter way of working

    ... In reality I haven't noticed that people on the teams I've worked on pay that much attention to whether velocity is considered a target or not. People just do their job and we pretty much always have the same velocity each week regardless.

Announcing the "Pivotal News Network" RSS Feed
Posted by Steve Conover on Saturday November 14, 2009 at 04:00PM

We've pooled some Pivot shared tech news feeds and made this feedburner feed:

http://feeds.feedburner.com/pivotal-news-network

The content is in the spirit of Blabs, so we hope readers here might find it to be useful. See what you think.

Dear Lazyweb (RSS-based news page)
Posted by Steve Conover on Friday November 13, 2009 at 08:00PM

I have an RSS feed, and I'd like to make a little news page out of it with the ability to post comments. Having done a quick survey of what's available, I'm thinking of doing nothing. But I'd like to find out what others have to say about tools available.

General requirements:

  • I'd like to take an RSS feed and make a nice news page out of it.

  • No mangling or truncating of articles in the feed.

  • It'd be great if I could tie comments in. That might mean, for instance, dropping in disqus.

  • I'd rather not have to write, deploy, or maintain any code. But if there's some set of tools/apis I could easily tie together with code, I'd consider it (some Heroku-type setup, for instance).

Examples of sites I consider to be at least partially what I'm driving at:

If I had to write code, I could imagine using something like feedzirra, sticking it on a many-times-per-hour cron job, and writing out a page and dropping in disqus.
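
To make that last idea concrete, here is a minimal sketch of just the "write out a page" step. It assumes the feed has already been parsed into entries (by feedzirra or anything else); the `Entry` struct and `render_news_page` are hypothetical names, and the disqus embed is left out. A cron job could redirect the output to a static file.

```ruby
require "erb"

# Hypothetical entry shape; a feed parser would supply equivalent fields.
Entry = Struct.new(:title, :url, :content_html)

# Renders every entry in full. Titles and URLs are escaped, but the
# article HTML is inserted untouched: no mangling, no truncating.
def render_news_page(entries)
  articles = entries.map do |entry|
    title = ERB::Util.html_escape(entry.title)
    url   = ERB::Util.html_escape(entry.url)
    "<div class=\"article\">\n" \
      "<h2><a href=\"#{url}\">#{title}</a></h2>\n" \
      "#{entry.content_html}\n" \
      "</div>"
  end
  "<html>\n<body>\n#{articles.join("\n")}\n</body>\n</html>\n"
end

# Made-up sample data standing in for parsed feed entries.
entries = [
  Entry.new("Chef & friends", "http://example.com/1", "<p>Full article body.</p>")
]
puts render_news_page(entries)
```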

Thoughts?

Remixr: Ruby wrapper for the Best Buy Remix API
Posted by Steve Conover on Wednesday September 23, 2009 at 08:30AM
sudo gem install remixr

We at Pivotal like that incantation. Thanks to the Squeegee crew for putting Remixr together.

# find stores within 50 miles of ZIP 76227 and products over three G's
# (assumes `client` is an already-configured Remixr client)
stores = client.stores({:area => ['76227', 50]}).products({:salePrice => {'$gt' => 3000}}).fetch.stores

Beautiful.

Jeff Hammerbacher: "Hadoop Operations", Velocity 2009 Day One
Posted by Steve Conover on Monday June 22, 2009 at 09:15PM

Jeff is Chief Scientist at Cloudera, which helps enterprises with Hadoop implementations.

Hadoop consists of three separate modules, which are apparently in the process of being split into separate Apache projects:

  • Hadoop Distributed File System (HDFS)
  • MapReduce
  • Common (aka Hadoop Core)

I'll just mention some of the interesting little tidbits from the presentation:

  • Standard box spec is 1U, 2x4 core, 8GB RAM, 4x1TB SATA 7200rpm.

HDFS:

  • Stores files as 128MB blocks and replicates each block
  • Good for large files written once and read many times
  • Throughput scales nearly linearly
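
A back-of-the-envelope sketch of what those block numbers mean for a single file. The 128MB block size is from the talk; the replication factor of 3 is my assumption (it is the usual HDFS default, but my notes don't record what Jeff quoted), and both function names are made up.

```ruby
BLOCK_SIZE_BYTES = 128 * 1024 * 1024 # 128MB blocks, as in the talk

# Number of HDFS blocks a file of the given size is split into
# (integer ceiling division: a partial last block still counts).
def block_count(file_size_bytes)
  (file_size_bytes + BLOCK_SIZE_BYTES - 1) / BLOCK_SIZE_BYTES
end

# Raw bytes used across the cluster once every block is replicated.
# replication: 3 is an assumed default, not a figure from the talk.
def raw_storage_bytes(file_size_bytes, replication: 3)
  file_size_bytes * replication
end

one_gb = 1024**3
puts block_count(one_gb)                # 8 blocks
puts raw_storage_bytes(one_gb) / one_gb # 3 (GB of raw disk per GB stored)
```

This is also why HDFS favors large files: a tiny file still costs a whole block entry plus its replicas to track.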

Some examples of Hadoop-based projects:

  • Avro - cross-language data serialization
  • HBase - like BigTable
  • Hive - SQL interface, an interesting open-source data warehouse solution
  • Zookeeper - coordination service for distributed applications

Hadoop @ Yahoo: 16 clusters, each cluster is 2.5PB and 1400 nodes

Cloudera maintains convenient, stable Hadoop packages - it's all open-source - so you don't have to go around figuring out what version of what subproject works best with others.

Testing: Hadoop has a standalone mode, which uses a single reducer in one JVM.

Jeff mentioned that they use Facebook's Scribe for distributed logging.

And last but not least, Cloudera has a GetSatisfaction page.

Steve Souders: "Web Performance Analysis", Velocity 2009 Day One
Posted by Steve Conover on Monday June 22, 2009 at 09:00PM

Quick report from the Velocity 2009 workshop by Steve Souders, current Googler and author of High Performance Web Sites.

Short version: he has a brand new book out, and if you're interested in any of the following tips you should probably buy it: Even Faster Web Sites

Resources:

  • cuzillion - model your page and see how various browsers load it using Firebug's Net tab or...

  • httpwatch works in IE and Firefox

  • pagespeed - A little like YSlow (http://developer.yahoo.com/yslow/), but gives you a different set of perf information, notably what % of functions in your script are actually invoked in the header, vs afterwards. Steve uses a combo of YSlow and Pagespeed day-to-day.

  • spriteme A tool that Steve developed and just released, which looks to be a major leap in css sprite-generation technology - i.e. it doesn't just do the (easy) part where all the images get combined together. You get css help, etc.

  • smush.it Uses non-lossy image optimization methods to reduce the number of bytes your images take.

Some tips (I'm assuming these all get better/more elaborate treatments in his book):

  • For over 95% of websites, the vast majority (80-90%) of page load time is spent on the front end (i.e. only 10-20% is spent transferring html).

  • Scripts block other elements from downloading. So while js is downloading and executing, nothing else can be downloaded.

  • Typically only 25% of js functions are called before body onLoad (pagespeed helps you see what % this is for you). So one thing to consider when optimizing is lazy-loading the other 75%.

  • There are tricks you can use to pull down scripts in parallel, for instance by creating script tags through document.createElement and attaching to the dom. But there are other techniques, and pitfalls for many of them in different browsers. He goes through the strategy decision tree in the new book.

  • Bad: stylesheet tag followed by an inline script. This stops all the parallel resource loading and forces the browser to only download the js, then continue.

  • Using different domains for assets. A well-known trick. Steve adds that returns diminish around 2-4 domains. Also points out that the browser doesn't care about whether these are actually separate hosts, just that the actual names are different, so you could use a simple CNAME record to make this work with one server.

  • Flush the document early. Particularly header sections (some common images + html). In addition to the raw speed benefit, Google user testing shows this is very positive for user perception - they get visual feedback earlier and have a perception that it's a "fast page".

  • Note that FF 3.5+ contains an interesting new event: MozAfterPaint - a great way to see when the browser decides to repaint parts of the page. See John Resig's post on MozAfterPaint for more.
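
The "different domains for assets" tip is easy to sketch on the server side. This is a hypothetical helper, not something from Steve's talk: it picks one of a few hostnames per asset, and it hashes the path so a given asset always gets the same hostname (otherwise the browser would re-download it under a different name). The example.com hostnames are placeholders; per the CNAME point above, they could all resolve to one server.

```ruby
require "zlib"

# Placeholder hostnames; three sits inside the 2-4 range where
# returns start to diminish.
ASSET_HOSTS = %w[assets0.example.com assets1.example.com assets2.example.com].freeze

# Maps an asset path to a full URL on one of the hosts. CRC32 of the
# path is stable across requests and processes, so the mapping is too.
def asset_url(path, hosts = ASSET_HOSTS)
  host = hosts[Zlib.crc32(path) % hosts.size]
  "http://#{host}#{path}"
end

puts asset_url("/images/logo.png")
puts asset_url("/javascripts/application.js")
```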

And don't miss stevesouders.com

Great Erector intro
Posted by Steve Conover on Monday June 15, 2009 at 06:00AM

by Russell Edens. He has a great take on why Erector is interesting, complete with code examples:

With erector [views] are first class plain old ruby objects. Why is this good? It gives you all the tools of inheritance and mixins for your views. That is cool. Especially for an application with multiple views of the same underlying models. You can refactor your views into base classes that derive and render the same data in different ways. This is object oriented design for views. Nice.

I've seen object oriented view code in other languages and it leads to some very powerful re-use that all OO programmers can understand. The most ambitious of these attempts was by an HR company ...[that] created their own markup language that was object oriented. The nature of HR data is that it has very complicated rules regarding who can see what data and when. The OO design of the language allowed that to be abstracted to the base classes and a functional programmer simply focused on the problem at hand. They took it further, as all commercial enterprise applications do, and they allowed the customer to define new models and views. Those views were very easy to write with this advanced data access logic abstracted out. Their customers loved it. They wrote very advanced business applications on top of this abstraction.

Views as simple classes, methods, and objects in Ruby - perfect!

Erector Hello World:

class Hello < Erector::Widget
  def content
    html do
      head do
        title "Hello"
      end
      body do
        text "Hello, "
        b "world!"
      end
    end
  end
end
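
The inheritance point from Russell's post can be sketched without Erector at all. This is plain Ruby with made-up class names, not Erector's API: a base view owns the wrapper markup, and a subclass renders the same model differently by overriding one method.

```ruby
# Plain-Ruby illustration (no Erector). HTML escaping is omitted
# for brevity; the sample values below are known to be safe.
class PersonView
  def initialize(person)
    @person = person
  end

  # Shared wrapper markup lives in the base class.
  def render
    "<div class=\"person\">#{body}</div>"
  end

  private

  def body
    @person[:name]
  end
end

# Same model, different view: only the body changes.
class PersonContactView < PersonView
  private

  def body
    "#{super} (#{@person[:email]})"
  end
end

person = { name: "Ada", email: "ada@example.com" }
puts PersonView.new(person).render
puts PersonContactView.new(person).render
```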

For more see the Erector user guide.

Standup blog
Posted by Steve Conover on Friday June 05, 2009 at 04:24AM
  • There was a problem uploading files to s3 through Paperclip with # characters in the name (s3 doesn't like # characters). There's a fix on Paperclip trunk, but that hasn't been packaged into a gem. Perhaps the Paperclip people could be convinced to cut a release?

  • One team is seeing files on s3 disappear occasionally. They're using v2 of the s3 api, whereas the s3 gem uses v1. The team has now turned on s3 logging (which is off by default), and they recommend everyone turn it on as a general good practice.
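
Until the Paperclip fix from the first item ships in a gem, one possible stopgap is to rename files before they are uploaded. A minimal sketch, assuming you control the filename before it reaches Paperclip; `s3_safe_filename` is a made-up helper, not a Paperclip method.

```ruby
# Hypothetical workaround: swap out the "#" characters that caused
# the upload problem. Everything else in the name is left alone.
def s3_safe_filename(filename)
  filename.gsub("#", "-")
end

puts s3_safe_filename("report #3.pdf") # report -3.pdf
```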

Some Web Ops Resources
Posted by Steve Conover on Monday May 25, 2009 at 10:30PM

A laundry list of stuff I've come across / been pointed to lately. What are your favorites (i.e. comments, please)?

General

Book: Scalable Internet Architectures by Theo Schlossnagle

... Theo Schlossnagle's blog

Book: Release It! by Michael Nygard

Book: The Art of Capacity Planning by John Allspaw

Conference: Velocity 2009 in SJ - it's only a month away.

Agile Web Operations blog ... don't miss the post about their Tracker visualization using the Tracker API.

Presentation: Operational Efficiency Hacks by John Allspaw

... plus all of John Allspaw's presentations

John Allspaw's blog, "Kitchen Soap - Thoughts on Capacity Planning and Web Operations"...(excerpt from latest post: "I can’t tell you how ripped I get when people say things like this: 'cloud computing means getting rid of ops' ")

WebOps Visualizations Flickr Group

APIs / tools:

Chef. This is step one, or as Allspaw puts it "if there's only one thing you do, automated configuration and deployment management should be it". We run chef-solo in our cap deploy. Don't miss Cooking with Chef 101.

Ganglia

RRDTool (Ganglia-related)

God for process management.

Xray for ruby process inspection.

Elif is Perl File::ReadBackwards ported to ruby.

Oldies:

Book: Think Unix

Book: UNIX System Administration Handbook

Book: Essential System Administration
