Steve Conover's blog



Steve Conover
Announcing the "Pivotal News Network" RSS Feed
Posted by Steve Conover on Saturday November 14, 2009 at 04:00PM

We've pooled some tech news feeds shared by Pivots into this FeedBurner feed:

http://feeds.feedburner.com/pivotal-news-network

The content is in the spirit of Blabs, so we hope readers here find it useful. See what you think.

Dear Lazyweb (RSS-based news page)
Posted by Steve Conover on Friday November 13, 2009 at 08:00PM

I have an RSS feed, and I'd like to make a little news page out of it with the ability to post comments. Having done a quick survey of what's available, I'm thinking of doing nothing. But I'd like to find out what others have to say about tools available.

General requirements:

  • I'd like to take an RSS feed and make a nice news page out of it.

  • No mangling or truncating of articles in the feed.

  • It'd be great if I could tie comments in. That might mean, for instance, dropping in disqus.

  • I'd rather not have to write, deploy, or maintain any code. But if there's some set of tools/APIs I could easily tie together with code, I'd consider it (some Heroku-type setup, for instance).

Examples of sites I consider to be at least partially what I'm driving at:

If I had to write code, I could imagine using something like feedzirra, sticking it on a many-times-per-hour cron job, and writing out a page and dropping in disqus.
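
For what it's worth, the "writing out a page" step could be sketched roughly as below. This assumes the feed has already been parsed into simple entry objects (by feedzirra or anything else). The Entry struct, the NewsPage class, and the empty comments div are all made up for illustration; the real Disqus embed snippet would go where that div is.

```ruby
require "erb"

# Hypothetical stand-in for whatever the feed parser hands back.
Entry = Struct.new(:title, :url, :published, :content)

# Renders a list of entries as one static HTML page, newest first.
class NewsPage
  include ERB::Util # provides h() for HTML escaping

  TEMPLATE = ERB.new(<<~HTML, trim_mode: "-")
    <html>
    <head><title><%= h(@title) %></title></head>
    <body>
    <h1><%= h(@title) %></h1>
    <%- @entries.each do |entry| -%>
    <div class="entry">
      <h2><a href="<%= h(entry.url) %>"><%= h(entry.title) %></a></h2>
      <p class="date"><%= entry.published.strftime("%Y-%m-%d %H:%M") %></p>
      <div class="content"><%= entry.content %></div>
      <div class="comments" data-url="<%= h(entry.url) %>"></div>
    </div>
    <%- end -%>
    </body>
    </html>
  HTML

  def initialize(title, entries)
    @title = title
    @entries = entries.sort_by(&:published).reverse
  end

  # The article body is feed-supplied HTML, so it is emitted whole and
  # unescaped (no truncating or mangling); titles and URLs are escaped.
  def to_html
    TEMPLATE.result(binding)
  end
end

entries = [
  Entry.new("Older post", "http://example.com/1", Time.utc(2009, 11, 12, 9, 0), "<p>First article, in full.</p>"),
  Entry.new("Newer <post>", "http://example.com/2", Time.utc(2009, 11, 13, 9, 0), "<p>Second article, in full.</p>")
]
html = NewsPage.new("News", entries).to_html
puts html
```

A cron job would then only need to parse the feed, build the entries, and File.write the result somewhere a web server can see it.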

Thoughts?

Remixr: Ruby wrapper for the Best Buy Remix API
Posted by Steve Conover on Wednesday September 23, 2009 at 08:30AM

sudo gem install remixr

We at Pivotal like that incantation. Thanks to the Squeegee crew for putting Remixr together.

# find stores within 50 miles of ZIP 76227 and products over three G's

stores = client.stores({:area => ['76227', 50]}).products({:salePrice => {'$gt' => 3000}}).fetch.stores

Beautiful.

Jeff is Chief Scientist at Cloudera, which helps enterprises with Hadoop implementations.

Hadoop consists of three separate modules, which are apparently in the process of being split into separate Apache projects:

  • Hadoop Distributed File System (HDFS)
  • MapReduce
  • Common (aka Hadoop Core)

I'll just mention some of the interesting little tidbits from the presentation:

  • Standard box spec is 1U, 2x4 cores, 8GB RAM, 4x1TB SATA 7200rpm.

HDFS:

  • Stores data in 128MB blocks and replicates each block
  • Good for large files written once and read many times
  • Throughput scales nearly linearly
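
To put rough numbers on the block and replication points above, here is a back-of-the-envelope sketch. The 128MB block size is the figure from the talk; the replication factor of 3 is an assumption (it is HDFS's default, but clusters can change it), and the hdfs_footprint name is made up.

```ruby
BLOCK_SIZE  = 128 * 1024**2 # 128MB, the block size quoted in the talk
REPLICATION = 3             # assumption: HDFS's default replication factor

# How many blocks a file splits into, and what it costs in raw disk
# once every block has been replicated.
def hdfs_footprint(file_bytes, block_size: BLOCK_SIZE, replication: REPLICATION)
  blocks = (file_bytes + block_size - 1) / block_size # integer ceiling
  {
    blocks: blocks,
    block_replicas: blocks * replication,
    raw_bytes: file_bytes * replication
  }
end

one_tb = 1024**4
p hdfs_footprint(one_tb)
```

Large files mean relatively few blocks to track per byte stored, which is part of why HDFS favors big write-once files over lots of small ones.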

Some examples of Hadoop-based projects:

  • Avro - cross-language data serialization
  • HBase - like BigTable
  • Hive - SQL interface, an interesting open-source data warehouse solution
  • Zookeeper - coordination service for distributed applications

Hadoop @ Yahoo: 16 clusters, each cluster is 2.5PB and 1400 nodes

Cloudera maintains convenient, stable Hadoop packages - it's all open-source - so you don't have to go around figuring out what version of what subproject works best with others.

Testing: Hadoop has a standalone mode, which uses a single reducer in one JVM.
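
To make the single-reducer, single-process picture concrete, here is the map / shuffle / reduce flow for a word count in plain Ruby. It only illustrates the programming model; it does not use Hadoop or its API, and the method names are made up.

```ruby
# Map phase: emit a [word, 1] pair for every word in every line.
def map_phase(lines)
  lines.flat_map { |line| line.downcase.scan(/\w+/).map { |word| [word, 1] } }
end

# Shuffle: group the emitted values by key, which the framework does
# between the map and reduce phases.
def shuffle(pairs)
  pairs.group_by(&:first).transform_values { |kvs| kvs.map(&:last) }
end

# Reduce phase: with a single reducer, one process sees every key.
def reduce_phase(grouped)
  grouped.map { |word, counts| [word, counts.sum] }.to_h
end

lines = ["the quick brown fox", "the lazy dog", "The end"]
counts = reduce_phase(shuffle(map_phase(lines)))
p counts
```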

Jeff mentioned that they use Facebook's Scribe for distributed logging.

And last but not least, Cloudera has a GetSatisfaction page.

Quick report from the Velocity 2009 workshop given by Steve Souders, current Googler and author of High Performance Web Sites.

Short version: he has a brand new book out, and if you're interested in any of the following tips you should probably buy it: Even Faster Web Sites

Resources:

  • cuzillion - model your page and see how various browsers load it using Firebug's Net tab or...

  • httpwatch works in IE and Firefox

  • pagespeed - A little like YSlow (http://developer.yahoo.com/yslow/); Steve uses a combo of YSlow and Pagespeed day-to-day. It gives you a different set of perf information, notably what % of the functions in your script are actually invoked before onload vs. afterwards.

  • spriteme A tool that Steve developed and just released, which looks to be a major leap in css sprite-generation technology - i.e. it doesn't just do the (easy) part where all the images get combined together. You get css help, etc.

  • smush.it Uses lossless image optimization methods to reduce the number of bytes your images take.

Some tips (I'm assuming these all get better/more elaborate treatments in his book):

  • For over 95% of websites, the vast majority (80-90%) of page load time is spent on the front end (i.e. only 10-20% is spent transferring html).

  • Scripts block other elements from downloading. So while js is downloading and executing, nothing else can be downloaded.

  • Typically only 25% of js functions are called before body onLoad (pagespeed helps you see what % this is for you). So one thing to consider when optimizing is lazy-loading the other 75%.

  • There are tricks you can use to pull down scripts in parallel, for instance by creating script tags through document.createElement and attaching to the dom. But there are other techniques, and pitfalls for many of them in different browsers. He goes through the strategy decision tree in the new book.

  • Bad: stylesheet tag followed by an inline script. This stops all the parallel resource loading: the browser waits for the stylesheet to download and the inline script to run before it fetches anything else.

  • Using different domains for assets. A well-known trick. Steve adds that returns diminish around 2-4 domains. Also points out that the browser doesn't care about whether these are actually separate hosts, just that the actual names are different, so you could use a simple CNAME record to make this work with one server.

  • Flush the document early. Particularly header sections (some common images + html). In addition to the raw speed benefit, Google user testing shows this is very positive for user perception - they get visual feedback earlier and have a perception that it's a "fast page".

  • Note that FF 3.5+ contains an interesting new event: MozAfterPaint - a great way to see when the browser decides to repaint parts of the page. See John Resig's post on MozAfterPaint for more.
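
The multiple-domains tip above can be sketched as a small server-side helper. The hostnames are hypothetical (per Steve's point, they could all be CNAMEs for the same server), and asset_url is a made-up name. The one thing worth getting right is that a given path always maps to the same hostname, otherwise the browser cache is defeated.

```ruby
require "zlib"

# Hypothetical hostnames; four keeps us within the 2-4 domain range.
ASSET_HOSTS = %w[
  assets0.example.com
  assets1.example.com
  assets2.example.com
  assets3.example.com
].freeze

# Pick a host by hashing the path, so the same asset always gets the
# same URL while different assets spread across the hostnames.
def asset_url(path, hosts: ASSET_HOSTS)
  host = hosts[Zlib.crc32(path) % hosts.size]
  "http://#{host}#{path}"
end

puts asset_url("/images/logo.png")
puts asset_url("/stylesheets/site.css")
```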

And don't miss stevesouders.com

Great Erector intro
Posted by Steve Conover on Monday June 15, 2009 at 06:00AM

by Russell Edens. He has a great take on why Erector is interesting, complete with code examples:

With erector [views] are first class plain old ruby objects. Why is this good? It gives you all the tools of inheritance and mixin's for your views. That is cool. Especially for an application with multiple views of the same underlying models. You can refactor your views into base classes that derive and render the same data in different ways. This is object oriented design for views. Nice.

I've seen object oriented view code in other languages and it leads to some very powerful re-use that all OO programmers can understand. The most ambitious of these attempts was by an HR company ...[that] created their own markup language that was object oriented. The nature of HR data is that it has very complicated rules regarding who can see what data and when. The OO design of the language allowed that to be abstracted to the base classes and a functional programmer simply focused on the problem at hand. They took it further, as all commercial enterprise applications do, and they allowed the customer to define new models and views. Those views were very easy to write with this advanced data access logic abstracted out. Their customers loved it. They wrote very advanced business applications on top of this abstraction.

Views as simple classes, methods, and objects in Ruby - perfect!

Erector Hello World:

class Hello < Erector::Widget
  def content
    html do
      head do
        title "Hello"
      end
      body do
        text "Hello, "
        b "world!"
      end
    end
  end
end
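
The inheritance point from Russell's post ("base classes that derive and render the same data in different ways") can be sketched without any library at all. To be clear, this is plain Ruby and NOT Erector's API; the class names and the person hash are invented for the example.

```ruby
require "erb"

# Plain-Ruby stand-in for a view base class (not Erector::Widget).
class PersonView
  include ERB::Util # provides h() for HTML escaping

  def initialize(person)
    @person = person
  end

  # Shared by every subclass: which fields to show, and in what order.
  def fields
    [["Name", @person[:name]], ["Team", @person[:team]]]
  end

  def to_html
    raise NotImplementedError, "subclasses decide how to render"
  end
end

# Same data, rendered as a table row.
class PersonTableRow < PersonView
  def to_html
    cells = fields.map { |_label, value| "<td>#{h(value)}</td>" }.join
    "<tr>#{cells}</tr>"
  end
end

# Same data, rendered as a definition-list "card".
class PersonCard < PersonView
  def to_html
    items = fields.map { |label, value| "<dt>#{h(label)}</dt><dd>#{h(value)}</dd>" }.join
    "<dl class=\"card\">#{items}</dl>"
  end
end

person = { name: "Ada <admin>", team: "Views" }
puts PersonTableRow.new(person).to_html
puts PersonCard.new(person).to_html
```

Change the field list in the base class and both renderings pick it up, which is the kind of re-use the quote is getting at.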

For more see the Erector user guide.

Standup blog
Posted by Steve Conover on Friday June 05, 2009 at 04:24AM

  • There was a problem uploading files to s3 through Paperclip with # characters in the name (s3 doesn't like # characters). There's a fix on Paperclip trunk, but that hasn't been packaged into a gem. Perhaps the Paperclip people could be convinced to cut a release?

  • One team is seeing files on s3 disappear occasionally. They're using v2 of the s3 api, whereas the s3 gem uses v1. The team has now turned on s3 logging (which is off by default), and they recommend everyone turn it on as a general good practice.

Some Web Ops Resources
Posted by Steve Conover on Monday May 25, 2009 at 10:30PM

A laundry list of stuff I've come across / been pointed to lately. What are your favorites (i.e. comments, please)?

General

Book: Scalable Internet Architectures by Theo Schlossnagle

... Theo Schlossnagle's blog

Book: Release It! by Michael Nygard

Book: The Art of Capacity Planning by John Allspaw

Conference: Velocity 2009 in SJ - it's only a month away.

Agile Web Operations blog ... don't miss the post about their Tracker visualization using the Tracker API.

Presentation: Operational Efficiency Hacks by John Allspaw

... plus all of John Allspaw's presentations

John Allspaw's blog, "Kitchen Soap - Thoughts on Capacity Planning and Web Operations"...(excerpt from latest post: "I can’t tell you how ripped I get when people say things like this: 'cloud computing means getting rid of ops' ")

WebOps Visualizations Flickr Group

APIs / tools:

Chef. This is step one, or as Allspaw puts it "if there's only one thing you do, automated configuration and deployment management should be it". We run chef-solo in our cap deploy. Don't miss Cooking with Chef 101.

Ganglia

RRDTool (Ganglia-related)

God for process management.

Xray for ruby process inspection.

Elif is Perl's File::ReadBackwards ported to ruby.

Oldies:

Book: Think Unix

Book: UNIX System Administration Handbook

Book: Essential System Administration

Inspect running ruby processes using xray and kill -3
Posted by Steve Conover on Friday March 20, 2009 at 06:00PM

We made a code change and deployed to demo, and all of a sudden some of our ruby processes were eating a ton of CPU against our full dataset.

In the java world you can send a SIGQUIT to any running java process and get a thread dump. Go ahead, run a java process and kill -3 it.

You can get this in the ruby world by using xray:

sudo gem install xray

Drop this into your code:

require "xray/thread_dump_signal_handler"

Now:

kill -3 <ruby pid>

Look in the log file where you're sending stdout. You'll see something like:

=============== XRay - Done ===============
/usr/lib64/ruby/gems/1.8/gems/eventmachine-0.12.0/lib/eventmachine.rb:224:in `call'
    _ /usr/lib64/ruby/gems/1.8/gems/eventmachine-0.12.0/lib/eventmachine.rb:224:in `run_machine'
    _ /usr/lib64/ruby/gems/1.8/gems/eventmachine-0.12.0/lib/eventmachine.rb:224:in `run'
    _ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/backends/base.rb:57:in `start'
    _ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/server.rb:150:in `start'
    _ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/controllers/controller.rb:80:in `start'
    _ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/runner.rb:173:in `send'
    _ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/runner.rb:173:in `run_command'
    _ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/runner.rb:139:in `run!'
    _ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/bin/thin:6
    _ /usr/bin/thin:19:in `load'
    _ /usr/bin/thin:19

(that's thin patiently waiting to service the next request)

A one-line code drop-in results in a powerful new inspection tool. Pretty neat.
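
If you're curious what a handler like this boils down to, here's a minimal sketch using only core Ruby. It is not xray's implementation, and it assumes a Ruby that provides Thread#backtrace (the 1.8 interpreters in the dump above don't); thread_dump is a made-up name.

```ruby
# Build a text dump with one section per live thread.
def thread_dump
  Thread.list.map do |thread|
    header = "Thread #{thread.object_id} (#{thread.status})"
    frames = (thread.backtrace || []).map { |frame| "    #{frame}" }
    [header, *frames].join("\n")
  end.join("\n\n")
end

# On SIGQUIT (kill -3), print the dump to stdout instead of exiting.
# The handler only builds a string and writes it, so it doesn't take
# any locks inside the trap context.
Signal.trap("QUIT") { $stdout.puts thread_dump }
```

Calling thread_dump directly (from a console, say) gives the same output without sending a signal.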

For bonus points:

ps ax | grep "thin server" | grep -v grep | awk '{print $1}' | xargs kill -3

For more bonus points, stick this in a capistrano task and grab the thread dumps from your logs, and you'll have a cluster-wide snapshotting tool.
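
The shell pipeline above could also be written in Ruby, which may be handier inside a deploy task. This is only a sketch: pids_matching and dump_threads are made-up names, and it assumes the usual five-column `ps ax` layout (PID, TTY, STAT, TIME, COMMAND). No grep process is involved, so there's nothing to filter out with grep -v.

```ruby
# Pick out the PIDs of processes whose command matches the pattern,
# given the text output of `ps ax`.
def pids_matching(ps_output, pattern)
  ps_output.each_line.filter_map do |line|
    pid, _tty, _stat, _time, command = line.strip.split(/\s+/, 5)
    Integer(pid, exception: false) if command&.match?(pattern)
  end
end

# Send SIGQUIT (kill -3) to every matching process on this box.
def dump_threads(pattern = /thin server/)
  pids_matching(`ps ax`, pattern).each { |pid| Process.kill("QUIT", pid) }
end
```

Run dump_threads on each app server, then collect the logs, and you have the cluster-wide snapshot.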

We kill -3'd our CPU-eating thins and discovered a directory scan problem introduced by a recent code change - totally obvious from the thread dump. Now we're nailing it down with a failing perf unit test and fixing the problem.

Best Buy Remix @ SXSW
Posted by Steve Conover on Saturday March 14, 2009 at 09:00AM

I'm here as part of the Best Buy Remix crew, hanging out in Mashery's Circus Mashimus all weekend. Come by, have a beer, and check out Remix and other interesting API stuff if you're at SXSW.

We're in a room near the front of the convention center, not far from the Pepsi booth.

-Steve
