This thread on the Agile System Administration group is particularly good:
http://groups.google.com/group/agile-system-administration/browse_thread/thread/7c32b729aaa1079b
There’s some great stuff in here about planning in a highly interrupt-driven environment, and I particularly like Allspaw’s breakdown of ops work at Flickr (the “MumbleMumble” process). Anyone who’s wondering what’s generally involved in making webops go ought to have a look.