Steve Conover's blog
You should be doing automated configuration, period. Chef is a great automated configuration tool.
It has to be said, however, that chef has lots of parts, arguably an excess. If you google around for chef intros you see chef-solo referenced as a simple first step into "full" or "real" chef - chef client/server.
On our project we've built a mature web application and have been using chef for over a year. We have never once felt the need for the client/server model, and we have no reason to expect that to change.
Here's how we run chef manually:
cd ~/projectroot
git pull
chef/run.sh
(that's it)
run.sh contains:
sudo sh -c "RAILS_ENV=$RAILS_ENV chef-solo -c chef/config/solo.rb -j chef/config/$RAILS_ENV/`hostname -s`.json"
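To make it clearer how run.sh picks a per-host node file, here is a rough Ruby sketch that builds the same chef-solo command line. The method name `chef_solo_command` is made up for illustration, and taking the first dot-separated part of the hostname is only an approximation of `hostname -s`.

```ruby
require "socket"
require "shellwords"

# Builds the command string that run.sh passes to `sudo sh -c`.
# `chef_solo_command` is a hypothetical helper, not part of chef.
# The default host approximates `hostname -s` by dropping the domain part.
def chef_solo_command(rails_env:, host: Socket.gethostname.split(".").first)
  # Each host gets its own node file: chef/config/<env>/<short hostname>.json
  node_json = File.join("chef", "config", rails_env, "#{host}.json")
  "RAILS_ENV=#{rails_env.shellescape} chef-solo " \
    "-c chef/config/solo.rb -j #{node_json.shellescape}"
end

puts chef_solo_command(rails_env: "production", host: "web1")
```

So a box named web1 running in production reads chef/config/production/web1.json, and adding a server means adding one JSON file.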
We have capistrano (multi-server ssh tool) do the equivalent on deploy:
sudo [
  "cd #{app_root}",
  "export RAILS_ENV=#{self.variables[:rails_env]}",
  "chef/run.sh"
].join(" && ")
We deploy our code and update system config at the same time.
And that's all we need or want.
Links:
- Cooking with Chef 101 - a chef-solo-centric chef introduction
- Chef wiki
The Pivotal News Network has been going strong for six months (Pivots: talk to me if you'd like to share into the feed). Here are some highlights from May:
When starting any software project, there’s an age old argument: should we build something simple that solves our current problem or should we use an existing product that’s more complex, but more feature rich, since we know that’s where we’re going to end up in the future?
...
an oft neglected repercussion of building too much too quickly is that the extra functionality can calcify your product and make it very rigid. Releases become more complex, new features take longer to implement and bugs take longer to fix. You can find yourself a prisoner of your product, maintaining functionality and features that no one (or very few people) use. It can demoralize an engineering team, making them more and more susceptible to the nuclear option: the big rewrite.
I think the tendency to lean towards a more exhaustive solution upfront comes from a time when the effort required to change software was much higher than it is today. When systems were written in C, C++, Perl or even Java, making changes was a large undertaking. The thought of possibly throwing away chunks of code was nerve-racking. It represented a huge investment in time and money. However, with today's rapid development languages and frameworks like Ruby/Rails & Python/Django, the investment required to create something, both in time and money, is rapidly shrinking.
Jeff [Patton]’s reply shocked me:
“The Ruby community cares about building high-quality apps, but doesn’t necessarily care about shipping high-value apps.”
Jeff went on to say that the Ruby community is obsessive about craftsmanship. This is a good thing, of course. We test. We write clean code. We take the time and care to build applications that are beautiful and do what our customers ask for.
Therein lies the rub: what customers ask for is rarely what they want, and almost never what they need. As Henry Ford put it, “If I had asked what people wanted, they would have said faster horses.” Or as I put it, your customer may pay you $1000 to deliver him a knuckle sandwich, but no amount of precision or strength training is going to leave you with a happy customer.
It turns out that constructing a high-quality application is not enough – you have to conceptualize and design an application that users will actually find useful. Doing this is every bit as difficult as constructing the software, if not harder. It requires a combination of research – generating new ideas from asking questions & identifying problems – and feedback – testing out ideas you’ve created. The Ruby & Agile worlds have been primarily focused on getting user feedback, without doing the all-important research.
Weeks ago, some people in the Ubuntu community got a bit disappointed with the distribution’s core team:
We are supposed to be a community, we all use Ubuntu and contribute to it, and we deserve some respect regarding these kind of decisions. We all make Ubuntu together, or is it a big lie?
We all make Ubuntu, but we do not all make all of it. In other words, we delegate well. We have a kernel team, and they make kernel decisions. You don’t get to make kernel decisions unless you’re in that kernel team. You can file bugs and comment, and engage, but you don’t get to second-guess their decisions. We have a security team. They get to make decisions about security. You don’t get to see a lot of what they see unless you’re on that team. We have processes to help make sure we’re doing a good job of delegation, but being an open community is not the same as saying everybody has a say in everything.
- from Velocity as a Goal
From my experience having velocity as a goal doesn't make any difference to the motivation of the team which is often cited as the reason for referring to it as a target. In all the teams I've worked on people are giving their best effort anyway so they can only really have an impact on the velocity by doing one of the following:
- Working longer hours
- Cutting corners on quality (by less testing perhaps)
- Finding a smarter way of working
... In reality I haven't noticed that people on the teams I've worked on pay that much attention to whether velocity is considered a target or not. People just do their job and we pretty much always have the same velocity each week regardless.
More popular shared items:
7 LESSONS LEARNED WHILE BUILDING REDDIT TO 270 MILLION PAGE VIEWS A MONTH
"The best way to handover work is to leave a broken test for your colleague to fix."
How many lines of code does it take to create the Android OS?
The unimportance of product names (which got smacked down in the Google Reader comments)
We've pooled some Pivot shared tech news feeds and made this feedburner feed:
http://feeds.feedburner.com/pivotal-news-network
The content is in the spirit of Blabs, so we hope readers here might find it to be useful. See what you think.
I have an RSS feed, and I'd like to make a little news page out of it with the ability to post comments. Having done a quick survey of what's available, I'm thinking of doing nothing. But I'd like to find out what others have to say about tools available.
General requirements:
I'd like to take an RSS feed and make a nice news page out of it.
No mangling or truncating of articles in the feed.
It'd be great if I could tie comments in. That might mean, for instance, dropping in disqus.
I'd rather not have to write, deploy, or maintain any code. But if there's some set of tools/apis I could easily tie together with code, I'd consider it (some Heroku-type setup, for instance).
Examples of sites I consider to be at least partially what I'm driving at:
- ycombinator Hacker News
- Tabbloid
- Any blog
If I had to write code, I could imagine using something like feedzirra, sticking it on a many-times-per-hour cron job, and writing out a page and dropping in disqus.
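As a very rough sketch of the "write out a page" half of that idea, assuming the feed has already been fetched and parsed by something like feedzirra: the `Entry` struct, the template, and `render_news_page` below are all hypothetical names, and the comments widget is left as a placeholder.

```ruby
require "erb"

# Hypothetical entry shape; a real feed parser would supply these fields.
Entry = Struct.new(:title, :url, :content)

PAGE_TEMPLATE = ERB.new(<<~HTML)
  <html>
  <head><title><%= ERB::Util.html_escape(page_title) %></title></head>
  <body>
  <% entries.each do |entry| %>
    <div class="entry">
      <h2><a href="<%= ERB::Util.html_escape(entry.url) %>"><%= ERB::Util.html_escape(entry.title) %></a></h2>
      <%# Article body is emitted whole and unescaped: no truncation, but it trusts the feed. %>
      <%= entry.content %>
      <!-- comments widget (e.g. disqus) would be dropped in here -->
    </div>
  <% end %>
  </body>
  </html>
HTML

# Renders the full page as a string; a cron job could write it to disk.
def render_news_page(page_title, entries)
  PAGE_TEMPLATE.result_with_hash(page_title: page_title, entries: entries)
end

entries = [Entry.new("Hello & welcome", "http://example.com/1", "<p>Full article body.</p>")]
html = render_news_page("News", entries)
puts html
```

The cron job would then be little more than `File.write("news.html", html)` after each fetch.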
Thoughts?
sudo gem install remixr
We at Pivotal like that incantation. Thanks to the Squeegee crew for putting Remixr together.
# find stores within 50 miles of ZIP 76227 and products over three G's
stores = client.stores({:area => ['76227', 50]}).products({:salePrice => {'$gt' => 3000}}).fetch.stores
Beautiful.
Remix has some exciting upgrades and additions coming soon; keep up with it all via the Remix API blog.
Jeff is Chief Scientist at Cloudera, which helps enterprises with Hadoop implementations.
Hadoop consists of three separate modules, which are apparently in the process of being split into separate Apache projects:
- Hadoop Distributed File System (HDFS)
- MapReduce
- Common (aka Hadoop Core)
I'll just mention some of the interesting little tidbits from the presentation:
- Standard box spec is 1U, 2x4 cores, 8GB RAM, 4x1TB SATA 7200rpm.
HDFS:
- Stores data in 128MB blocks and replicates each block
- Good for large files written once and read many times
- Throughput scales nearly linearly
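A quick back-of-the-envelope sketch of what block size and replication mean for one file. The 128MB block size is from the talk; the replication factor of 3 is my assumption (it is the usual HDFS default, but the talk notes above don't state it), and `hdfs_footprint` is a made-up helper name.

```ruby
BLOCK_SIZE_MB = 128 # block size quoted in the talk
REPLICATION   = 3   # assumption: the usual HDFS default, not stated in the talk

# Estimates how many blocks a file splits into and how much raw disk it uses.
# The last block only occupies as much space as the data it holds,
# so raw storage is simply file size times the replication factor.
def hdfs_footprint(file_size_mb, block_size_mb: BLOCK_SIZE_MB, replication: REPLICATION)
  blocks = (file_size_mb.to_f / block_size_mb).ceil
  {
    blocks: blocks,
    block_replicas: blocks * replication,
    raw_storage_mb: file_size_mb * replication
  }
end

p hdfs_footprint(1000)
# => {:blocks=>8, :block_replicas=>24, :raw_storage_mb=>3000} (formatting varies by Ruby version)
```

A big file becomes a handful of large blocks spread across machines, which fits the "written once and read many times" workload.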
Some examples of Hadoop-based projects:
- Avro - cross-language data serialization
- HBase - like BigTable
- Hive - SQL interface, an interesting open-source data warehouse solution
- ZooKeeper - coordination service for distributed applications
Hadoop @ Yahoo: 16 clusters, each cluster is 2.5PB and 1400 nodes
Cloudera maintains convenient, stable Hadoop packages - it's all open-source - so you don't have to go around figuring out what version of what subproject works best with others.
Testing: Hadoop has a standalone mode, which uses a single reducer in one JVM.
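For anyone new to the model, the map / shuffle / reduce flow can be imitated in a single Ruby process, loosely in the spirit of standalone mode with one reducer. This is plain Ruby for illustration only, not Hadoop's API, and all the method names are made up.

```ruby
# Map: emit a [word, 1] pair for every word in every input line.
def map_phase(lines)
  lines.flat_map { |line| line.downcase.scan(/\w+/).map { |word| [word, 1] } }
end

# Shuffle: group the emitted values by key, as the framework would
# before handing each key to a reducer.
def shuffle_phase(pairs)
  pairs.group_by(&:first).transform_values { |kvs| kvs.map(&:last) }
end

# Reduce: collapse each key's list of values into a single count.
def reduce_phase(grouped)
  grouped.transform_values { |counts| counts.sum }
end

def word_count(lines)
  reduce_phase(shuffle_phase(map_phase(lines)))
end

p word_count(["the quick fox", "the lazy dog the end"])
```

In a real cluster the map and reduce steps run on many machines and the shuffle moves data between them; here it is all in one process.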
Jeff mentioned that they use Facebook's Scribe for distributed logging.
And last but not least, Cloudera has a GetSatisfaction page.
Quick report from the Velocity 2009 workshop given by Steve Souders, current Googler and author of High Performance Web Sites.
Short version: he has a brand new book out, and if you're interested in any of the following tips you should probably buy it: Even Faster Web Sites
Resources:
cuzillion - model your page and see how various browsers load it using Firebug's Net tab or...
httpwatch works in IE and Firefox
pagespeed - A little like [YSlow](http://developer.yahoo.com/yslow/) (Steve uses a combo of YSlow and Pagespeed day-to-day) but gives you a different set of perf information, notably what % of functions in your script are actually invoked in the header, vs afterwards.
spriteme A tool that Steve developed and just released, which looks to be a major leap in css sprite-generation technology - i.e. it doesn't just do the (easy) part where all the images get combined together. You get css help, etc.
smush.it Uses lossless image optimization methods to reduce the number of bytes your images take.
Some tips (I'm assuming these all get better/more elaborate treatments in his book):
For over 95% of websites, the vast majority (80-90%) of page load time is spent on the front end (i.e. only 10-20% is spent transferring html).
Scripts block other elements from downloading. So while js is downloading and executing, nothing else can be downloaded.
Typically only 25% of js functions are called before body onLoad (pagespeed helps you see what % this is for you). So one thing to consider when optimizing is lazy-loading the other 75%.
There are tricks you can use to pull down scripts in parallel, for instance by creating script tags through document.createElement and attaching to the dom. But there are other techniques, and pitfalls for many of them in different browsers. He goes through the strategy decision tree in the new book.
Bad: stylesheet tag followed by an inline script. This stops all the parallel resource loading: the browser waits for the stylesheet to download and the inline script to run before it continues.
Using different domains for assets. A well-known trick. Steve adds that returns diminish around 2-4 domains. Also points out that the browser doesn't care about whether these are actually separate hosts, just that the actual names are different, so you could use a simple CNAME record to make this work with one server.
Flush the document early. Particularly header sections (some common images + html). In addition to the raw speed benefit, Google user testing shows this is very positive for user perception - they get visual feedback earlier and have a perception that it's a "fast page".
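On the multiple-asset-domains tip above, here is a small server-side sketch of how a page could spread its assets across a couple of hostnames. The hostnames are placeholders (they could be CNAMEs for one server, as Steve points out), the helper names are made up, and hashing the path with CRC32 is just one reasonable way to pick a host.

```ruby
require "zlib"

# Hypothetical hostnames; Steve's advice is that returns diminish around 2-4.
ASSET_HOSTS = %w[assets0.example.com assets1.example.com].freeze

# Picks a host by hashing the path, so a given asset always maps to the
# same hostname and stays cacheable under one URL.
def asset_host_for(path, hosts = ASSET_HOSTS)
  hosts[Zlib.crc32(path) % hosts.size]
end

def asset_url(path)
  "http://#{asset_host_for(path)}#{path}"
end

puts asset_url("/images/logo.png")
puts asset_url("/javascripts/app.js")
```

Picking a host at random on each request would also spread the load, but it would serve the same file under different URLs and defeat the browser cache.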
Note that FF 3.5+ contains an interesting new event: MozAfterPaint - a great way to see when the browser decides to repaint parts of the page. See John Resig's post on MozAfterPaint for more.
And don't miss stevesouders.com
by Russell Edens. He has a great take on why Erector is interesting, complete with code examples:
With erector [views] are first class plain old ruby objects. Why is this good? It gives you all the tools of inheritance and mixin's for your views. That is cool. Especially for an application with multiple views of the same underlying models. You can refactor your views into base classes that derive and render the same data in different ways. This is object oriented design for views. Nice.
I've seen object oriented view code in other languages and it leads to some very powerful re-use that all OO programmers can understand. The most ambitious of these attempts was by an HR company ...[that] created their own markup language that was object oriented. The nature of HR data is that it has very complicated rules regarding who can see what data and when. The OO design of the language allowed that to be abstracted to the base classes and a functional programmer simply focused on the problem at hand. They took it further, as all commercial enterprise applications do, and they allowed the customer to define new models and views. Those views were very easy to write with this advanced data access logic abstracted out. Their customers loved it. They wrote very advanced business applications on top of this abstraction.
Views as simple classes, methods, and objects in Ruby - perfect!
Erector Hello World:
class Hello < Erector::Widget
  def content
    html do
      head do
        title "Hello"
      end
      body do
        text "Hello, "
        b "world!"
      end
    end
  end
end
For more see the Erector user guide.
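To illustrate the inheritance point from Russell's post without leaning on Erector itself, here is a sketch in plain Ruby. It deliberately does not use Erector's API: the class names and the `body` / `h` methods are invented for the example. The idea is only that a base view owns the shared wrapper and a subclass changes how the same data is rendered.

```ruby
require "erb" # only for ERB::Util.html_escape

# Base view: owns the shared wrapper markup and the escaping helper.
class PersonView
  def initialize(person)
    @person = person
  end

  def render
    "<div class=\"person\">#{body}</div>"
  end

  private

  # Subclasses override this to show the same data differently.
  def body
    h(@person[:name])
  end

  def h(text)
    ERB::Util.html_escape(text)
  end
end

# Detail view: reuses the wrapper and the base body, then adds the email.
class PersonDetailView < PersonView
  private

  def body
    "#{super} (#{h(@person[:email])})"
  end
end

person = { name: "Ada", email: "ada@example.com" }
puts PersonView.new(person).render
puts PersonDetailView.new(person).render
```

With Erector the same shape applies, except the widgets build markup through its DSL instead of string interpolation.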
There was a problem uploading files to s3 through Paperclip with # characters in the name (s3 doesn't like # characters). There's a fix on Paperclip trunk, but that hasn't been packaged into a gem. Perhaps the Paperclip people could be convinced to cut a release?
One team is seeing files on s3 disappear occasionally. They're using v2 of the s3 api, where the s3 gem uses v1. The team has now turned on s3 logging (which is off by default) - which they recommend everyone turn on as a general good practice.
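Until a Paperclip release with the fix is out, one possible stopgap for the `#` problem is to rename files before they are handed to Paperclip. This is only a guess at a workaround, not what the fix on Paperclip trunk does, and `strip_hash_chars` is a made-up helper name.

```ruby
# Replaces every '#' in a filename with an underscore.
# Hypothetical stopgap; it does not mirror the actual Paperclip fix.
def strip_hash_chars(filename)
  filename.gsub("#", "_")
end

puts strip_hash_chars("invoice#42.pdf") # prints "invoice_42.pdf"
```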
A laundry list of stuff I've come across / been pointed to lately. What are your favorites (i.e. comments, please)?
General
Book: Scalable Internet Architectures by Theo Schlossnagle
Book: Release It! by Michael Nygard
Book: The Art of Capacity Planning by John Allspaw
Conference: Velocity 2009 in SJ - it's only a month away.
Agile Web Operations blog ... don't miss the post about their Tracker visualization using the Tracker API.
Presentation: Operational Efficiency Hacks by John Allspaw
... plus all of John Allspaw's presentations
John Allspaw's blog, "Kitchen Soap - Thoughts on Capacity Planning and Web Operations"...(excerpt from latest post: "I can’t tell you how ripped I get when people say things like this: 'cloud computing means getting rid of ops' ")
WebOps Visualizations Flickr Group
Apis / tools:
Chef. This is step one, or as Allspaw puts it "if there's only one thing you do, automated configuration and deployment management should be it". We run chef-solo in our cap deploy. Don't miss Cooking with Chef 101.
RRDTool (Ganglia-related)
God for process management.
Xray for ruby process inspection.
Elif is Perl File::ReadBackwards ported to ruby.
Oldies:
Book: Think Unix
