Josh Susser's blog
A recent addition to Jeff Dean's auto_tagger is the ability to use an alternate tag ref type instead of standard git tags. I pestered Jeff to add this feature after Scott Chacon explained to me how bad it is to create a large number of tags in git. Thanks, Jeff, for the feature addition!
There are a couple of problems with generating autotags as standard git tags. One is that it pollutes the tag namespace, which makes it harder to find tags for releases, etc. It also defeats the tags menu in the GitHub UI. The other is that git will automatically sync tags on every fetch and push, which can noticeably slow things down when you have a lot of tags. And it looks like running GitX with thousands of tags can make the app seriously slow and prone to crash.
A little background: a git tag is just a kind of ref in a special namespace. A ref is a file that contains a SHA-1 hash identifying a commit, and is an entry point into the big network of objects in the git object database. It's quite easy to create new kinds of refs; you just put a new directory in the .git/refs dir and go from there. auto_tagger will now do this for you; all it takes is one configuration option.
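To make the "a ref is just a file" idea concrete, here is a simplified Ruby sketch that writes and reads a ref in a new refs/autotags namespace inside a throwaway directory. It is an illustration only, not how auto_tagger works internally. It assumes a loose ref holds a SHA-1 plus a newline, and falls back to .git/packed-refs, where git also keeps refs. In a real repository you would use git update-ref rather than writing files by hand.

```ruby
require "fileutils"
require "tmpdir"

# Look up a ref the way git stores it: first as a loose file under the git
# dir, then (as a fallback) as a "<sha> <refname>" line in packed-refs.
def read_ref(git_dir, ref_name)
  loose = File.join(git_dir, ref_name)
  return File.read(loose).strip if File.file?(loose)

  packed = File.join(git_dir, "packed-refs")
  return nil unless File.file?(packed)
  File.foreach(packed) do |line|
    sha, name = line.split(" ", 2)
    return sha if name && name.strip == ref_name
  end
  nil
end

# Create a loose ref in a new namespace just by writing a file.
# Illustration only: in a real repository use `git update-ref` instead.
def write_ref(git_dir, ref_name, sha)
  path = File.join(git_dir, ref_name)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, "#{sha}\n")
end

Dir.mktmpdir do |dir|
  git_dir = File.join(dir, ".git")
  sha = "0123456789abcdef0123456789abcdef01234567"  # placeholder SHA-1
  write_ref(git_dir, "refs/autotags/ci/20101009005621", sha)
  puts read_ref(git_dir, "refs/autotags/ci/20101009005621")
end
```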
Drop a .auto_tagger options file into your project with these contents:
--ref-path=autotags
auto_tagger will automatically fetch and push tags in a custom namespace when it needs to. You almost never need to look at autotag refs in development, but if you do, you may need to fetch them manually. That's part of the point of using an alternate tag type: it avoids syncing them on every fetch. To manually fetch all the autotags (when using the --ref-path=autotags option as above), do:
$ git fetch origin 'refs/autotags/*:refs/autotags/*'
I also really like the auto_tagger option that formats tag names as human-readable timestamps by adding a separator character. To make that work, set the date-separator option. Your .auto_tagger options file should look like this:
--ref-path=autotags
--date-separator=-
Then your autotags look like "ci/2010-10-09-00-56-21" instead of "ci/20101009005621".
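auto_tagger builds these names for you. As a rough illustration of what the separator changes, this hypothetical helper (not auto_tagger's own code) formats the same timestamp both ways and shows where the --ref-path value ends up in the full ref name, matching the refs/autotags/* refspec used for fetching.

```ruby
# Hypothetical helpers, not auto_tagger's API: build "stage/timestamp" names
# with an optional separator between the date and time fields.
def autotag_name(stage, time, separator = "")
  fields = %w[%Y %m %d %H %M %S].join(separator)
  "#{stage}/#{time.strftime(fields)}"
end

# The full ref lives under the namespace chosen with --ref-path.
def autotag_ref(ref_path, stage, time, separator = "")
  "refs/#{ref_path}/#{autotag_name(stage, time, separator)}"
end

t = Time.new(2010, 10, 9, 0, 56, 21)
puts autotag_name("ci", t)                  # => ci/20101009005621
puts autotag_name("ci", t, "-")             # => ci/2010-10-09-00-56-21
puts autotag_ref("autotags", "ci", t, "-")  # => refs/autotags/ci/2010-10-09-00-56-21
```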
At Pivotal Labs, we spend most of the day pair programming. The typical setup is an Apple iMac with a keyboard and mouse for each developer. We've been using the 24" iMacs for a while, usually with a second 17" display off to the side. But all our new machines are 27" iMacs, and those are so big we don't usually use the second display. The pair sits side-by-side at a desk facing the screen together. Here's what it looks like:

We have had great success pairing this way at Pivotal. But while it is a good setup in a lot of ways, it falls short ergonomically. First off, both people have to sit off to the side of the display, which can cause leaning, slouching, and twisting to get into a position to both see and type. And it's also hard to actually look at your partner without craning your neck around. Even though the desk is wide, keyboards and mice take up room, so there can be a lot of jostling and adjustment on the desktop, and chairs and such can collide as well. Those things aren't terrible, but they do detract from the experience and my spinal fitness.
As you may have heard, there aren't any Ruby projects in this year's Google Summer of Code. The Ruby community's response to this is a pretty amazing validation of the awesomeness of us! We have created our own Ruby Summer of Code, and raised $100K in just 3 days to sponsor 20 students to work on Ruby open source projects. That's actually a lot more than Google would have sponsored anyway. And Pivotal Labs is one of the six full-project sponsors, woot! We're sponsoring $5K, the amount to cover one student's work full time for the whole summer.
We will no doubt have some pivots volunteering for mentor spots. If you want to volunteer to mentor, you need to apply by the end of this week (April 3rd).
Pivotal Labs is also going to provide desk space for (a couple?) local students who want to come work in the office for the summer: they'll get to come to our daily standups, eat breakfast with us, attend our tech talks, play Pivot Pong, and just be part of the Pivotal experience. We hope there are some local students who participate in RSoC and that someone comes to hang out with us for the summer. It's also likely that some local students will get to present their projects at the Golden Gate Ruby Conference in September.
Students can apply for spots during the period April 5th-23rd.
Thanks to everyone for stepping up and supporting this great response. This is the kind of thing that makes being a Ruby developer so gratifying, and so fun too.
Rails 3 is now beta and the core team is asking people to try it out and report issues back. We decided to do a small spike to get some experience with the upgrade process and see if we could help identify any problems. The application we worked on was our own Pivotal Pulse CI aggregation display (which you can see in action).
Here's a quick overview of the steps we went through:
- Install Ruby 1.8.7 using RVM
- Install Rails beta gems
- Upgrade the app using the rails_upgrade plugin
- Tweak things a lot
- Drop incompatible dependencies
- Profit!
Yesterday I modified an old app to turn off a feature for records that were more than a certain number of days old. This is an old enough app that the test data is still in YAML fixture files, and in the fixtures I see records with attributes like these:
published_at: 2006-08-25 00:00:00 -07:00
created_at: 2006-08-24 00:00:00 -07:00
updated_at: 2006-08-25 00:00:00 -07:00
And of course, the test code also makes assumptions about absolute dates:
get :show, :year => '2006', :month => '08', :day => '24'
assert_not_nil assigns[:article]
That makes it very difficult to add tests that check dates relative to the current date. It also means that over time your fixture data gets "stale" and you don't have any data that appears to be recent, which may or may not matter to your application.
One way to deal with this is to mock time in your test code so that tests always run at the same effective time, and the hardwired absolute dates in fixtures always have the same relative age. I think that's a good Plan B, but I prefer Plan A.
Plan A is to create your test data with dates and times relative to when the tests are run. You can do that in YAML fixtures by embedding Ruby (ERB) in the YAML:
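For comparison, Plan B can be sketched in plain Ruby. The freeze_time_at helper below is hypothetical and deliberately minimal: it stubs only the zero-argument Time.now for the duration of a block, whereas gems such as Timecop handle this far more thoroughly.

```ruby
# Hypothetical, hand-rolled time freezing: only Time.now is stubbed.
class Time
  class << self
    alias_method :unfrozen_now, :now
    attr_writer :frozen_now

    def now
      @frozen_now || unfrozen_now
    end
  end
end

# Run the block as if the current time were `time`, then restore the clock.
def freeze_time_at(time)
  Time.frozen_now = time
  yield
ensure
  Time.frozen_now = nil
end

fixture_date = Time.utc(2006, 8, 25)
freeze_time_at(Time.utc(2006, 9, 1)) do
  days = ((Time.now - fixture_date) / 86_400).round
  puts days  # => 7, no matter when the suite actually runs
end
```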
published_at: <%= 2.days.ago.to_s(:db) %>
created_at: <%= 3.days.ago.to_s(:db) %>
updated_at: <%= 2.days.ago.to_s(:db) %>
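Rails runs fixture files through ERB before parsing them as YAML, which is why this works. The sketch below reproduces those two steps with only the standard library. The days_ago helper is a hypothetical stand-in for ActiveSupport's n.days.ago.to_s(:db); it formats in UTC because YAML reads zone-less timestamps as UTC.

```ruby
require "erb"
require "yaml"

# Hypothetical stand-in for 2.days.ago.to_s(:db): "YYYY-MM-DD HH:MM:SS" in UTC.
def days_ago(n, now = Time.now)
  (now.getutc - n * 86_400).strftime("%Y-%m-%d %H:%M:%S")
end

FIXTURE_SOURCE = <<~FIXTURE
  first_article:
    published_at: <%= days_ago(2) %>
    created_at: <%= days_ago(3) %>
    updated_at: <%= days_ago(2) %>
FIXTURE

# Step 1: evaluate the embedded Ruby. Step 2: parse the result as YAML.
rendered = ERB.new(FIXTURE_SOURCE).result(binding)
articles = YAML.safe_load(rendered, permitted_classes: [Time])
puts articles["first_article"]["published_at"]
```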
And in your test code, be flexible too:
pub = article.published_at.to_date
get :show, :year => pub.year.to_s, :month => pub.month.to_s, :day => pub.mday.to_s
assert_not_nil assigns[:article]
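If several tests need the same route parameters, a small helper keeps them relative too. This date_params helper is hypothetical; it zero-pads the month and day to match the style of the original hard-coded test ('08'), which is an assumption about the routes, so use plain to_s if yours accept "8".

```ruby
require "date"

# Hypothetical helper: build :year/:month/:day params from a Date or Time.
def date_params(time_or_date)
  d = time_or_date.to_date
  { :year => d.year.to_s, :month => format("%02d", d.month), :day => format("%02d", d.mday) }
end

p date_params(Date.today - 2)
```

In the functional test above, the call could then become get :show, date_params(article.published_at).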
Keep your test dates relative, and a year from now you'll thank yourself (or somebody else will).
This week we moved our gems over to GemCutter. It's very easy to claim the gems that were automatically synced over from RubyForge, but if you have gems on GitHub it takes a little more work. We had several gems to move over, some of which had quite a few versions we wanted to preserve. You can't just push the .gem files over because GitHub built the gem namespaced with your username (e.g. "pivotal-apdex"). So Sam and I built a little script to pull down the gems from gems.github.com, fix the name in the spec, repack, and push up to GemCutter.
You can get the hubcut script at http://gist.github.com/220908
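As a rough illustration of just the renaming step (a hypothetical helper, not code taken from hubcut), RubyGems' own Gem::Specification can drop the username prefix and compute the new .gem file name. The version number below is made up.

```ruby
require "rubygems"

# Hypothetical helper: GitHub's gem builder prefixed names with the owner's
# username, so "pivotal-apdex" needs to become "apdex" again.
def strip_github_prefix(gem_name, username)
  prefix = "#{username}-"
  gem_name.start_with?(prefix) ? gem_name[prefix.length..-1] : gem_name
end

spec = Gem::Specification.new do |s|
  s.name    = "pivotal-apdex"
  s.version = "0.1.0"  # hypothetical version number
end
spec.name = strip_github_prefix(spec.name, "pivotal")
puts spec.file_name  # => apdex-0.1.0.gem
```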
One of the things I've always liked least about building web applications is dealing with mod_rewrite. It's a very useful feature, but it's quirky and the config languages for webservers are difficult to use (at least from my experience with Apache and Nginx). But like it or not, mod_rewrite is often a necessary part of a web app. Until now...
Recently I had to redo the rewrite rules for pivotallabs.com when we switched from Apache to Nginx, which we did when moving to Engine Yard's cloud hosting. Since then our Nginx config has grown to over 150 lines, mainly to deal with multiple virtual hosts.
Now, managing a custom Nginx config on the EY cloud system isn't as simple as I'd like, especially when the configs are different on production and demo environments. (Demo is what we call our usual environment for doing feature acceptance.) It's far easier to use the automatically generated config, but that doesn't work when you need to support multiple domain names.
The obvious thing to do was to move the rewrite/redirect logic out of the Nginx config. I found a couple of Rack middleware components that did something similar, but none of them did everything we needed. So we created our own.
Refraction is a Rack middleware replacement for mod_rewrite. With Refraction we were able to replace our 150+ line Nginx config with a 50 line Ruby file, and go back to using the standard automatically generated config on EY cloud.
Here's an example Refraction config file:
Refraction.configure do |req|
  feedburner = "http://feeds.pivotallabs.com/pivotallabs"
  if req.env['HTTP_USER_AGENT'] !~ /FeedBurner|FeedValidator/ && req.host =~ /pivotallabs\.com/
    case req.path
    when %r{^/(talks|blabs|blog)\.(atom|rss)$} ; req.found! "#{feedburner}/#{$1}.#{$2}"
    when %r{^/users/(chris|edward)/blog\.(atom|rss)$} ; req.found! "#{feedburner}/#{$1}.#{$2}"
    end
  else
    case req.host
    when 'tweed.pivotallabs.com'
      req.rewrite! "http://pivotallabs.com/tweed#{req.path}"
    when /([-\w]+\.)?pivotallabs\.com/
      # passthrough with no change
    else # wildcard domains (e.g. pivotalabs.com)
      req.permanent! :host => "pivotallabs.com"
    end
  end
end
These rules are extracted from the full config file for pivotallabs.com. They redirect high-traffic syndication feeds to feedburner, rewrite a subdomain (tweed.pivotallabs.com) to a path for that sub-site (pivotallabs.com/tweed), and redirect some aliases to our standard domain name (pivotalabs anyone?).
Refraction is thread-safe, which means you can put it outside the Rack::Lock, something we felt was important for performance. It will never have the performance of mod_rewrite, but it will certainly be better than handling redirections in Rails itself.
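For readers new to Rack, the core idea fits in a few lines: a middleware is any object whose call(env) returns [status, headers, body]. The CanonicalHost class below is a hypothetical illustration of a permanent redirect to a canonical domain; it is not Refraction's implementation or API. It keeps no per-request state, which is what makes this kind of middleware safe to run outside Rack::Lock.

```ruby
# Hypothetical middleware (not Refraction): 301-redirect alias domains to one
# canonical host and pass every other request through to the wrapped app.
class CanonicalHost
  def initialize(app, canonical_host, aliases)
    @app = app
    @canonical_host = canonical_host
    @aliases = aliases
  end

  def call(env)
    host = (env["HTTP_HOST"] || env["SERVER_NAME"]).to_s.sub(/:\d+\z/, "")
    return @app.call(env) unless @aliases.include?(host)

    location = "http://#{@canonical_host}#{env['PATH_INFO']}"
    query = env["QUERY_STRING"].to_s
    location += "?#{query}" unless query.empty?
    [301, { "location" => location, "content-type" => "text/plain" }, ["Moved Permanently\n"]]
  end
end

# Any object with call(env) can be the downstream app; a lambda is simplest.
app = lambda { |env| [200, { "content-type" => "text/plain" }, ["hello\n"]] }
stack = CanonicalHost.new(app, "pivotallabs.com", ["pivotalabs.com", "www.pivotalabs.com"])

status, headers, _body = stack.call("HTTP_HOST" => "pivotalabs.com:80", "PATH_INFO" => "/blog", "QUERY_STRING" => "page=2")
puts status               # => 301
puts headers["location"]  # => http://pivotallabs.com/blog?page=2
```

In a config.ru it would be mounted with use CanonicalHost, "pivotallabs.com", ["pivotalabs.com"] before run; Rack::Builder passes the downstream app as the first constructor argument.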
Full documentation is available in the README. Contributions welcome.
And of course big thanks to Sam Pierson and Wai Lun Mang who both paired with me on developing Refraction.
When you're moving a codebase from Subversion to git, here are a few things that make the move go a bit more smoothly.
In the svn project, you can discover some things you'll need to adjust in git after the import.
Show all files being ignored
svn propget -R svn:ignore .
Add these files to the .gitignore file at your project root, or in appropriate subdirectories. I prefer keeping it all in one place at the top level.
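To turn the propget output into root-level .gitignore entries, a small converter like this hypothetical one can help. It assumes each property is printed as "path - first_pattern" with any further patterns on their own lines, which is how svn shows multi-line values; check that against your own output first. Patterns are anchored with a leading slash because svn:ignore applies only to the directory it is set on.

```ruby
# Hypothetical converter from `svn propget -R svn:ignore .` output to
# .gitignore lines anchored at the project root.
def svn_ignores_to_gitignore(propget_output)
  entries = []
  dir = nil
  propget_output.each_line do |raw|
    line = raw.chomp
    if (m = line.match(/\A(.+?) - (.*)\z/))
      dir, pattern = m[1], m[2]   # first pattern for a new directory
    else
      pattern = line              # continuation pattern for the same directory
    end
    pattern = pattern.strip
    next if pattern.empty? || dir.nil?
    prefix = dir == "." ? "" : "#{dir}/"
    entries << "/#{prefix}#{pattern}"
  end
  entries
end

sample = ". - *.log\ntmp\n\nlog - *\n\n"
puts svn_ignores_to_gitignore(sample)  # /*.log, /tmp, /log/*
```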
Show all externals
svn propget -R svn:externals .
You'll either have to switch to using a submodule in git, or just pull the files into your project if that's not possible for some reason.
Find all empty directories
find . -type d -empty
touch /path/to/empty/dir/.gitkeep
Since git doesn't keep empty directories, you can add a .gitkeep file to empty directories that you don't want to go away. Some people add a .gitignore file to keep the directory around, but that sounds totally backwards to me. You want to keep it, not ignore it.
By the way, if you are already ignoring dir/*, that will ignore the .gitkeep file as well. Make sure it isn't missed by adding !.gitkeep to the end of your .gitignore file.
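Here is a small Ruby sketch that finds the directories git would drop and adds a .gitkeep to each. Treating a directory that contains only a .svn folder as empty is an assumption aimed at older (pre-1.7) svn working copies, where every directory has its own .svn folder and find -empty would skip it.

```ruby
require "find"
require "fileutils"
require "tmpdir"

# Version-control metadata that doesn't count as real content.
METADATA_DIRS = %w[.svn .git].freeze

def empty_dirs(root)
  found = []
  Find.find(root) do |path|
    next unless File.directory?(path)
    Find.prune if METADATA_DIRS.include?(File.basename(path))  # don't descend
    found << path if (Dir.children(path) - METADATA_DIRS).empty?
  end
  found
end

# Drop a .gitkeep into each empty directory so git will track it.
def add_gitkeeps(root)
  empty_dirs(root).each { |dir| FileUtils.touch(File.join(dir, ".gitkeep")) }
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "log"))
  FileUtils.mkdir_p(File.join(root, "tmp", ".svn"))
  add_gitkeeps(root)
  %w[log tmp].each do |dir|
    puts "#{dir}: #{File.exist?(File.join(root, dir, '.gitkeep'))}"  # true for both
  end
end
```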
Find all authors
If you want to properly attribute commits, you'll need to set up an authors file. But if you miss any authors, the clone will stop and complain. You can discover all the svn users that you need to put in the authors file with this command:
svn log | grep -E '^r[0-9]+ \|' | cut -d' ' -f3 | sort | uniq
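The same discovery can be done in Ruby, which also makes it easy to emit a skeleton authors file in the "svnuser = Name <email>" format that git svn's --authors-file option expects. This sketch assumes the usual "r123 | username | date | N lines" header lines in svn log output; the names and example.com addresses it writes are placeholders to edit by hand.

```ruby
# Collect unique usernames from svn log revision header lines only.
def svn_authors(log_output)
  log_output.each_line
            .map { |line| line[/\Ar\d+ \| ([^|]+?) \|/, 1] }
            .compact
            .uniq
            .sort
end

# One placeholder mapping per user, ready to be filled in.
def authors_file(users)
  users.map { |user| "#{user} = #{user} <#{user}@example.com>\n" }.join
end

log = <<~LOG
  ------------------------------------------------------------------------
  r2 | jdoe | 2009-10-01 12:00:00 -0700 (Thu, 01 Oct 2009) | 1 line

  Fix r1 | typo
  ------------------------------------------------------------------------
  r1 | asmith | 2009-09-30 09:00:00 -0700 (Wed, 30 Sep 2009) | 1 line

  Initial import
LOG

print authors_file(svn_authors(log))
```

Anchoring the match at the start of the line keeps commit messages such as "Fix r1 | typo" from being mistaken for revision headers.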
init + fetch > clone
If git svn clone doesn't complete, try doing the init/fetch as separate operations. The clone subcommand is pretty much just doing an init followed by a fetch, but I've found that if the clone fails, doing the commands separately can have better success.
Here's a handy trick for making custom validations easily reusable.
This is an extract from a customer model with three different street addresses, in which we validate all three of the zip codes. (In this code, the GeoState.valid_zip_code? method answers whether something that looks like a zip code is an actual zip code; not all five-digit combinations are in use as zip codes, and we want to make sure we've got a live one.)
def validate_home_zip_code
  validate_zip_code(:home_zip_code)
end
def validate_mailing_zip_code
  validate_zip_code(:mailing_zip_code)
end
def validate_previous_zip_code
  validate_zip_code(:previous_zip_code)
end
def validate_zip_code(field)
  errors.add(field, :inclusion) if errors.on(field).nil? && !GeoState.valid_zip_code?(send(field))
end
validates_presence_of :home_zip_code
validates_format_of :home_zip_code, :with => /\A\d{5}(-\d{4})?\z/, :allow_blank => true
validate :validate_home_zip_code
validates_presence_of :mailing_zip_code
validates_format_of :mailing_zip_code, :with => /\A\d{5}(-\d{4})?\z/, :allow_blank => true
validate :validate_mailing_zip_code
validates_presence_of :previous_zip_code
validates_format_of :previous_zip_code, :with => /\A\d{5}(-\d{4})?\z/, :allow_blank => true
validate :validate_previous_zip_code
That looks very wet to me. (WET == "Write Every Time") But it's not too hard to dry this up using just a tiny bit of knowledge of how ActiveRecord validations work.
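One common way to dry this up (not necessarily the approach this post goes on to describe) is a class-level macro that declares all three checks for any list of fields. The sketch below shows that shape in plain Ruby so it runs without Rails: MiniValidations, VALID_ZIPS, and the symbol error codes are hypothetical stand-ins for ActiveRecord's validation machinery and GeoState.valid_zip_code?, not ActiveRecord's API.

```ruby
# Hypothetical stand-ins for GeoState data and the zip format check.
VALID_ZIPS = %w[94107 10001].freeze
ZIP_FORMAT = /\A\d{5}(-\d{4})?\z/

module MiniValidations
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def validations
      @validations ||= []
    end

    # One macro replaces three declarations per field: presence, then format,
    # then "is a live zip code", stopping at the first failure for each field.
    def validates_zip_code(*fields)
      fields.each do |field|
        validations << lambda { |record|
          value = record.send(field).to_s
          if value.empty?
            record.errors[field] = :blank
          elsif value !~ ZIP_FORMAT
            record.errors[field] = :invalid
          elsif !VALID_ZIPS.include?(value[0, 5])
            record.errors[field] = :inclusion
          end
        }
      end
    end
  end

  def errors
    @errors ||= {}
  end

  def valid?
    errors.clear
    self.class.validations.each { |check| check.call(self) }
    errors.empty?
  end
end

Customer = Struct.new(:home_zip_code, :mailing_zip_code, :previous_zip_code) do
  include MiniValidations
  validates_zip_code :home_zip_code, :mailing_zip_code, :previous_zip_code
end

customer = Customer.new("94107", "", "1234")
customer.valid?
# prints "mailing_zip_code: blank", then "previous_zip_code: invalid"
customer.errors.each { |field, code| puts "#{field}: #{code}" }
```

In a real ActiveRecord model, the macro body would instead call validates_presence_of, validates_format_of, and validates_each for the given fields, and the model would declare validates_zip_code :home_zip_code, :mailing_zip_code, :previous_zip_code once.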
Our friends at Engine Yard have just launched the beta of their new cloud hosting product, Flex. If you're familiar with their Solo product you'll find Flex to be pretty similar, just more... flexible. Where Solo lets you run your Ruby on Rails application on an Engine Yard stack on an Amazon EC2 instance, Flex lets you run it on a cluster of EC2 instances.
In the last month I've put a handful of applications up on Solo, mostly demo or staging servers for doing story acceptance and release testing. Solo is great for that. After you've gone through the setup process once, you can easily spool up a server for a few hours when you need it on just a few minutes' notice, then turn it off when it's not needed anymore. Flex gives you that same kind of adaptability, allowing you to add instances to your cluster to match traffic demands as needed.
Last week I got my first production application running on Flex. The Pivotal Labs company website is now hosted on Flex, and it's humming along quite nicely. There were a few rough spots to work through since we were working with a pre-release product, but I'm pretty happy with our setup now. Since Flex is new, I thought it might be useful to share some of the things my fellow pivots and I learned getting things running there.
