Matthew Kocher's blog
Ask for Help
"Is there a good JS/HTML code editor that will expand to fit the contents?"
A team is just getting started with putting in a code editor and is currently using ACE. It doesn't seem like there's a way to tell it to take up as much height as it needs. Many suggestions were thrown out, FCK and TinyMCE being two, but none was known to auto size itself. There is one that no one can remember the name of, but they had fond memories of from the past.
Interesting Things
- Super Market Street Sweep is coming up this weekend. Ride your bike around SF for a good cause.
Movember Update
- Webstash passed the $20k mark that Davis & Sean had been shooting for--congratulations everyone! (This does not include Rob's generous Mohawk donation). SF has raised $323.36 per stash, NY $163.40 and Boulder $210.50. There's still time to donate.
Ask for Help
Why does my md5 change when the timezone changes?
A pivot found that the tests of MD5 generation fail if the machine is in a different time zone. While one of the inputs to the hash is the time, he swears up and down that it's integer seconds since the epoch, and is the same on both boxes.
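One way to narrow this down, sketched here under assumptions: if the time really enters the hash as integer epoch seconds, the digest cannot depend on the zone, so a zone-dependent digest suggests a formatted local time is sneaking in somewhere. The helper names and the "payload" string are hypothetical stand-ins for the project's real hash inputs, which the notes do not show.

```ruby
require 'digest/md5'

# Hypothetical input builders; "payload" stands in for the other hash inputs.
def digest_from_epoch(time)
  # Integer seconds since the epoch are the same in every time zone.
  Digest::MD5.hexdigest("payload:#{time.to_i}")
end

def digest_from_wall_clock(time)
  # A formatted local time depends on the zone offset the Time object carries.
  Digest::MD5.hexdigest("payload:#{time.strftime('%Y-%m-%d %H:%M:%S')}")
end

# The same instant, viewed from two fixed offsets.
instant = 1_300_000_000
west = Time.at(instant).getlocal('-08:00')
east = Time.at(instant).getlocal('-05:00')

puts digest_from_epoch(west) == digest_from_epoch(east)
puts digest_from_wall_clock(west) == digest_from_wall_clock(east)
```

The first comparison holds and the second does not, which is the pattern to look for when hunting the input that changes between boxes.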
Interesting Things
- Tomorrow is Mo'hawk day at the Pivotal SF Office. It will be live streamed, with color commentary by our own Davis Frank. The live stream may be a PPV event, please have a major credit card available.
Last week Brian Cunnie posted a great writeup of what we've been working on for building OSX Lion workstations. Today, I'd like to introduce Apple Orchard - how we transform those chef recipes into ready to use OSX Lion images.
The story starts a few months ago, one morning after Standup while putting our dishes from breakfast away. Over the past few days we'd been discussing how our ops group would take chef recipes (generated for the most part by developers) and turn them into machine images that they could deploy on a moment's notice. I approached Sean Beckett, our Director of Operations, and told him of my vision:
no manual steps
He looked at me like I was crazy, and he was obviously trying to figure out how to talk me down off the ledge. I told him how Jenkins could run a job after every checkin (a fact he was well aware of) and how all it had to do was...... He backed away slowly.
A few weeks later, Brian Cunnie had gotten past the Minimum Viable Image release marker in Tracker, and I told him of my vision:
no manual steps
He also looked at me like I was crazy, which he often does. The next week I had the afternoon to pair with him, and we got to work. We already had Jenkins building our recipes on a Mac mini with Deep Freeze (software which allows you to reboot to a clean state), so we copied that Jenkins job and got to work. We got an iMac, and partitioned the disk into two.
Step 1
We boot the machine to the persistent side, and image the dynamic side with a "mostly pristine" image that has Xcode preinstalled, and has SSH turned on. We then set the machine to boot from that partition, and reboot.
Step 2
When it comes up, we SSH in, upload an SSH key, clone our public and private cookbooks (our private cookbook is used for our site licenses) and run soloist. If it succeeds, we move on to step 3.
Step 3
Reboot the machine to the persistent partition, and wait for it to come up. This isn't part of Step 2 because we want to leave the machine in the dirty state for troubleshooting if the chef run failed, and only trigger this if it succeeds.
Step 4
Put a script in place that will automatically rename the machine when it first boots, take an image of the partition using diskutils, scan it for restore, and move it over to our Deploy Studio server, and create a symlink so the 'lion HEAD' build points to the newly generated image.
That's it. We occasionally promote a 'lion HEAD' build to 'lion STABLE', so that we've always got an image on hand that we're confident in. But the overhead of cutting a new image is now simply changing a symlink.
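The symlink handling can be sketched in a few lines of Ruby. This is an illustration under assumptions, not the code in apple_orchard: point_link is a hypothetical helper, and the paths in the comments are made up since the post does not show the Deploy Studio layout.

```ruby
require 'fileutils'

# Repoint link_path at image_path. The link is built under a temporary
# name and renamed over the old one, so there is no moment without a link.
def point_link(link_path, image_path)
  tmp = "#{link_path}.tmp"
  FileUtils.rm_f(tmp)            # clear any leftover from a failed run
  File.symlink(image_path, tmp)
  File.rename(tmp, link_path)    # replaces the old symlink itself
end

# Hypothetical usage; the real image names and paths are not in the post.
# point_link('/images/lion_HEAD.dmg', '/images/lion_build_42.dmg')
# point_link('/images/lion_STABLE.dmg', File.readlink('/images/lion_HEAD.dmg'))
```

Promoting HEAD to STABLE is then the second call: read where HEAD points and aim STABLE at the same image.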
There are a lot of moving parts, and sometimes it breaks. With time, it's become more reliable, but still has a lot of external dependencies. We've recently been trying out a strategy of pre-populating the chef and homebrew caches, which seems to be helping. Another caveat we've run into is that so far with Lion we've been unable to produce a universal image that will boot both MacBook Airs and iMacs, but we hope this may have changed with the latest 10.7.2 update.
Many thanks to Brian Cunnie - while I was the reason for this madness, he's done most of the heavy lifting with my occasional help.
It's all open source on github, at https://github.com/pivotalexperimental/apple_orchard. Values for our infrastructure are hard coded, but if you'd like to generalize it and use it, fork and make a pull request.
Interesting
The Riak Client gem uses Net::HTTP by default. While it allows you to specify a timeout for a map reduce job, it doesn't set the Net::HTTP timeout for the connection to Riak. This means that all requests are effectively limited to 60 seconds. The project that discovered this is switching to curb.
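For reference, this is the Net::HTTP setting in question, shown on a bare client. It is only a sketch: how (or whether) the Riak Client gem exposes its underlying connection is not covered here, and the host, port and 5 second margin are illustrative assumptions.

```ruby
require 'net/http'

# Build a Net::HTTP client whose socket read timeout outlasts the job.
# Without the assignment below, read_timeout stays at its 60 second default.
def http_for_job(host, port, job_timeout_seconds)
  http = Net::HTTP.new(host, port)   # no connection is opened yet
  http.read_timeout = job_timeout_seconds + 5
  http
end

http = http_for_job('localhost', 8098, 300)
puts http.read_timeout
```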
The Mac App Store is actively hostile to business users. There is no way for us to buy software through the app store without setting up a separate account for every three computers, and then you can only reuse a credit card for three accounts. We won't be purchasing any software through the app store that isn't absolutely necessary until there's a way to purchase N licenses and use those on N machines. Pixelmator is the first app to lose our business.
I spent a large part of this week at Velocity and Devopsdays. I met a lot of great people and heard way more interesting things than I could absorb. Here's a collection of things (in no particular order) I found interesting.
JSFiddle is a gist-like site for sharing HTML/CSS/JS snippets. Looks to be a great way to share useful examples and problems. jQuery uses it for bug reports.
Dyn's DynECT seemed to be the standard answer for how to distribute load to different datacenters for load balancing and geo-targeting. There is no standard answer to defeating the CAP theorem.
There's a huge movement to rebuild Splunk in various open source projects. Logstash is an open source project which is piecing together various open source components to get something similar, but is doing index time extraction instead of search time extraction and hasn't gotten to graphing or analysis yet. Etsy is using Graphite extensively for metrics collection and say that developers prefer it to Splunk. Various other companies are building app-level only metrics collection/reporting solutions. The consensus in the devops community is that Splunk's pricing does not work for the web. As a fan of Splunk, I hope they can figure this out as it's still better than anything I've seen.
Etsy's work with metrics collection is interesting - they've written statsD for aggregating things before sending to graphite and Logster for tailing logs and making graphite events out of them.
Joshua Timberman of Opscode has some great slides on how to write a Chef cookbook and I wish I'd gotten to see the talk at Devopsdays. I tend to think they err on the side of premature extraction for reusability, but I see their point of view. I'll definitely be using the remote_file resource and the file_cache_path.
2011 is shaping up to be the year of the Zookeeper alternatives. If you're not familiar, Zookeeper is a reimplementation of Google's Chubby - a highly available and reliable system designed to be the system of record for where services are currently available. Netflix has their own internal solution. Heroku has Doozer, and Noah is the new kid on the block with a simple REST interface and HTTP callbacks (aka webhooks). Opscode is also trying to answer this need by querying the Chef server at runtime for the identities of other hosts on the network. As environments and servers become ephemeral, telling every server about every server gets to be a tedious liability. It'll be interesting to see where this goes.
Run Deck (aside from having a great logo) seems to be an interesting way to give users limitable, auditable, and repeatable access to server shells. It's a web interface that lets you execute commands on various configurable subsets of your infrastructure. It's got plugins so it can grab the current instances out of EC2, and a jenkins plugin so a Jenkins build can trigger a job in Run Deck. I'll definitely be checking it out when I get a chance.
Pipe Viewer - (pv) Looks awesome. You can put it in a series of pipes in a shell, and it'll print out a nice status bar about the amount of data passing through the pipe. 'brew install pv'
Opscode has a "fat" installer of chef coming out. It goes in /opt/opscode and includes all the dependencies in the package. This should answer a common complaint about having to install ruby to install ruby. (Yo, dawg, I heard you like Ruby)
CSS Lint - Nicole Sullivan and Nicholas Zakas teamed up and introduced CSSLint at Velocity, which looks like it'll be a good addition to a lot of test suites. Nicole also mentioned some cool pure CSS buttons that work on dark and light backgrounds. Lea Verou's pure CSS backgrounds were mentioned, which went around the office a while ago but are worth a link. It was also news to me that CSS selectors are evaluated right to left by a browser - "ul.foo span.special *" will match all tags on the page, then narrow it down.
Blue/Green Deployment is all the rage these days, and for good reason. Netflix and Amazon are both using it. You'll have to decide how it fits with your data storage layer, but it's worked well for me in the past and I'm glad to see it gaining popularity.
One of our goals for the first day a project starts at Pivotal is to deliver something the customer can see working. One of the ways we accomplish this is making sure getting up and running with all of our (more) reasonable defaults only takes a few minutes. We've been using guiderails for this internally for a while now, and soft launched it last week. I'm happy to give a full introduction today.
Currently Guiderails supports choosing:
- Mysql or Postgres
- RR or Mocha
- Webrat with Saucelabs support
- Cucumber with Capybara (no Saucelabs support)
- SASS (with HAML)
And includes by default:
- A ci_build.sh script for running your project in CI.
- A local git repo
- An rvmrc
- Bundler, auto-tagger, JSON, Heroku, rspec-rails, Jasmine, and Headless gems (in the global or development groups)
- Jasmine initialized for JavaScript testing
- RSpec installed
- Some testing related rake tasks
For more details, check it out on github at https://github.com/pivotal/guiderails.
Guiderails is a great way to get going quickly on a project. Many thanks to the Pivots that contributed.
As mentioned in yesterday's standup blog, my pair and I encountered some problems with YAML parsing over the last few days, and now that I think I understand it I wanted to document it for posterity.
Psych is a new YAML parser which presumably is better than what came before it, but can't merge hash keys correctly and doesn't work with delayed_job. The merge key problem is serious, as our standard database.yml defines a common section, and each environment back references it, merging in its individual database name and settings. When psych is loaded, we get a blank database name, which makes Active Record pretty much useless.
Ruby 1.9.2 optionally compiles psych into ruby if you have libyaml installed on the computer. Some gems will require psych if it's available, thus poisoning any future YAML parsing which does not expect psych's pedantic (and currently broken) behavior.
You can always look at the value of the YAML constant - it can vary between YAML, Syck and Psych, depending on what's loaded. You can switch the yamler by adding YAML::ENGINE.yamler = 'syck', but you need to make sure this happens at every code entry point.
For now, I'm no longer installing libyaml and libyaml-devel on servers, which got there because I had been following RVM's information on ruby prerequisites. I'll also write a chef recipe to assert that libyaml is not installed.
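A quick way to check whether the loaded parser handles this kind of file is to parse a cut-down database.yml and fail loudly when the merge did not happen. This is a sketch, not our actual config: the environment names and settings are illustrative, and load_database_config is a hypothetical helper. The unsafe_load branch is an assumption about newer Psych releases, which refuse aliases in a plain YAML.load.

```ruby
require 'yaml'

# A cut-down database.yml in the shape described above: a shared anchor
# merged into each environment. Names and values are illustrative.
DATABASE_YML = <<~YML
  common: &common
    adapter: mysql
    username: root
  development:
    <<: *common
    database: app_development
YML

def load_database_config(text)
  # Prefer unsafe_load where it exists, since it permits the alias above.
  config = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML.load(text)
  config.each do |env, settings|
    next if env == 'common'
    raise "#{env}: merge keys were not applied" if settings['adapter'].to_s.empty?
    raise "#{env}: blank database name" if settings['database'].to_s.empty?
  end
  config
end

config = load_database_config(DATABASE_YML)
puts YAML                               # which parser the YAML constant points at
puts config['development']['database']
```

Running something like this at each entry point would at least turn a silently blank database name into an immediate error.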
I've been working on a basic chef solo based rails deployment. I started with Building an AMI, then bootstrapping to get the instance running chef with capistrano. Since part two left us with chef running on a box with recipes, it's time to get a rails app running and serve some requests.
For the moment, I've created an application.rb recipe, which is the only thing in the soloistrc. It declares dependencies:
include_recipe "joy_of_cooking::daemontools"
include_recipe "joy_of_cooking::mysql"
and then gets on with dealing with the app server.
'What is daemontools?' I hear you cry out. It's the best way I've found to ensure processes are running as you expect. If you aren't using daemontools (and you probably aren't) it's worth looking at seriously. Daemontools takes the pids out of process management. There are no pid files to get out of sync with processes, and there's no polling to see if something is up. Daemontools knows if something exists because it is a child process, and when it exits it returns control to the supervise process. Daemontools is worth wrapping your head around, but for now, all you need to know is that you give it a script and it runs the script in an infinite loop. I stole the install procedure from Michael Sofaer's Hellspawn, so you'll have to excuse the fact that it looks like ruby instead of chef.
ruby_block "install daemontools" do
  block do
    directory = "/package/admin"
    repo = "git://github.com/MikeSofaer/daemontools.git"
    dir_name = "daemontools-0.76"
    FileUtils.mkdir_p directory
    system("cd #{directory} && git clone #{repo} #{dir_name}")
    system("cd #{File.join(directory, dir_name)} && ./package/install")
  end
  not_if "ls /command/svscanboot"
end
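To make the supervision idea described above concrete, here is a toy version in Ruby. It is an illustration of the parent/child model only, not how daemontools itself is implemented: supervise is a hypothetical function, and max_runs exists purely so the demo terminates where a real supervisor would loop forever.

```ruby
require 'rbconfig'

# Start the command, block until it exits, then start it again.
# The parent learns about the exit from Process.wait, so there is no
# pid file to go stale and no polling to see whether the child is up.
def supervise(command, max_runs)
  runs = 0
  while runs < max_runs
    pid = Process.spawn(*command)
    Process.wait(pid)   # returns only when the child has exited
    runs += 1
    sleep 0.1           # short pause so a crashing child cannot spin the CPU
  end
  runs
end

# The child here is a Ruby one-liner that exits immediately.
puts supervise([RbConfig.ruby, '-e', 'exit'], 3)
```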
Once daemontools is installed, we move on to installing mysql. A database isn't necessary for an app, but it's necessary for doing anything interesting. I made a philosophical decision to compile mysql from source. I tend to view compiling from source as a continuum - you usually would compile your own app - you usually wouldn't compile your own kernel. The DB is closely related enough to your app that it's nice to have exact documentation about what you're running. It'll also lower the barrier to entry to trying out Percona or Drizzle.
The mysql recipe is a little too long to inline here, but you can and should check it out on github. The recipe starts by installing some dependencies (by all means, leverage your package management system here), then creates a mysql user and runs the download/cmake/make install steps. Add a daemontools run script, and up it comes. Set up the users and we're all set. The most interesting part of the recipe to me is the block which waits for mysql to start up:
ruby_block "wait for mysql to come up" do
  block do
    Timeout::timeout(60) do
      until system("ls /tmp/mysql.sock")
        sleep 1
      end
    end
  end
end
Usually I'd just throw in a sleep, but I was ashamed to share that with the world. This is a technique I'll carry forward, and would love to see it make its way into chef so those less prone to dropping into ruby could make use of it. (Also, if there's a better way or something that's already in Chef that I haven't come across, I'm all ears)
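Until something like that exists, the loop can at least be pulled into a small helper so recipes don't repeat it. A minimal sketch, assuming nothing beyond Ruby's standard library: wait_until is a hypothetical name, not part of Chef.

```ruby
require 'timeout'

# Poll the block until it returns a truthy value, sleeping between tries.
# Raises Timeout::Error if the condition is not met within `seconds`.
def wait_until(seconds, interval = 1)
  Timeout.timeout(seconds) do
    sleep interval until yield
  end
  true
end

# Inside a ruby_block, the mysql wait could then read:
#   wait_until(60) { File.socket?('/tmp/mysql.sock') }
```

Checking with File.socket? also avoids shelling out to ls once a second.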
Once mysql is up and running, we do the git clone/bundle/db:create/db:migrate steps rails developers have come to know and love, then write out a daemontools run script and use svc to make sure the unicorn restarts on every deploy. I'm not a fan of the cap style cached copy/magical rollback that the chef deploy resource reimplements, so for now I'm rolling my own.
Once the app is ready to be started, write out a daemontools run script:
file "/service/unicorn/run" do
  content %{#!/bin/bash
cd /var/staging/foo/src
rvm_path=/home/mkocher/.rvm/
export RAILS_ENV=staging
source /home/mkocher/.rvm/scripts/rvm
rvm use ruby-1.8.7-p299@captest
exec /command/setuidgid mkocher rackup -p 3000
}
  mode "0755"
end
And daemontools starts up the app. The run script is pretty easy to follow - set up RVM, choose our ruby, and exec unicorn.
Hitting the app on 3000 reveals a rails app in all its scaffolded glory:

There's more work still to do. There's repetition that needs to be refactored out. Capistrano, Chef and Rails all need to share some knowledge about the world. Cap and chef need to know the directory we're deploying the thing to, chef and rails need to know what the database configuration is, and so on. Cap has settings, Chef has nodes and Rails has Configure blocks. I'm leaning towards reinventing the wheel again very simply, but I'm open to suggestions. Other todo items are putting nginx in front of the app server, and moving to a dedicated database server.
I'm not sure which direction this will go next, but I hope by now you're convinced that chef recipes aren't magic - they're just a thin ruby wrapping around the things you do to configure a box - execute some commands and edit some files.
All the recipes mentioned are available on github and MIT licensed, please steal them and make them better.
Notes:
- You'll need to add whatever ports you'd like open to your EC2 security group.
- Having mysql on an AMI-backed instance with the data on the local disk is the opposite of production ready.
- Sometimes things in the world change - either mirror files yourself, or expect the occasional mirror to disappear, failing a chef run. Wayne Seguin will also occasionally change the install script location for RVM. Chef recipes are living things which must be tended to occasionally.
Ask for Help
- Is sunspot vulnerable to a ruby injection attack?
Sunspot requests ruby as an output format from solr, and evals the response. One project is seeing invalid unicode being passed to solr and coming back in the response, causing an eval error on the invalid characters. The consensus was that it probably wasn't exploitable, but is unfortunate.
- Is there a way to put Jasmine in the test group without it causing errors on production?
This has gotten better in rails 3 but the fix has caused problems in rails 2 apps. For now you can install in every group or catch the exception when it tries to load it in production.
Interesting Things
- You can pass an array as the value of the :joins parameter in ActiveRecord finders. This lets you write clearer code instead of having one long string with multiple joins.
Help
- Is anyone getting corrupt or empty gems from rubygems?
Yes. It's not clear what causes it, but it's been seen. The best workaround is to have bundler cache the gems in the project.
Interesting Things
- You can define a method in ruby called return. You can send :return to an object with return defined as a function, and it will call the method. You can't call the method normally. None of this is a good idea.
- RVM can use most any version of rubygems you might need. Just run 'rvm rubygems 1.5.2' to get 1.5.2, for instance. Some gems are incompatible with new versions of rubygems, so this can come in handy.
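For the curious, the method-named-return item above can be reproduced in a few lines. The class name Oddity is made up for the demonstration, and the closing advice still applies: none of this is a good idea.

```ruby
class Oddity
  # "return" is a keyword, but def accepts it as a method name.
  def return
    :called
  end
end

# A bare `return` would be parsed as the keyword, so dispatch by name.
puts Oddity.new.send(:return)
```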
