Zach Brock's blog
Ask for Help
Anyone have advice for loading seed data in both regular and test environments? One of our projects has some data that is necessary to bootstrap the app into a working state and they'd like it to be in the database for all their tests as well.
EngineYard Cloud installs a weird version of JRuby. Some of the standard command line options don't seem to work. Anyone have a pointer to a good chef recipe for getting JRuby up and running on EYCloud?
Interesting Things
- Ever get a UTF-8 file with messed up encoding? If all the bytes are still in the right order, try using iconv to fix it. Telling iconv to convert from UTF-8 to UTF-8 fixed a file that had been emailed to one of our projects.
iconv -f UTF-8 -t UTF-8 es.yml > es2.yml
Interesting Things
One of our teams saw a significant (20%+) speedup on their product by switching to Amazon's Cloudfront service. They're using Paperclip and it only took about 20 minutes to switch from S3 to Cloudfront.
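If you have an integer column in MySQL that does not allow NULL values and you update a row to set that column to NULL, the column ends up being set to 0 instead of raising an error. (This is what MySQL does when strict SQL mode is off; with strict mode enabled the update fails instead.) This was very surprising to one of our teams.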
The project that I'm working on is using Screw.Unit for Javascript testing. We recently ran into a case where we found ourselves copying and pasting some code. We wanted to DRY up our specs and found a neat way to do it that I figured I'd share with everyone. Here's a really simple example to demonstrate how we did it.
Given a cat model that keeps track of the number of lives the cat has left:
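function Cat(options) {
  // Start with 9 lives unless a count is passed in, e.g. new Cat({lives: 3}).
  var lives = (options && options.lives !== undefined) ? options.lives : 9;
  this.die = function() {
    lives = lives - 1;
  };
  this.lives = function() {
    return lives;
  };
  this.isDead = function() {
    return lives <= 0;
  };
}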
Let's make up a spec that checks that isDead works for some values of lives:
Screw.Unit(function() {
  describe('Cat', function() {
    var cat;
    describe('isDead', function() {
      // Shared behavior: generates one nested spec for the given number of lives.
      var shouldNotBeDeadBehavior = function(num) {
        describe("when the cat has " + num + " lives left", function() {
          it("it should not be dead", function() {
            cat = new Cat({lives: num});
            expect(cat.isDead()).to(be_false);
          });
        });
      };
      for (var i = 3; i > 0; i--) {
        shouldNotBeDeadBehavior.call(Screw.Specification, i);
      }
      describe("when the cat has 0 lives left", function() {
        it("it should be dead", function() {
          cat = new Cat({lives: 0});
          expect(cat.isDead()).to(be_true);
        });
      });
    });
  });
});
The neat part in here is the line:
shouldNotBeDeadBehavior.call(Screw.Specification, i);
If you're not familiar with it, the call function in JavaScript lets you define what "this" is for that function call. By calling our shared behavior with Screw.Specification, we're saying that we want to execute this function within the context of the Screw.Unit specifications. This lets us execute our specs as though they had been written out by hand in each place. In the test results, each generated spec appears under its own "when the cat has N lives left" description.

This is one way to DRY up some of your Screw.Unit specs. If you find yourself copying and pasting code, consider refactoring the spec out into a shared behavior instead.
Have other good ways to clean up Screw.Unit specs? Share them in the comments!
I was looking for a mocking framework to use with Screw.Unit when I found out that Rajan had ported the spying framework from Jasmine over. His project is called iSpy and we just started using it on my current project. It's worked really well for us and I'd definitely recommend it.
If you're accepting user input for Solr (which I expect most projects using it are), you've probably noticed that you need to sanitize the queries you pass to Solr. After reading a bunch of conflicting documentation and blog posts, I put together a simple little module to handle it for you. It should strip out everything that would cause Solr to throw an error on a query string. Let me know if it works for you or if I missed any corner cases!
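module SolrStringSanitizer
  # Characters and operators that have special meaning in Solr query syntax.
  # Regexp metacharacters are escaped so they match literally.
  ILLEGAL_SOLR_CHARACTERS_REGEXP = /\+|-|!|\(|\)|\{|\}|\[|\]|\^|\\|"|~|\*|\?|:|;|&&|\|\|/

  # Strips the special characters from a query string; returns nil for nil input.
  def self.sanitize(string)
    if string
      string.gsub(ILLEGAL_SOLR_CHARACTERS_REGEXP, "")
    end
  end
end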
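To get a feel for what the sanitizer does, here is a small sketch that runs it over a few made-up inputs. The module is repeated with its regexp metacharacters escaped so the snippet runs on its own, and the sample queries are hypothetical rather than taken from a real application.

```ruby
# Copy of the sanitizer so this snippet is self-contained.
module SolrStringSanitizer
  ILLEGAL_SOLR_CHARACTERS_REGEXP = /\+|-|!|\(|\)|\{|\}|\[|\]|\^|\\|"|~|\*|\?|:|;|&&|\|\|/

  def self.sanitize(string)
    string.gsub(ILLEGAL_SOLR_CHARACTERS_REGEXP, "") if string
  end
end

# Hypothetical user input; nil stands in for a missing search parameter.
queries = ['ruby (on) rails', 'title:"cats" && dogs', 'c++ -java', nil]

queries.each do |query|
  # inspect makes leftover whitespace and nil visible in the output.
  puts SolrStringSanitizer.sanitize(query).inspect
end
```

Note that stripping is lossy: a search for c++ turns into a search for c. If that matters for your data, escaping the characters instead of removing them may be a better fit.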
As has been noted a few times, the new activation model in Firebug 1.4 is kind of psychotic. The main problem seems to be that the Firebug developers have disconnected the formerly linked concepts of "open" and "activated" for a few reasons that make sense to the developers, but make no sense to the users. Unless you've really internalized how Firebug works under the hood (and there are probably a few dozen people like that in the world), Firebug now seems to randomly choose whether it's going to be open or closed when you visit a page. Not fun.
The workaround that we've found is to force Firebug to be active on all pages. You can do this by right-clicking the Firebug icon in the bottom right and checking "On for all web pages". This makes Firebug behave mostly like the old way, where the panel would stay open as you browsed around until you closed it. The big caveat is that it only stays closed if you use the Firebug icon in the bottom right of your window, not the close panel button.

This will slow your browser down a lot on Ajax-heavy sites like Pivotal Tracker or Gmail. You can get around that to a certain extent by disabling the Net tab on those domains, which will keep it from displaying every XHR request.

Hopefully this helps, and hopefully the Firebug 1.4 final release has a saner activation model...
The project I'm working on has a lot of named scopes which are really great. If you're not using them already you should really try them out. Since we test drive everything we do, we needed a really easy way to write tests for all these named scopes. We came up with a little test helper method that I thought I'd share so that other people could use it.
Here's the code:
def test_named_scope(all_objects, subset, condition)
  subset.should_not be_empty
  subset.each do |obj|
    condition.call(obj).should be_true
  end
  other_objects = all_objects - subset
  other_objects.should_not be_empty
  other_objects.each do |obj|
    condition.call(obj).should be_false
  end
end
To use it, just pass a superset of objects, the subset you want to test and then a lambda as a condition. The lambda should be true for all items in the subset and false for all the items outside of it.
It sounds complicated but it's really easy! Here's an example. Let's look at a simple tag class that has a status column indicating whether the tag is on a whitelist or a blacklist. It could look like this:
class Tag < ActiveRecord::Base
  WHITELISTED = 1
  BLACKLISTED = 0
end
We want to be able to easily grab all the whitelisted tags, so we need to add a named scope.
Here's the spec we write first:
describe Tag do
  describe "whitelisted named_scope" do
    it "returns the whitelisted tags" do
      test_named_scope(Tag.all, Tag.whitelisted, lambda { |tag|
        tag.status == Tag::WHITELISTED })
    end
  end
end
We run the spec, watch it fail and then go add the named scope to our Tag class.
class Tag < ActiveRecord::Base
  WHITELISTED = 1
  BLACKLISTED = 0
  named_scope :whitelisted, :conditions => {:status => WHITELISTED}
end
Then we just rerun the spec and watch it pass. Easy!
Update2: Josh Susser emailed me a really nice refactoring with the enumerable partition method and Kelly fixed a bug I introduced.
def test_named_scope(all_objects, subset, condition)
  scoped_objects, other_objects = all_objects.partition(&condition)
  scoped_objects.should_not be_empty
  other_objects.should_not be_empty
  scoped_objects.should == subset
  other_objects.should == all_objects - subset
end
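If you want to try the partition idea without Rails or RSpec, here is a rough plain-Ruby sketch. The Tag struct, the check_named_scope name and the sample tags are all stand-ins made up for illustration; a real spec would use the helper above with ActiveRecord objects.

```ruby
# Plain-Ruby stand-in for the Tag model (hypothetical, no ActiveRecord needed).
Tag = Struct.new(:name, :status)
WHITELISTED = 1
BLACKLISTED = 0

# Same shape as test_named_scope, but raising instead of using RSpec matchers.
def check_named_scope(all_objects, subset, condition)
  # partition splits the collection into [matching, non-matching] in one pass.
  scoped_objects, other_objects = all_objects.partition(&condition)
  raise "scope matched nothing" if scoped_objects.empty?
  raise "scope matched everything" if other_objects.empty?
  raise "scope returned the wrong objects" unless scoped_objects == subset
  true
end

all_tags = [
  Tag.new("ruby", WHITELISTED),
  Tag.new("spam", BLACKLISTED),
  Tag.new("rails", WHITELISTED)
]

# Stands in for what Tag.whitelisted would return from the database.
whitelisted = all_tags.select { |tag| tag.status == WHITELISTED }

puts check_named_scope(all_tags, whitelisted, lambda { |tag| tag.status == WHITELISTED })
```

The two emptiness checks are what keep the helper honest: a scope that matches every row, or none, would otherwise pass trivially.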
Lightning Talks
Bosco is introducing the speakers. Come to the ruby meetup!
Jeff Smick - Blather
- Simpler XMPP
- Make XMPP4R easier
- Requires libxml-ruby and EventMachine
- Simple DSL
- Handlers for ready, error, message, presence, iq
- Guards route stanzas
- Guards can be symbols, hashes with strings, hashes with regexes, lambdas or arrays
- PubSub is in the works and coming next
Tim Connor - Rack Middleware build, init call cycle
- based on the sinatra flash plugin
- Wanted to remove sinatra from it
- Found that every time you say "use" you are creating a lambda which will create an app reference
- You can check out his Rack::Flash
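The point about "use" creating a lambda can be sketched without the rack gem. TinyBuilder, Shout and the endpoint below are hypothetical names made up for this illustration; they imitate the build/init/call cycle rather than reproduce Rack's own code.

```ruby
# Minimal imitation of a Rack-style builder.
class TinyBuilder
  def initialize
    @use = []
  end

  # Each `use` only stores a lambda; the middleware is not instantiated yet.
  def use(middleware, *args)
    @use << lambda { |app| middleware.new(app, *args) }
  end

  def run(app)
    @run = app
  end

  # Build phase: wrap the endpoint from the inside out,
  # so the first `use` ends up as the outermost layer.
  def to_app
    @use.reverse.inject(@run) { |app, wrapper| wrapper.call(app) }
  end
end

# Example middleware: upcases every chunk of the response body.
class Shout
  def initialize(app)
    @app = app # init phase: receives the next app in the chain
  end

  def call(env)
    status, headers, body = @app.call(env) # call phase: runs once per request
    [status, headers, body.map(&:upcase)]
  end
end

builder = TinyBuilder.new
builder.use Shout
builder.run lambda { |env| [200, { "Content-Type" => "text/plain" }, ["hello"]] }

app = builder.to_app
status, _headers, body = app.call({})
puts status
puts body.first
```

Because the lambdas close over the middleware class and its arguments, the same builder can produce a fresh chain each time to_app is called.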
Wolfram Arnold - What's Cool about cache money?
- Nick Kallen wrote the original Cache Money
- Backed by Memcached
- Abstracts away the caching between the code and the database so you don't have to worry about it.
- Can just do User.find instead of User.get_cache
- named scope, has_many, etc will all work transparently
- Can almost use it as a drop in to add caching
- Cache Money doesn't support joins but check out acts_as_most_popular
Yehuda Katz - Moneta
- Moneta allows you to create objects that behave like hashes backed by any format you want
- Behaves just like a ruby hash
- Has adapters for BerkeleyDB, Datamapper, Memcached, S3, xattr, rufus and more
- Easy to write new adapters
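As a rough illustration of the "hash backed by anything" idea, here is a tiny store that keeps its data in a JSON file. JsonFileStore and its methods are hypothetical and are not Moneta's adapter API; the sketch only shows how little is needed to make a backend feel like a hash.

```ruby
require "json"
require "tmpdir"

# Hypothetical hash-like store backed by a JSON file on disk.
class JsonFileStore
  def initialize(path)
    @path = path
  end

  def [](key)
    load_data[key.to_s]
  end

  def []=(key, value)
    data = load_data
    data[key.to_s] = value
    File.write(@path, JSON.generate(data))
  end

  def key?(key)
    load_data.key?(key.to_s)
  end

  def delete(key)
    data = load_data
    removed = data.delete(key.to_s)
    File.write(@path, JSON.generate(data))
    removed
  end

  private

  # Re-reads the file on every access; fine for a sketch, slow for real use.
  def load_data
    File.exist?(@path) ? JSON.parse(File.read(@path)) : {}
  end
end

Dir.mktmpdir do |dir|
  store = JsonFileStore.new(File.join(dir, "store.json"))
  store["language"] = "ruby"
  puts store["language"]
  puts store.key?("missing")
  store.delete("language")
  puts store["language"].inspect
end
```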
Andy Delcambre - Datamapper Adapters
- Making it easier to write datamapper adapters
- Wrote an adapter for github repos today
- Demonstrating pulling down Github repos and searching with datamapper syntax
Brief interlude trying to figure out why the projector was not working
Erik Michaels-Ober - Merb Admin App
- like ActiveScaffold for Rails, this is for Merb
- modeled after the Django admin site (python manage.py runserver); uses their CSS and JS
- introspects your model to display form elements intelligently
- adds a generator for adding new admin
- not up on GitHub yet; try sferik on Twitter
Mislav Marohnić - RSPACTOR for continuous tests on OSX (& more!)
- "make it green then make it clean"
- autotest
- Problems
- one big file
- awkward growl integration
- pitfalls when using rspec-rails plugin
- polling - uses 25% of cpu when idling :(
- RSpactor
- Original RSpactor written by Andreas Wolff
- OS X only because it uses filesystem events
- made for Rspec + Growl integration
- Console tool
- but abandoned :(
- Mislav-RSpactor
- cleaner, more modular, default mappings for usual directory structure
- better mappings if it's a Rails project
- tested!
- so modular you can reuse the Listener if you want to listen for mac file system events
- uses a lot less CPU
Possibilities
- running related tests while you TDD
- compile Haml/Sass for static sites
- trigger javascript sprockets build
- render RDoc output while you edit comments
- it can be run for all projects in your filesystem; you just opt each project in
Bryan Helmkamp - Rack::Bug
- Rack middleware
- inspired by Django debug toolbar
- Modular, can be used for any rack app
- Panels
- rails env
- rails response time (cpu time)
- request vars (session cookies, rack env)
- keeps track of SQL queries - shows backtrace on queries, explain for queries
- count of ActiveRecord instantiations on the page using Oink
- can look in Memcache cache
- template traces (times for rendering)
- aggregates all Rails log entries
- KB delta for process size of Ruby during a single request
- runs on production environments, password protected
- instruments using alias_method_chain hacks
- Working with Yehuda Katz on Orchestra to someday soon simplify it
Pat Nakajima - No more Keynote with Slidedown
- Speaks at NYCrb meetup, and didn't want to use Keynote
- Wanted to use Markdown but also wanted syntax highlighting
- Generates an HTML page that you can use to run your presentation
- The Maker's Mark library was extracted to do easy syntax highlighting in Markdown
Chris Lee - Floxee - OS Twitter Dir
- open source Twitter directory application
- tweet congress
- directory of tweets from members of congress
- Floxee on Github
Max - PaMP: Privacy-aware Marketplace
- From IBM Almaden Research Labs
- privacy-aware market place
- Goals
- to develop a platform that allows users to manage their privacy settings across social networks, reducing the cognitive burden on a user and leveraging the wisdom of his crowd
- Maps to opensocial, etc
Andrew Cantino - SelectorGadget
- No time :(
- "SelectorGadget is an open source bookmarklet that makes CSS selector generation and discovery on complicated sites a breeze."
Kyle Maxwell - Parsley
- No time :(
- "Parsley is a simple to use and elegant language for creating HTML and XML parsers"
- "Parsley can be used from Ruby, Python, C/C++, and the *nix command-line."
CloudKit: Hacking the Open Stack with Ruby and Rack - Jon Crosby
Intro
Thanks for the votes, his talk is here because of GoGaRuCo attendee votes.
He works for Engineyard, and they are hiring.
This talk will be "lightning-talk" style, so that means it will be very fast (and also means this live-blog will be pretty sparse)
Cloudkit
CloudKit is an Open Web JSON Appliance
Can quickly and easily spin up an API for RESTful collections of JSON documents
Similar to CouchDB and Persevere
Implemented in Ruby (unlike CouchDB)...
Now Frameworks are basically another MVC framework
So why wouldn't you want to do a new MVC architecture?
gem install cloudkit
Radar
"If your RESTFUL API cannot be accessed with curl, you lose"
Resource Composition in the Browser
If you have two widgets in the browser doing different tasks, you can point them at different resources.
Example: 280Slides
Example: SproutCore
Mobile apps can benefit from this style of restful architecture as well.
ESI caching layers - like Old Skool SSI, except that they are cache includes.
Cloudkit is built on Rack. Rack is awesome.
HTTP Intermediaries - such as Rack Middleware.
Rack Is The Web
The spec for Rack middleware is runnable and readable
Build an App! Create config.ru:
require 'cloudkit'
expose :todos, :profiles
CloudKit bootstraps so you can query it
You can ask it for its OPTIONS and it'll tell you what you can do with it
Hypermedia as the Engine of Application State
Cloudkit is read-optimized
No SQL, no ORM, uses Tokyo Cabinet Tables instead
Schema Free, HTTP and JSON are the schema
Can do a PUT to place a new record at a specific location
Can do POST to update. By supplying the version etag the server can solve the "lost update" problem
Auto-versioning: any time you update a resource, the previous version is archived. That's reflected in the URL - :collection/:version. This solves the lost-update problem when 2 users update the same document at once. If you try to update a resource without providing the version, it will return 400 Bad Request. If two clients try to update the same version, the second gets a 412 Precondition Failed response.
Cloudkit also solves the batch GET problem, where you can access the resource with id "_resolved" to get multiple documents at once (and their complete contents).
Finally, with DELETE, you can't delete things that are out of date, similar to update. A 410 Gone response is returned in this case.
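The update rules described above (400 without a version, 412 for a stale version) can be sketched with a small in-memory store. VersionedStore and its method names are hypothetical; the status codes follow these notes and this is not CloudKit's code.

```ruby
require "digest"

# Hypothetical in-memory store illustrating ETag-based optimistic concurrency.
class VersionedStore
  def initialize
    @docs = {}
  end

  # Creates a document and returns [status, etag].
  def create(id, body)
    etag = Digest::MD5.hexdigest("#{id}:0:#{body}")
    @docs[id] = { body: body, etag: etag, history: [] }
    [201, etag]
  end

  # Updates only if the caller proves it saw the current version.
  def update(id, body, if_match = nil)
    doc = @docs.fetch(id)
    return [400, doc[:etag]] if if_match.nil?           # no version supplied
    return [412, doc[:etag]] if if_match != doc[:etag]  # stale version
    doc[:history] << doc[:body]                         # archive the old version
    doc[:body] = body
    doc[:etag] = Digest::MD5.hexdigest("#{id}:#{doc[:history].size}:#{body}")
    [200, doc[:etag]]
  end
end

store = VersionedStore.new
_, etag = store.create("todo-1", "buy milk")

first  = store.update("todo-1", "buy oat milk", etag) # client A holds the current etag
second = store.update("todo-1", "buy soy milk", etag) # client B's etag is now stale
blind  = store.update("todo-1", "buy anything")       # no version at all

puts first.first
puts second.first
puts blind.first
```

The second client gets its rejection instead of silently overwriting the first client's change, which is the whole point of sending the version along with the write.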
"Rewrite in Scala... or solve the problem"
What's missing? The ability to ask questions
Pagination
Querying - solved with JSONQuery. (/todos[0:10][?priority=3])
jQuery plugin for Cloudkit
All code is up at Jon's Github
Because it's OpenWeb, you can easily add OAuth, OpenID, etc. A desktop application might use OAuth, whereas a web application could use OpenID for authentication.
Q: Isn't querying slow? A: Yeah, it can be slow. There's indexing work that needs to be done on write to optimize read. Tokyo Cabinet might come to the rescue here about searching data with regular expressions.
Q: Are there real world apps using cloudkit? A: Not that I know of. One company might be trying it.
Q: What kind of apps are good for cloudkit? A: I'm personally using it for Actiontastic, a synchronizing web service that provides a REST interface.
Q: Are there plans to abstract away the key/value storage system so other systems can be used? A: Yehuda has a library called Moneta that's an abstraction for Key/value stores that I'd like to move to.
Q: How does CouchDB map/reduce compare to CloudKit's JSONQuery? A: It first started as a Sinatra app that sat between CouchDB, but I found JSONQuery to be better suited.
Hypertable and Rails: DB Scaling Solutions with HyperRecord
Intro
Links: Hypertable HyperRecord
Rusty is from Zvents, a local search engine
Presentation
Showing example of hourly data for the last month for a single event
Old benchmark was over 1M rows inserted per second sustained
Hypertable is an open-source implementation of Google's BigTable.
Hypertable is a Column-Oriented DBMS
Data Model
- 5-part key: Row Key, Column Family, Column Qualifier, Timestamp, Revision
- One index per table (on the row key)
- Only stores strings
Architecture
- Master server - tracks range servers and where data is stored (a spare master is also usually run, as it's a single point of failure)
- Range servers - data is broken up across individual range servers
- Hyperspace - handles locking and master recovery
- HDFS - stores redundant copies of data
ThriftBroker - An RPC wrapper for Hypertable for many languages using the Thrift Wrapper
HyperRecord
HyperRecord is a subclass of ActiveRecord for Hypertable
Supported by the Hypertable
Example
- Loading data into a simple pages app
- Loading the first 10,000 articles of Wikipedia
- 150MB of data infiled in 14 seconds
- Loads all the data into a Rails scaffold and browses it
Design considerations
- Denormalization - can't do joins, so you have to put your data in an appropriate format for querying.
- Can use MapReduce to interact with data.
- Column families/qualifiers - you can store data in the key part of the key-value pair
- Revisions - deletes are represented as inserted delete cells
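Two of these design points, data stored in the column qualifier and deletes stored as inserted delete cells, can be sketched in plain Ruby. Cell, TinyTable and the sample keys are hypothetical names made up for illustration; the field names follow these notes and this is not Hypertable's or HyperRecord's API.

```ruby
# Hypothetical model of versioned cells keyed by row, family, qualifier and timestamp.
Cell = Struct.new(:row, :family, :qualifier, :timestamp, :value, :deleted)

class TinyTable
  def initialize
    @cells = []
  end

  def insert(row, family, qualifier, timestamp, value)
    @cells << Cell.new(row, family, qualifier, timestamp, value, false)
  end

  # A delete is just another insert: a marker cell with a newer timestamp.
  def delete(row, family, qualifier, timestamp)
    @cells << Cell.new(row, family, qualifier, timestamp, nil, true)
  end

  # Reads pick the newest cell for the key and hide it if it is a delete marker.
  def get(row, family, qualifier)
    newest = @cells
      .select { |c| c.row == row && c.family == family && c.qualifier == qualifier }
      .max_by(&:timestamp)
    newest.nil? || newest.deleted ? nil : newest.value
  end
end

table = TinyTable.new
# The qualifier ("hour-10") carries data itself: which hour the count belongs to.
table.insert("event:42", "views", "hour-10", 1, "17")
table.insert("event:42", "views", "hour-10", 2, "23")
puts table.get("event:42", "views", "hour-10")

table.delete("event:42", "views", "hour-10", 3)
puts table.get("event:42", "views", "hour-10").inspect
```

Values are strings here because the notes say the table only stores strings; converting to and from numbers or timestamps is left to the caller.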
Questions
Q: How do you break down data by hours in example
A: Broken down by Ruby and aggregated
Q: It looks like the keys in that list were strings, not timestamps, did you have to take the timestamp and convert it to a string yourself?
A: Pretty much
Q: Did the wikipedia articles contain any of the sub-data like images, links, etc?
A: No, just a sql dump as a demo of querying the database through a rails scaffold
Q: Does hypertable select support SQL limits, order, etc?
A: HQL supports a lot of things you'd expect from SQL, but it's still somewhat limited.
Q: What do you do with it?
A: We store all of our log data and process it using Cascading to gather hourly data for all our pages. We then put it in Hypertable so we can query it quickly to generate reports.
Rusty: Cascading is Java code. You can easily construct complicated MapReduce jobs using it.
Josh: Some other uses of Hypertable at Zvents: Changelog. We deal with a lot of user-created content; things change often and we don't always know what changed. We log everything that ever happens to our data so that we can track it. From uploaded images to deleted links to edited descriptions, we can see what changed, when and how.
Zvents and Baidu are the primary sponsors of the Hypertable project. Hypertable and HyperRecord are both on Github.
Hypertable development started 2 years ago as a forward looking solution to analytics problems.
The search problem for Zvents is many dimensional: Time, Location, Description, User Data and User Behavior and Hypertable is a way to inform a lot of that data.
Q: What kind of problems are well suited to Hypertable? A: We're trying to move our entire site over. A canonical example for this kind of database is a crawl database. A2: Anything where you have mountains and mountains of data and want to query over it.
Example of Crawl Database stored in Hypertable.