Zach Brock's blog



Standup 04/15/2010 - iconv and EY+JRuby
Posted by Zach Brock on Thursday April 15, 2010 at 03:06PM

Ask for Help

  • Anyone have advice for loading seed data in both regular and test environments? One of our projects has some data that is necessary to bootstrap the app into a working state and they'd like it to be in the database for all their tests as well.

  • EngineYard Cloud installs a weird version of JRuby. Some of the standard command line options don't seem to work. Anyone have a pointer to a good chef recipe for getting JRuby up and running on EYCloud?

Interesting Things

  • Ever get a UTF-8 file with messed up encoding? If all the bytes are still in the right order, try using iconv to fix it. Telling iconv to convert from UTF-8 to UTF-8 fixed a file that had been emailed to one of our projects.
    iconv -f UTF-8 -t UTF-8 es.yml > es2.yml
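If you'd rather check the file from Ruby first, a minimal sketch along these lines can tell you whether the bytes are valid UTF-8 and drop any that aren't. The sample string stands in for the file's contents, and dropping (rather than replacing) the bad bytes is an assumption made for illustration; this is not a description of what iconv does internally.

```ruby
# Sample data standing in for a file's contents: valid UTF-8 plus one stray byte.
raw = "se\xC3\xB1or \xFF mundo".dup.force_encoding("UTF-8")

# The stray 0xFF byte can never appear in well-formed UTF-8.
puts raw.valid_encoding?

# String#scrub (Ruby 2.1+) replaces invalid byte sequences; "" drops them.
clean = raw.scrub("")

puts clean.valid_encoding?
puts clean
```

For a real file you would read it with File.binread, call force_encoding("UTF-8") on the result, and write the scrubbed string back out.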

Standup 04/14/2010 - MySQL and CloudFront
Posted by Zach Brock on Thursday April 15, 2010 at 03:00PM

Interesting Things

  • One of our teams saw a significant (20%+) speedup on their product by switching to Amazon's CloudFront service. They're using Paperclip and it only took about 20 minutes to switch from S3 to CloudFront.

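  • If you have an integer column in MySQL that does not allow NULL values and you update a row to set that column to NULL, the column ends up being set to 0. This was very surprising to one of our teams. (This is what MySQL does when strict SQL mode is off: the column gets the type's implicit default and a warning is raised. With strict mode on, the update fails with an error instead.)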

Shared Behaviors in Screw.Unit or how to DRY up your javascript specs
Posted by Zach Brock on Thursday August 27, 2009 at 10:07PM

The project that I'm working on is using Screw.Unit for JavaScript testing. We recently ran into a case where we found ourselves copying and pasting some code. We wanted to DRY up our specs and found a neat way to do it that I figured I'd share with everyone. Here's a really simple example to demonstrate how we did it.

Given a cat model that keeps track of the number of lives the cat has left:

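function Cat(options) {
    // Start with the requested number of lives, defaulting to 9.
    var lives = (options && options.lives !== undefined) ? options.lives : 9;
    // Lose one life.
    this.die = function() {
        lives = lives - 1;
    };
    this.lives = function() {
        return lives;
    };
    this.isDead = function() {
        return lives <= 0;
    };
}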

Let's write a spec that checks that isDead works for some values of lives:

Screw.Unit(function() {
    describe('Cat', function() {
        var cat;
        describe('isDead', function(){
            var shouldNotBeDeadBehavior = function(num){
                describe("when the cat has " + num + " lives left", function(){
                    it("it should not be dead", function(){
                        cat = new Cat({lives: num});
                        expect(cat.isDead()).to(be_false);
                    });
                });
            };

            // Declare i with var so it doesn't leak into the global scope.
            for (var i = 3; i > 0; i--) {
                shouldNotBeDeadBehavior.call(Screw.Specification, i);
            }

            describe("when the cat has 0 lives left", function(){
                it("it should be dead", function(){
                    cat = new Cat({lives: 0});
                    expect(cat.isDead()).to(be_true);
                });
            });
        });

    });
});

The neat part in here is the line:

shouldNotBeDeadBehavior.call(Screw.Specification, i);

If you're not familiar with it, the call function in JavaScript lets you define what "this" is for that function call. By calling our shared behavior with Screw.Specification, we're saying that we want to execute this function within the context of the Screw.Unit specifications. This lets us execute our shared specs as though they had been written inline at each place we call them. The test results from this spec look like this:

[Screenshot: Screw.Unit test results]

This is one way to DRY up some of your Screw.Unit specs. If you find yourself copying and pasting code, consider refactoring the spec out into a shared behavior instead.

Have other good ways to clean up Screw.Unit specs? Share them in the comments!

Mocking ScrewUnit with iSpy
Posted by Zach Brock on Wednesday August 19, 2009 at 10:00PM

I was looking for a mocking framework to use with Screw.Unit when I found out that Rajan had ported the spying framework from Jasmine over. His project is called iSpy and we just started using it on my current project. It's worked really well for us and I'd definitely recommend it.

Sanitizing Solr requests
Posted by Zach Brock on Friday July 17, 2009 at 01:29PM

If you're accepting user input for Solr (which I expect most projects using it are), you've probably noticed that you need to sanitize what queries you pass to Solr. After reading a bunch of conflicting documentation and blog posts, I put together a simple little module to handle it for you. It should strip out everything that would cause Solr to throw an error on a query string. Let me know if it works for you or if I missed any corner cases!

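module SolrStringSanitizer
  # Characters and operators that have special meaning in a Solr query string.
  ILLEGAL_SOLR_CHARACTERS_REGEXP = /\+|\-|!|\(|\)|\{|\}|\[|\]|\^|\\|"|~|\*|\?|:|;|&&|\|\|/

  # Returns the string with those characters removed, or nil for nil input.
  def self.sanitize(string)
    if string
      string.gsub(ILLEGAL_SOLR_CHARACTERS_REGEXP, "")
    end
  end
end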
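If stripping characters is too lossy for your app, another option is to backslash-escape them so that Solr treats them as literal text. Here's a rough sketch of that idea. SolrStringEscaper is a made-up name, and the character list is my assumption about what needs escaping, so check it against the query parser you're actually using.

```ruby
# Hypothetical companion module: escape special characters instead of removing them.
module SolrStringEscaper
  # Assumed set of characters that the query parser treats specially.
  SPECIAL_CHARACTERS_REGEXP = /[+\-!(){}\[\]^"~*?:;\\&|]/

  # Returns the string with each special character prefixed by a backslash,
  # or nil for nil input.
  def self.escape(string)
    return nil if string.nil?
    string.gsub(SPECIAL_CHARACTERS_REGEXP) { |char| "\\#{char}" }
  end
end

puts SolrStringEscaper.escape('title:(ruby AND rails)')
```

Words like AND and OR are left alone here, so decide separately whether you want users to be able to use them as operators.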

How to make Firebug 1.4 behave sanely
Posted by Zach Brock on Monday July 06, 2009 at 11:33PM

As has been noted a few times, the new activation model in Firebug 1.4 is kind of psychotic. The main problem seems to be that the Firebug developers have disconnected the formerly linked concepts of "open" and "activated" for a few reasons that make sense to the developers, but make no sense to the users. Unless you've really internalized how Firebug works under the hood (and there are probably a few dozen people like that in the world), Firebug now seems to randomly choose whether it's going to be open or closed when you visit a page. Not fun.

The workaround that we've found is to force Firebug to be active on all pages. You can do this by right-clicking the Firebug icon in the bottom right and checking "On for all web pages". This makes Firebug behave mostly like the old way, where the panel would stay open as you browsed around until you closed it. The big caveat is that it only stays closed if you close it with the Firebug icon in the bottom right of your window, not the close panel button.

[Screenshot: the "On for all web pages" option in the Firebug menu]

This will slow your browser down a lot on ajax heavy sites like Pivotal Tracker or Gmail. You can get around that to a certain extent by disabling the Net tab on those domains, which will keep it from displaying every XHR request.

[Screenshot: disabling the Net panel]

Hopefully this helps, and hopefully the Firebug 1.4 final release has a saner activation model...

An easy way to write named scope tests
Posted by Zach Brock on Thursday June 25, 2009 at 03:12AM

The project I'm working on has a lot of named scopes which are really great. If you're not using them already you should really try them out. Since we test drive everything we do, we needed a really easy way to write tests for all these named scopes. We came up with a little test helper method that I thought I'd share so that other people could use it.

Here's the code:

def test_named_scope(all_objects, subset, condition)
  subset.should_not be_empty
  subset.each do |obj|
    condition.call(obj).should be_true
  end

  other_objects = all_objects - subset
  other_objects.should_not be_empty
  other_objects.each do |obj|
    condition.call(obj).should be_false
  end
end

To use it, just pass a superset of objects, the subset you want to test and then a lambda as a condition. The lambda should be true for all items in the subset and false for all the items outside of it.

It sounds complicated but it's really easy! Here's an example. Let's look at a simple Tag class that has a status column indicating whether the tag is on a whitelist or a blacklist. It could look like this:

class Tag < ActiveRecord::Base
  WHITELISTED = 1
  BLACKLISTED = 0
end

We want to be able to easily grab all the whitelisted tags, so we need to add a named scope.

Here's the spec we write first:

describe Tag do
  describe "whitelisted named_scope" do
    it "returns the whitelisted tags" do
      test_named_scope(Tag.all, Tag.whitelisted, lambda { |tag|
        tag.status == Tag::WHITELISTED })
    end
  end
end

We run the spec, watch it fail and then go add the named scope to our Tag class.

class Tag < ActiveRecord::Base
  WHITELISTED = 1
  BLACKLISTED = 0
  named_scope :whitelisted, :conditions => {:status => WHITELISTED}
end

Then we just rerun the spec and watch it pass. Easy!

Update2: Josh Susser emailed me a really nice refactoring with the enumerable partition method and Kelly fixed a bug I introduced.

def test_named_scope(all_objects, subset, condition)
  scoped_objects, other_objects = all_objects.partition(&condition)
  scoped_objects.should_not be_empty
  other_objects.should_not be_empty
  scoped_objects.should == subset
  other_objects.should == all_objects - subset
end
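If partition is new to you, here's a small standalone sketch of what the refactored helper relies on. It uses a plain Struct instead of ActiveRecord and puts instead of RSpec matchers, and the tag names are made up, so treat it as an illustration rather than something to drop into your specs.

```ruby
# Stand-in for the ActiveRecord model; only the status attribute matters here.
Tag = Struct.new(:name, :status)
WHITELISTED = 1
BLACKLISTED = 0

all_tags = [
  Tag.new("ruby", WHITELISTED),
  Tag.new("spam", BLACKLISTED),
  Tag.new("rails", WHITELISTED)
]

# What a correct whitelisted scope would be expected to return.
subset = all_tags.select { |tag| tag.status == WHITELISTED }
condition = lambda { |tag| tag.status == WHITELISTED }

# partition returns two arrays: the elements for which the block is true,
# then the elements for which it is false.
scoped_objects, other_objects = all_tags.partition(&condition)

puts scoped_objects.map(&:name).inspect
puts other_objects.map(&:name).inspect
puts scoped_objects == subset
puts other_objects == all_tags - subset
```

One thing to watch for: comparing with == is order-sensitive, so a scope that returns the right records in a different order would fail the helper.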

GoGaRuCo '09 - Lightning Talks
Posted by Zach Brock on Sunday April 19, 2009 at 12:29AM

Lightning Talks

Bosco is introducing the speakers. Come to the ruby meetup!

Jeff Smick - Blather

  • Simpler XMPP
  • Make XMPP4R easier
  • Requires libxml-ruby and EventMachine
  • Simple DSL with handlers for ready, error, message, presence, iq
  • Guards route stanzas
    • Guards can be symbols, hashes with strings, hashes with regexes, lambdas or arrays
  • PubSub is in the works and coming next

Tim Connor - Rack Middleware build, init call cycle

  • based on the sinatra flash plugin
  • Wanted to remove sinatra from it
  • Found that every time you say "use" you are creating a lambda which will create an app reference
  • You can check out his Rack::Flash
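
To make the point about "use" concrete, here's a tiny sketch of the idea in plain Ruby. TinyBuilder and Tagger are made-up names and this is not Rack's actual code; it only illustrates how each "use" can store a lambda that builds the middleware later, once the app it wraps is known.

```ruby
# Hypothetical, stripped-down stand-in for a Rack-style builder.
class TinyBuilder
  def initialize
    @factories = []
  end

  # "use" does not instantiate the middleware. It stores a lambda that will
  # do so later, when it is handed the downstream app.
  def use(middleware_class, *args)
    @factories << lambda { |app| middleware_class.new(app, *args) }
    self
  end

  # Wrap the endpoint from the inside out, so the first "use" ends up outermost.
  def to_app(endpoint)
    @factories.reverse.inject(endpoint) { |app, factory| factory.call(app) }
  end
end

# Hypothetical middleware that appends a label to the response body.
class Tagger
  def initialize(app, label)
    @app = app
    @label = label
  end

  def call(env)
    status, headers, body = @app.call(env)
    [status, headers, body + [@label]]
  end
end

endpoint = lambda { |env| [200, { "content-type" => "text/plain" }, ["hello"]] }
app = TinyBuilder.new.use(Tagger, "outer").use(Tagger, "inner").to_app(endpoint)

status, _headers, body = app.call({})
puts status
puts body.inspect
```

The inner middleware sees the response first on the way back out, which is why its label lands before the outer one.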

Wolfram Arnold - What's Cool about cache money?

  • Nick Kallen wrote the original Cache Money
  • Backed by Memcached
  • Abstracts away the caching between the code and the database so you don't have to worry about it.
  • Can just do User.find instead of User.get_cache
  • named scope, has_many, etc will all work transparently
  • Can almost use it as a drop in to add caching
  • Cache Money doesn't support joins but check out acts_as_most_popular

Yehuda Katz - Moneta

  • Moneta allows you to create objects that behave like hashes backed by any format you want
  • Behaves just like a ruby hash
  • Has adapters for BerkeleyDB, Datamapper, Memcached, S3, xattr, rufus and more
  • Easy to write new adapters
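
To give a feel for what "behaves like a hash, backed by something else" can mean, here's a small sketch of a file-backed store. FileStore is a made-up class and this is not Moneta's API or one of its adapters; the method names simply mirror Ruby's Hash.

```ruby
require "tmpdir"

# Hypothetical file-backed store exposing a small Hash-like interface.
class FileStore
  def initialize(dir)
    @dir = dir
  end

  def [](key)
    key?(key) ? Marshal.load(File.binread(path(key))) : nil
  end

  def []=(key, value)
    File.binwrite(path(key), Marshal.dump(value))
  end

  def key?(key)
    File.exist?(path(key))
  end

  def delete(key)
    value = self[key]
    File.delete(path(key)) if key?(key)
    value
  end

  private

  # Assumes keys are short strings; hex-encode them so they are safe as filenames.
  def path(key)
    File.join(@dir, key.to_s.unpack1("H*"))
  end
end

Dir.mktmpdir do |dir|
  store = FileStore.new(dir)
  store["greeting"] = { "text" => "hello" }
  puts store["greeting"].inspect
  puts store.key?("greeting")
  store.delete("greeting")
  puts store["greeting"].inspect
end
```

Swapping the body of those four methods for calls to another backend is roughly what writing a new adapter would involve in a design like this.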

Andy Delcambre - Datamapper Adapters

  • Making it easier to write datamapper adapters
  • Wrote an adapter for github repos today
  • Demonstrating pulling down Github repos and searching with datamapper syntax

Brief interlude trying to figure out why the projector was not working

[Photo: four developers trying to debug a bad display adapter]

Erik Michaels-Ober - Merb Admin App

  • like active scaffold for rails, this is for merb
  • modeled after the Django site admin (python manage.py runserver); uses their CSS and JS
  • introspects your model to display form elements intelligently
  • adds a generator for adding new admin
  • not up on GitHub yet; try sferik on Twitter

Mislav Marohnić - RSPACTOR for continuous tests on OSX (& more!)

  • "make it green then make it clean"
  • autotest
    • Problems
      • one big file
      • awkward growl integration
      • pitfalls when using rspec-rails plugin
      • polling - uses 25% of cpu when idling :(
  • Original RSpactor written by Andreas Wolff
    • OS X only because it uses filesystem events
    • made for Rspec + Growl integration
    • Console tool
    • but abandoned :(
  • Mislav-RSpactor
    • cleaner, more modular, default mappings for usual directory structure
    • better mappings if it's a Rails project
    • tested!
    • so modular you can reuse the Listener if you want to listen for mac file system events
    • uses a lot less CPU
    • Possibilities
      • running related tests while you TDD
      • compile Haml/Sass for static sites
      • trigger javascript sprockets build
      • render RDoc output while you edit comments
    • it can be run for all projects in your filesystem; you just opt each project in

Bryan Helmkamp - Rack::Bug

  • Rack middleware
  • inspired by Django debug toolbar
  • Modular, can be used for any rack app
  • Panels
    • rails env
    • rails response time (cpu time)
    • request vars (session cookies, rack env)
    • keeps track of SQL queries - shows backtrace on queries, explain for queries
    • count of ActiveRecord instantiations on the page using Oink
    • can look in Memcache cache
    • template traces (times for rendering)
    • aggregates all Rails log entries
    • KB delta for process size of Ruby during a single request
  • runs on production environments, password protected
  • instruments using alias_method_chain hacks
  • Working with Yehuda Katz on Orchestra to someday soon simplify it

Pat Nakajima - No more Keynote with Slidedown

  • Speaks at NYCrb meetup, and didn't want to use Keynote
  • Wanted to use Markdown but also wanted syntax highlighting
  • Generates an HTML page that you can use to run your presentation
  • The Maker's Mark library was extracted to do easy syntax highlighting in Markdown

Chris Lee - Floxee - OS Twitter Dir

  • open source Twitter directory application
  • tweet congress
    • directory of tweets from members of congress
  • Floxee on Github

Max - PaMP: Privacy-aware Marketplace

  • From IBM Almaden Research Labs
  • privacy-aware market place
  • Goals
    • to develop a platform that allows users to manage their privacy settings across social networks, reducing the cognitive burden on a user and leveraging the wisdom of his crowd
    • Maps to opensocial, etc

Andrew Cantino - SelectorGadget

  • No time :(
  • "SelectorGadget is an open source bookmarklet that makes CSS selector generation and discovery on complicated sites a breeze."

Kyle Maxwell - Parsley

  • No time :(
  • "Parsley is a simple to use and elegant language for creating HTML and XML parsers"
  • "Parsley can be used from Ruby, Python, C/C++, and the *nix command-line."

GoGaRuCo '09 - CloudKit: Hacking the Open Stack with Ruby and Rack - Jon Crosby
Posted by Zach Brock on Saturday April 18, 2009 at 07:03PM

CloudKit: Hacking the Open Stack with Ruby and Rack - Jon Crosby

Intro

Thanks for the votes, his talk is here because of GoGaRuCo attendee votes.

He works for Engine Yard, and they are hiring.

This talk will be "lightning-talk" style, so that means it will be very fast (and also means this live-blog will be pretty sparse)

Cloudkit

CloudKit is an Open Web JSON Appliance.

Can quickly and easily spin up an API for RESTful collections of JSON documents.

Similar to CouchDB and Persevere.

Implemented in Ruby (unlike CouchDB)...

Now Frameworks are basically another MVC framework

So why wouldn't you want to do a new MVC architecture?

gem install cloudkit

Radar

"If your RESTFUL API cannot be accessed with curl, you lose"

Resource Composition in the Browser

If you have two widgets in the browser doing different tasks, you can point them at different resources.

Example: 280Slides
Example: SproutCore

Mobile apps can benefit from this style of restful architecture as well.

ESI caching layers - like Old Skool SSI, except that they are cache includes.

Cloudkit is built on Rack. Rack is awesome.

HTTP Intermediaries - such as Rack Middleware.

Rack Is The Web

The spec for Rack middleware is runnable and readable

Build an App! Create config.ru:

require 'cloudkit'
expose :todos, :profiles

Cloudkit bootstraps so you can query it. You can ask it for its OPTIONS and it'll tell you what you can do with it.

Hypermedia as the Engine of Application State

Cloudkit is read-optimized

No SQL, no ORM, uses Tokyo Cabinet Tables instead

Schema Free, HTTP and JSON are the schema

Can do a PUT to place a new record at a specific location

Can do POST to update. By supplying the version etag the server can solve the "lost update" problem

Auto-versioning: any time you update a resource, the previous version is archived. That's reflected in the URL - :collection/:version. This solves the lost-update problem when 2 users update the same document at once. If you try to update a resource without providing the version, it will return 400 Bad Request. If two clients try to update the same version, the second gets a 412 Precondition Failed response.
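
Here's a rough in-memory sketch of that version check, to make the 400/412 behavior easier to picture. VersionedStore and the way it computes etags are made up for illustration; this is not CloudKit's code or API.

```ruby
require "digest/sha1"

# Hypothetical in-memory store illustrating version checks on update.
class VersionedStore
  Response = Struct.new(:status, :etag)

  def initialize
    @current = {}                                    # id => { etag:, body: }
    @archive = Hash.new { |hash, id| hash[id] = [] } # id => previous versions
  end

  def create(id, body)
    @current[id] = { etag: etag_for(body, 1), body: body }
    Response.new(201, @current[id][:etag])
  end

  # The caller must say which version it is updating.
  def update(id, body, etag = nil)
    return Response.new(400, nil) if etag.nil?
    record = @current[id]
    return Response.new(404, nil) if record.nil?
    return Response.new(412, record[:etag]) unless record[:etag] == etag

    @archive[id] << record # keep the previous version around
    @current[id] = { etag: etag_for(body, @archive[id].size + 1), body: body }
    Response.new(200, @current[id][:etag])
  end

  private

  # Assumed scheme: hash the revision number together with the body.
  def etag_for(body, revision)
    Digest::SHA1.hexdigest("#{revision}:#{body}")
  end
end

store = VersionedStore.new
etag = store.create("todo-1", "buy milk").etag

first  = store.update("todo-1", "buy oat milk", etag) # has the current version
second = store.update("todo-1", "buy soy milk", etag) # version is now stale
blind  = store.update("todo-1", "buy anything")       # no version supplied

puts first.status
puts second.status
puts blind.status
```

The second client would need to fetch the new version, merge its change, and retry with the fresh etag.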

Cloudkit also solves the batch GET problem, where you can access the resource with id "_resolved" to get multiple documents at once (and their complete contents).

Finally, with DELETE, you can't delete things that are out of date, similar to update. The 410 Gone response will get returned in this case.

"Rewrite in Scala... or solve the problem"

What's missing? The ability to ask questions:

  • Pagination
  • Querying - solved with JSONQuery. (/todos[0:10][?priority=3])

jQuery plugin for Cloudkit

All code is up at Jon's Github

Because it's OpenWeb, you can easily add OAuth, OpenID, etc. A desktop application might use OAuth, whereas a web application could use OpenID for authentication.

Q: Isn't querying slow? A: Yeah, it can be slow. There's indexing work that needs to be done on write to optimize read. Tokyo Cabinet might come to the rescue here about searching data with regular expressions.

Q: Are there real world apps using cloudkit? A: Not that I know of. One company might be trying it.

Q: What kind of apps are good for cloudkit? A: I'm personally using it for Actiontastic, a synchronizing web service that provides a REST interface.

Q: Are there plans to abstract away the key/value storage system so other systems can be used? A: Yehuda has a library called Moneta that's an abstraction for Key/value stores that I'd like to move to.

Q: How does CouchDB map/reduce compare to CloudKit's JSONQuery? A: It first started as a Sinatra app that sat in front of CouchDB, but I found JSONQuery to be better suited.

Intro

Hypertable and Rails: DB Scaling Solutions with HyperRecord

Links: Hypertable HyperRecord

Rusty is from Zvents, a local search engine

Presentation

Showing example of hourly data for the last month for a single event

[Photo: Rusty Burchfield]

Old benchmark was over 1M rows inserted per second sustained

Hypertable is an open-source implementation of Google's BigTable.

Hypertable is a Column-Oriented DBMS

Data Model

5-part key:

  • Row Key
  • Column Family
  • Column Qualifier
  • Timestamp
  • Revision

One index per table (on the row key).

Only stores strings.

Architecture

  • Master server - tracks range servers and where data is stored (a spare master is also usually run, as it's a single point of failure)
  • Range servers - data is broken up across individual range servers
  • Hyperspace - handles locking and master recovery
  • HDFS - stores redundant copies of data

ThriftBroker - An RPC wrapper for Hypertable for many languages using the Thrift Wrapper

HyperRecord

HyperRecord is a subclass of ActiveRecord for Hypertable

Supported by the Hypertable

Example

  • Loading data into a simple pages app
  • Loading the first 10,000 articles of Wikipedia
  • 150MB of data infiled in 14 seconds
  • Loads all the data into a Rails scaffold and browses it

Design considerations

  • Denormalization - can't do joins, so you have to put your data in an appropriate format for querying. Can use MapReduce to interact with data.
  • Column families/qualifiers - you can store data in the key part of the key/value pair
  • Revisions - deletes are represented as inserted delete cells

Questions

Q: How do you break down data by hours in example
A: Broken down by Ruby and aggregated

Q: It looks like the keys in that list were strings, not timestamps, did you have to take the timestamp and convert it to a string yourself?
A: Pretty much

Q: Did the wikipedia articles contain any of the sub-data like images, links, etc?
A: No, just a sql dump as a demo of querying the database through a rails scaffold

Q: Does hypertable select support SQL limits, order, etc?
A: HQL supports a lot of things you'd expect from SQL, but it's still somewhat limited.

Q: What do you do with it?
A: We store all of our log data and process it using Cascading to gather hourly data for all our pages. We then put it in Hypertable so we can query it quickly to generate reports.

Rusty: Cascading is Java code. You can easily construct complicated MapReduce jobs using it.

Josh: Some other uses of Hypertable at Zvents: a changelog. We deal with a lot of user-created content, and things change often and we don't always know what changed. We log everything that ever happens to our data so that we can track it. From uploaded images to deleted links to edited descriptions, we can see what changed, when and how.

Zvents and Baidu are the primary sponsors of the Hypertable project. Hypertable and HyperRecord are both on Github.

Hypertable development started 2 years ago as a forward looking solution to analytics problems.

The search problem for Zvents is many dimensional: Time, Location, Description, User Data and User Behavior and Hypertable is a way to inform a lot of that data.

Q: What kind of problems are well suited to Hypertable?
A: We're trying to move our entire site over. A canonical example for this kind of database is a crawl database.
A2: Anything where you have mountains and mountains of data and want to query over it.

Example of Crawl Database stored in Hypertable.
