Sam Pierson's blog



Slides

Cool Stuff

  • Rubymine (Fuzzy search added 4 days ago)
  • Rack
  • Metal
  • CacheMoney - write-thru caching - overcome replication lag.
  • Rails Templates - install plugins, do VCS stuff etc.
  • Metric Fu - Code analysis: Flay, Flog, Roodi, reek (code smell) & rcov
  • Rails.cache
  • Cucumber
  • FakeWeb - fake entire websites for testing
  • Spike - log analyser
  • Ultrasphinx - full text search
  • Sliding Stats - rack middleware
  • Clearance - authentication
  • Sprinkle - provisioning
  • Passenger Stack
  • Spree - shopping cart setup
  • Webrat - DSL for integration tests
  • Taps - migrate a database from one server to another

Railsconf: HTTP's Best-Kept Secret: Caching - Ryan Tomayko (Heroku)
Posted by Sam Pierson on Thursday May 07, 2009 at 05:18PM

HTTP's Best-Kept Secret: Caching Ryan Tomayko (Heroku)

About Ryan

  • http://tomayko.com
  • Sinatra maintainer.
  • Rack core team.
  • Creator and maintainer of Rack::Cache.

HTTP Caching?

  • NOT Rails Caching
  • HTTP caching headers in requests: Cache-Control: If-Modified-Since: If-None-Match:
  • and responses: Cache-Control: Last-Modified: ETag: Vary:
  • This stuff is defined in RFC2616; we won't be going into it that deeply.

Types of Cache

Client cache

  • Built into browsers and other types of client.
  • 1:1 relationship between cache and client. The cache only serves one client (private cache).
  • Bandwidth saved per cache: can't beat it (a cache hit never touches the network).

Shared Proxy Cache

  • Set up for an organization
  • 1:many relationship between cache and clients. Serves more than one client (shared cache).
  • Is closer to the client than the server, therefore saves a lot of bandwidth.

Gateway Cache

  • a.k.a. Reverse Proxy Cache
  • Situated inside of the origin site
  • 1:everyone relationship between cache and clients.
  • Reduces bandwidth the least.

Why cache?

  • The answer to this has changed over time.
  • In Nov 1990 there was 1 guy on the web - Tim Berners-Lee.
  • In Feb 1996 the web population was 20M. State of the art connectivity was a 28.8kbps modem. At that speed, loading the current http://yahoo.com (~350k) would take 2:48s. Bandwidth was the largest issue. RFC1945 HTTP 1.0 included the Expires: and Last-Modified: headers.
  • In June 1999 RFC2616 HTTP 1.1 was released. It addressed the 1996 caching problems.
  • Today: we cache so we can scale. Keep your back-ends free from as much work as possible. Push as much work up the stack as possible.

HTTP 1.1 defines 2 caching models

Expiration

  • Back-end sets Cache-Control: public, max-age=60
  • Gets cached in gateway cache and browser cache.
  • Public says it is good for many clients.
  • Cached for 60s.

Rails example

def show
  expires_in 60.seconds, :public => true
  # stuff
  render ...
end

Sinatra example

headers['Cache-Control'] = 'public, max-age=60'
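
To make the expiration model concrete, here is a minimal plain-Ruby sketch of how a cache might decide whether a stored response is still fresh. The helpers parse_cache_control and fresh? are hypothetical names made up for illustration; they are not part of Rails, Sinatra or Rack::Cache, and a real cache honours many more directives.

```ruby
# Hypothetical helper: parse a Cache-Control value into a hash of directives.
def parse_cache_control(value)
  value.split(',').each_with_object({}) do |part, directives|
    name, arg = part.strip.split('=', 2)
    directives[name.downcase] = arg || true
  end
end

# A stored response counts as fresh while its age is below max-age.
def fresh?(cache_control, age_in_seconds)
  max_age = parse_cache_control(cache_control)['max-age']
  max_age.is_a?(String) && age_in_seconds < max_age.to_i
end

puts fresh?('public, max-age=60', 30)   # young enough to serve from cache
puts fresh?('public, max-age=60', 90)   # too old: go back to the back-end
```

The sketch only looks at max-age; it ignores Expires:, Vary: and everything else a real cache has to consider.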

Validation (Conditional GET)

  • Back-end adds ETag or Last-Modified, e.g. ETag: abcdef012345
  • Last-Modified is redundant, basically there for HTTP 1.0 clients.
  • On the 2nd request, the gateway cache realizes it has this page in cache, then sends a GET /foo, Host: foo.com, If-None-Match: abcdef012345 to the back-end.
  • If back-end returns a 304 Not Modified, gateway cache returns cached version.

Rails example:

def show
  @foo = Foo.find(params[:id])
  fresh_when :etag => @foo,
             :last_modified => @foo.updated_at.utc
end

Alternative idiom:

def show
  @foo = Foo.find(params[:id])
  modified = @foo.updated_at.utc
  if stale?(:etag => @foo, :last_modified => modified)
    respond_to ...
  end
end

Sinatra example:

get '/foo' do
  @foo = Foo.find(params[:id])
  etag @foo.etag
  erb :foo
end
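
The validation round trip can be sketched without any framework. This is a minimal plain-Ruby illustration, assuming a Rack-style endpoint (env hash in, [status, headers, body] out); show_foo is a hypothetical name, and deriving the ETag from an MD5 of the body is just one possible choice, not what Rails or Sinatra necessarily do.

```ruby
require 'digest/md5'

# Hypothetical Rack-style endpoint: env in, [status, headers, body] out.
def show_foo(env, body)
  etag = %("#{Digest::MD5.hexdigest(body)}")
  if env['HTTP_IF_NONE_MATCH'] == etag
    # The cached copy is still valid: send headers only, no body.
    [304, { 'ETag' => etag }, []]
  else
    [200, { 'ETag' => etag, 'Content-Type' => 'text/html' }, [body]]
  end
end

first  = show_foo({}, '<h1>foo</h1>')
etag   = first[1]['ETag']
second = show_foo({ 'HTTP_IF_NONE_MATCH' => etag }, '<h1>foo</h1>')
third  = show_foo({ 'HTTP_IF_NONE_MATCH' => etag }, '<h1>changed</h1>')
puts first[0], second[0], third[0]
```

Note that this toy version still builds the body before comparing ETags; the point of fresh_when and stale? is to skip rendering when the ETag can be computed from the model alone.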

Combine Expiration & Validation

  • Back-end sets Cache-Control: public, max-age=60 and ETag: abcdef012345
  • In < 60 seconds, Cache-Control takes precedence
  • After 60 seconds, it queries back-end using ETag
  • Back-end can then send back a 304 Not Modified with a new Cache-Control: public, max-age=60
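
Here is a rough sketch of the combined behaviour from the gateway cache's point of view. TinyGatewayCache is a hypothetical single-URL cache written for illustration, not how Rack::Cache is implemented; the back-end is assumed to be any callable that takes the cached ETag (or nil) and returns [status, headers, body].

```ruby
# Hypothetical gateway cache for a single URL.
class TinyGatewayCache
  def initialize(backend)
    @backend = backend
    @entry = nil
  end

  def get(now)
    if @entry && now < @entry[:expires_at]
      return @entry[:body]                      # fresh: back-end not contacted
    end
    status, headers, body = @backend.call(@entry && @entry[:etag])
    body = @entry[:body] if status == 304       # validated: reuse cached body
    max_age = headers['Cache-Control'][/max-age=(\d+)/, 1].to_i
    @entry = { :body => body, :etag => headers['ETag'], :expires_at => now + max_age }
    body
  end
end

calls = []   # records the If-None-Match value of every back-end hit
backend = lambda do |if_none_match|
  calls << if_none_match
  headers = { 'Cache-Control' => 'public, max-age=60', 'ETag' => 'abcdef012345' }
  if if_none_match == 'abcdef012345'
    [304, headers, nil]
  else
    [200, headers, 'hello']
  end
end

cache = TinyGatewayCache.new(backend)
cache.get(0)    # miss: full response from the back-end
cache.get(30)   # within 60s: served from cache
cache.get(90)   # expired: revalidated with the ETag, the 304 restarts the clock
cache.get(100)  # fresh again
puts calls.inspect
```

Four client requests result in only two back-end hits, and the second of those is a cheap 304.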

Misc

  • Never Generate the Same Response Twice

Recommend using Rack::Cache

gem install rack-cache

config.middleware.use Rack::Cache,
  :verbose          => true,
  :metastore        => "file:/var/cache/rack/meta",
  :entitystore      => "file:/var/cache/rack/body",
  :allow_reload     => false,
  :allow_revalidate => false

The client, as well as the server, controls what happens at the cache using Cache-Control. A browser refresh sends Cache-Control: no-cache, which means the gateway cache MUST revalidate the ETag before sending a response. This is bad because people can pound your back-end. :allow_reload => false disables this.

  • High-Performance Caches: Squid, Varnish (Heroku uses this)
  • Interesting discussion about ESI at the end.
  • Rails by default uses id of model, classname and last_updated to create an MD5 hash for etag.
  • Need to start with a seed that covers your release version, otherwise etag will not change. Rails now has a mechanism to handle this.
  • 2.3 branch has a new "touch" mechanism too.
  • Browser behavior differs and varies quite significantly when using SSL.
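
To illustrate why the release seed matters, here is a small sketch of an ETag built from the model's class name, id and update time plus a seed. etag_for and RELEASE are hypothetical names, and the key format is an assumption for illustration; this is not the actual Rails implementation.

```ruby
require 'digest/md5'

RELEASE = 'v42'  # hypothetical release identifier used as the seed

# Hypothetical helper, not the Rails implementation.
def etag_for(class_name, id, updated_at, seed = RELEASE)
  Digest::MD5.hexdigest([seed, class_name, id, updated_at.to_i].join('/'))
end

t = Time.utc(2009, 5, 7, 17, 18)
puts etag_for('Foo', 1, t)
puts etag_for('Foo', 1, t, 'v43')   # new release: different ETag
puts etag_for('Foo', 1, t + 1)      # record updated: different ETag
```

Without the seed, deploying a new template would leave the ETag unchanged and caches would keep validating the old markup as current.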

Slides are online already

Random nuggets from the talk:

The overhead of most requests is calls out of a framework to a DB, FS etc, but because it is called from the framework, that is what gets the blame. This sustains the myth that "<insert your framework of choice> doesn't scale". Solution: put a proxy in front of the server and duplicate the server behind it.

Types of proxy:

  • Transparent
  • Intercepting
  • Caching
  • ...

Transparent Cut-Through Proxy = 90% use case

  • Transparent Proxy - user cannot detect he is behind a proxy
  • Cut-Through - forwards on the fly (not store and forward)

The Problem

Flaws of Staging environments:

  • Any change in profile of queries invalidates your testing
  • Cost

The Solution

  • What if you could take your production traffic and fork it to two environments?

EventMachine

  • EventMachine implements a design pattern known as the reactor pattern
  • Will connect to any file descriptor (e.g. a socket)
  • Written in C++ for high performance and concurrency without threads
  • EM does have a native thread pool used for EM.defer
  • http://bit.ly/aiderss-eventmachine is an excellent PDF documenting EM
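
For anyone who has not met the reactor pattern before, here is a toy version using only Ruby's standard IO.select. TinyReactor is a hypothetical class for illustration and has nothing to do with EventMachine's real API or its C++ internals; it only shows the idea of one thread waiting on file descriptors and dispatching to handlers.

```ruby
# Hypothetical single-threaded reactor: maps IO objects to handler blocks.
class TinyReactor
  def initialize
    @handlers = {}
  end

  def watch(io, &handler)
    @handlers[io] = handler
  end

  # Wait until at least one watched IO is readable, then dispatch its data.
  def run_once(timeout = 1)
    ready, = IO.select(@handlers.keys, nil, nil, timeout)
    (ready || []).each { |io| @handlers[io].call(io.read_nonblock(1024)) }
  end
end

reader, writer = IO.pipe
received = []
reactor = TinyReactor.new
reactor.watch(reader) { |data| received << data }

writer.write('ping')
writer.flush
reactor.run_once
puts received.inspect
```

A real reactor loops forever and also handles writes, timers and closed connections; this one performs a single pass.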

EM-Proxy

  • http://github.com/igrigorik/em-proxy
  • A simple DSL for writing proxy servers.
  • The return from on_data and on_response blocks is just passed on/back.
  • If you return nil from a block, no data gets forwarded.
  • 5% performance hit for large messages
  • 20% performance hit if messages are very small; mitigate by putting it behind HAProxy and adding another server.
  • No way to send to only 1 back-end server yet (can't implement a load-balancing proxy).
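
The block semantics above (return value is forwarded, nil drops the data) combined with the idea of forking traffic to two environments can be sketched in plain Ruby. TinyForkingProxy is a hypothetical stand-in and is NOT the em-proxy DSL; the back-end names and the choice to relay only the production response are assumptions made for the example.

```ruby
# Hypothetical stand-in for a duplicating proxy; this is NOT the em-proxy API.
class TinyForkingProxy
  def initialize(backends, on_data)
    @backends = backends   # name => callable that takes request data
    @on_data = on_data     # callable: return value is forwarded, nil drops it
  end

  # Send the (possibly rewritten) data to every back-end, but relay only
  # the response of the :production back-end to the client.
  def handle(data)
    forwarded = @on_data.call(data)
    return nil if forwarded.nil?
    responses = @backends.map { |name, backend| [name, backend.call(forwarded)] }
    Hash[responses][:production]
  end
end

seen = Hash.new { |hash, key| hash[key] = [] }
backends = {
  :production => lambda { |data| seen[:production] << data; "prod: #{data}" },
  :staging    => lambda { |data| seen[:staging] << data; "staging: #{data}" }
}
drop_pings = lambda { |data| data.start_with?('GET /ping') ? nil : data }

proxy = TinyForkingProxy.new(backends, drop_pings)
puts proxy.handle('GET /auctions/1').inspect
puts proxy.handle('GET /ping').inspect
puts seen[:staging].inspect
```

Staging sees the same requests as production, while the client only ever sees production's answers.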

Misc name-dropping

  • httperf is really good for replaying traffic against a site
  • igrigorik/autoperf - replay nginx logs against your site
  • Recommended we look at MySQL proxy - awesome dashboard.
  • Nginx does really good things with compression (gzip, ETAGS etc).
  • Mailtrap is a fake SMTP server gem for testing sending email from your Rails app.
  • Defensio is a spam filter for blogs. It has an API you can send comments to, and it will tell you whether they are spam or not. Returns a 'spam index'.
  • Beanstalk is an in-memory distributed message queue. Despite frequent requests, they have not implemented persistence, which is what motivated Ilya to work around them with this proxy server.

The Facebook API Menagerie

Facebook API has many different parts:

Canvas Apps

FB requests a page from your site, then inlines the response in the middle of the page.

  • Not available: JavaScript.
  • Available: FBJS - a sandboxed limited JS API, FBML - a templating language, the FB session.

iFrame Apps

Your response is put in an IFRAME on the page.

  • Available: FB session.
  • Not available: FBJS, FBML.

Connect App

Host your app on your own domain. Connect back into FB to e.g. use their authentication & friends model.

  • Available: Connect Session
  • Not available: FB Session

Pages/Profiles

  • No JS

Think of FB as a browser

FB API LOLZ

  • REST API - api.facebook.com/restserver.php - awesome URL.
  • Facebook does not do GETs, they always POST to your site.
  • Connect inside an iFrame: Connect requires the XHTML doctype to work, but iFrames are not supported in XHTML.

SRSLY

Running your App in development

  1. Create an app
  2. Set up DynDNS.com, point it at your IP address http://mysubdomain.dyndns.com
  3. Open ports on your local network
  4. Add /etc/hosts entry: 0.0.0.0 facebook.dontexist.com

Abstract the API

  • Use Facebooker.
  • apps.facebook.com/facebooker_tutorial
  • Access everything through the session:

if session[:facebook_session]
  @friends = session[:facebook_session].user.friends.collect do |f|
    User.find_by_fb_user_id f.uid
  end
end

Project your code

e.g. use rack-facebook so everything is not a POST

Others

  • Frankie: Facebooker for Sinatra
  • rack-facebook: translates FB POST to correct action

How can I make things better?

  • Need a test harness for session[:facebook_session]. There is something in facebooker that needs extraction.
  • Greg is starting Facebook API Continuous Integration Suite of Tests - github.com/atduskgreg/FACIST - some JS hooks to green/red light that your tags are actually showing up in the page.

Railsconf: Rails Metal, Rack and Sinatra - Adam Wiggins (Heroku)
Posted by Sam Pierson on Wednesday May 06, 2009 at 06:18PM

Rails Metal

  • Rails Metal is a gateway to the exciting and possibly dangerous world of Rack.
  • Replace selected URLs for a speed boost.
  • Bypasses the Rails router. You have to do your own routing.
  • All Metal endpoints are tried before Rails routing.

Let's say in an alternate universe you are running eBay on Rails. Your most active traffic is bots hitting your API to get auction status: GET /auctions/1234567.xml

Replace app/controllers/auctions_controller.rb

With app/metal/auctions_api.rb

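class AuctionsApi
  def self.call(env)
    url_pattern = %r{/auctions/(\d+)\.xml}
    if m = env['PATH_INFO'].match(url_pattern)
      auction = Auction.find(m[1])
      # This is a Rack return value: [status, headers, body]
      [ 200, { "Content-Type" => "text/xml" }, [auction.to_xml] ]
    else
      # A 404 tells Rails to carry on to the next Metal or the router
      [ 404, { "Content-Type" => "text/html" }, ["Not Found"] ]
    end
  end
end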
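
"All Metal endpoints are tried before Rails routing" works because Rails 2.3 treats a 404 from a Metal endpoint as "not handled here" and moves on. The sketch below imitates that in plain Ruby; dispatch, metal and rails are hypothetical stand-ins for illustration, not Rails internals.

```ruby
# Hypothetical dispatcher: try each endpoint, moving on when one answers 404.
def dispatch(endpoints, env)
  endpoints.each do |endpoint|
    response = endpoint.call(env)
    return response unless response[0] == 404
  end
  [404, { 'Content-Type' => 'text/html' }, ['Not Found']]
end

metal = lambda do |env|
  if env['PATH_INFO'] =~ %r{\A/auctions/(\d+)\.xml\z}
    [200, { 'Content-Type' => 'text/xml' }, ["<auction id=\"#{$1}\"/>"]]
  else
    [404, { 'Content-Type' => 'text/html' }, ['Not Found']]
  end
end
rails = lambda { |env| [200, { 'Content-Type' => 'text/html' }, ['handled by Rails']] }

puts dispatch([metal, rails], { 'PATH_INFO' => '/auctions/1234567.xml' })[2].first
puts dispatch([metal, rails], { 'PATH_INFO' => '/about' })[2].first
```

This is also why every request pays the cost of every Metal endpoint's URL check, so keep those checks cheap.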

Sinatra

Sinatra is an extremely minimalist web framework that runs on Rack. This is an entire Sinatra application:

require 'rubygems'
require 'sinatra'

get '/hello' do
  "Hello, whirled"
end

Run it:

$ ruby hello.rb
# Starts up server on port 4567
$ curl http://localhost:4567/hello
Hello, whirled

Here is the alternate universe auction example with Sinatra:

class AuctionsApi < Sinatra::Application
  get '/auctions/:id.xml' do
    Auction.find(params[:id]).to_xml
  end
end

Railsconf: Cucumber - Aslak Hellesoy
Posted by Sam Pierson on Wednesday May 06, 2009 at 04:18AM

Most of this talk was a basic Cucumber primer. However these things were new to me:

Multi-line arguments

You can use this in a spec; it adds a block argument with the string in it:

"""
  multi-line
  string
"""

Tables

This generates a block argument as an array of hashes. ActiveRecord.create can take this as an argument.

Given the following proposals
  |email                    | title   |
  |aslak.hellesoy@gmail.com | Cucumber|
  |bryan@brynary.com        | Webrat  |
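
To show what "an array of hashes" means here, this plain-Ruby sketch turns the table text into that structure. table_hashes is a hypothetical parser written for illustration; Cucumber does the conversion for you, and this version does not cope with empty cells.

```ruby
# Hypothetical parser for illustration; Cucumber does this for you.
# Limitation: empty cells are dropped, so every cell must have a value.
def table_hashes(text)
  rows = text.lines.map { |line| line.strip.split('|').map(&:strip).reject(&:empty?) }
  header, *body = rows.reject(&:empty?)
  body.map { |cells| Hash[header.zip(cells)] }
end

proposals = table_hashes(<<TABLE)
  |email                    | title   |
  |aslak.hellesoy@gmail.com | Cucumber|
  |bryan@brynary.com        | Webrat  |
TABLE

proposals.each { |row| puts row.inspect }
```

Each row becomes one hash keyed by the header cells, which is the shape a model's create method accepts for building several records at once.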

Abstract Scenarios

Scenario Outline: Email accepted proposals
  Given the following proposals
    |email                    | title   |
    |aslak.hellesoy@gmail.com | Cucumber|
    |bryan@brynary.com        | Webrat  |
  And the <proposal> proposal is approved
  When I send proposal emails
  Then <email> should <what>

  Examples:
    | proposal | email                    | what          |
    | Cucumber | aslak.hellesoy@gmail.com | get email     |
    | Cucumber | bryan@brynary.com        | not get email |
    | Webrat   | bryan@brynary.com        | get email     |
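
The outline is effectively run once per Examples row, with each <placeholder> replaced by that row's value. A minimal sketch of the substitution, where expand is a hypothetical helper and not Cucumber's implementation:

```ruby
# Hypothetical expansion helper, not Cucumber's implementation.
def expand(step, example)
  step.gsub(/<(\w+)>/) { example.fetch($1) }
end

examples = [
  { 'proposal' => 'Cucumber', 'email' => 'aslak.hellesoy@gmail.com', 'what' => 'get email' },
  { 'proposal' => 'Cucumber', 'email' => 'bryan@brynary.com',        'what' => 'not get email' },
  { 'proposal' => 'Webrat',   'email' => 'bryan@brynary.com',        'what' => 'get email' }
]

examples.each do |example|
  puts expand('And the <proposal> proposal is approved', example)
  puts expand('Then <email> should <what>', example)
end
```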

Before/After/World

Before do
end

After do |scenario|
end

World do
end

World(MyModule)

Background

Feature: Notification emails
  Background:

Tagged Features

Feature: Take over the world
  I want it all

  @spanish @french @english
  Scenario: Take over Europe

Then run:

cucumber -t french doit.feature

# or negative

cucumber -t ~french doit.feature

Railsconf: Smacking Git Around - Advanced Git Tricks - Scott Chacon (GitHub)
Posted by Sam Pierson on Tuesday May 05, 2009 at 10:19PM

Presentation

Cheat Sheet

Range selection:

  • Full SHA1
  • Partial SHA1 - at least 4 characters and unique
  • Branch, remote or tag name
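  • Caret parent: master^^ (grandparent of master, i.e. the first parent of its first parent; master^2 is the 2nd parent of a merge commit)
  • Tilde spec: master~2 (2 generations back from master, following first parents)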
  • Combination: master~2^2
  • Blob spec: default:path/to/file
  • Relative spec: master@{yesterday} (relative to your machine)
  • master@{5} the 5th last value of master (locally)
  • [old]..[new] everything reachable from new but not from old
  • jes/master..master
  • jes/master..c36ae
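
The range and tilde rules are easier to see on a toy history. This sketch models commits as a hash of parents; the commit names and the PARENTS graph are made up for illustration and no git commands are run.

```ruby
# Hypothetical history: commit => list of parents (first parent listed first).
# a <- b <- c <- e is the master line; d is a side branch whose parent is b.
PARENTS = {
  'a' => [], 'b' => ['a'], 'c' => ['b'], 'd' => ['b'], 'e' => ['c']
}

# Every commit reachable from `commit`, including itself.
def reachable(commit)
  seen = []
  queue = [commit]
  until queue.empty?
    current = queue.shift
    next if seen.include?(current)
    seen << current
    queue.concat(PARENTS.fetch(current))
  end
  seen
end

# old..new: reachable from new but not from old.
def range(old, new)
  reachable(new) - reachable(old)
end

# commit~n: follow the first parent n times.
def tilde(commit, n)
  n.times { commit = PARENTS.fetch(commit).first }
  commit
end

puts range('d', 'e').inspect   # like jes/master..master
puts tilde('e', 2)             # like master~2
```

With d playing jes/master and e playing master, the range contains only the commits master has that jes/master lacks; the shared ancestors a and b drop out.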

Log usage:

  • git log origin/master.. or origin/master..HEAD only the commits that are going to go upstream
  • git log ..origin/master or HEAD..origin/master everything that origin/master has that you do not
  • git log master --not origin/master
  • git log --graph gives an ascii graph of history

Diff:

  • git diff HEAD...topic goes back to a common ancestor before diffing - gives better results
  • git commit --amend modify the last commit

Rebasing:

  • Replay the changes in my branch on top of another branch.
  • rebase --onto use for transplanting a topic branch.
  • To transplant some of a topic branch, create a new branch to refer to the part you don't want then do a rebase --onto.
  • git rebase -i <ref> interactively pick/reorder/squash by editing a list/script.
  • DO NOT rebase using any commits you have already pushed upstream.

Filter Branch:

  • git filter-branch --tree-filter 'rm -f filename' HEAD Remove all instances of a file from every commit.

Subtree merging:

  • Alternative to submodules. Looks way complex. Tim Dysinger wrote rake tasks to do this. Google it.

Patch Staging

  • git add -p patch staging - interactively stage only some hunks of a file.

Debugging

  • Annotation: git blame
  • git blame -C <file> even if your line was moved from another file, produce a blame report for it.
  • git bisect

    git bisect start
    git bisect bad        # assumes HEAD
    git bisect good 3acb4

    Takes the range you just specified, picks the middle commit and checks it out; you call it good or bad. Wash, rinse, repeat.

    git bisect reset # when you are done.
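
Under the hood bisect is a binary search over history. The sketch below shows the idea on a linear list of commits; the commit ids (other than 3acb4 from the example above) and the broken_since marker are made up, and real git bisect also copes with branching history.

```ruby
# Commits run from oldest to newest; `good` and `bad` are indexes into the list.
# Assumes commits[good] passes, commits[bad] fails, and that the breakage
# persists once introduced. Returns the first bad commit.
def bisect(commits, good, bad)
  while bad - good > 1
    middle = (good + bad) / 2
    if yield(commits[middle])
      good = middle      # like `git bisect good`
    else
      bad = middle       # like `git bisect bad`
    end
  end
  commits[bad]
end

commits = %w[3acb4 91f02 7be11 c36ae 0d4f7 HEAD]   # hypothetical history
broken_since = 'c36ae'
tested = []
first_bad = bisect(commits, 0, commits.length - 1) do |commit|
  tested << commit
  commits.index(commit) < commits.index(broken_since)   # true means "good"
end
puts first_bad
puts tested.inspect
```

Each step halves the remaining range, so even a few thousand commits need only a dozen or so checkouts.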

Customization

  • git config --global help.autocorrect 1 - Stop git complaining about mistyped commands.
  • git config --global color.ui auto
  • Configure external merge tool.
  • .gitattributes: treat files that match a pattern differently, e.g. to diff binary files: echo '*.png diff=exif' >> .gitattributes and add a gitconfig line describing the exif diff strategy.

Railsconf: Building a Mini-Google in Ruby - Ilya Grigorik
Posted by Sam Pierson on Tuesday May 05, 2009 at 09:23PM

Ilya's slides are already on the web.

A few random notes:

  • In 1994-1995 term frequency was state of the art in search engine relevancy.
  • State of the art today = TF-IDF = Term Frequency - Inverse Document Frequency
  • http://rubyforge.org/projects/gratr/ graph theory gem - gets slow after 1000 nodes but can manage about a million.
  • Working with math in Ruby is not the best idea. Use GSL with one of the ruby binding gems.
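
As a reminder of what TF-IDF computes, here is one common formulation on a three-document toy corpus. The corpus, DOCS and the helper names are made up for illustration, other weighting variants exist, and this is plain Ruby rather than the GSL approach recommended above.

```ruby
# Hypothetical toy corpus: document name => text.
DOCS = {
  'a' => 'ruby is a language',
  'b' => 'ruby on rails is a framework',
  'c' => 'google is a search engine'
}

def tokens(text)
  text.downcase.split
end

# Term frequency: share of the document's words that are `term`.
def tf(term, text)
  words = tokens(text)
  words.count(term).to_f / words.length
end

# Inverse document frequency: terms found in fewer documents score higher.
# Assumes the term occurs in at least one document.
def idf(term)
  containing = DOCS.values.count { |text| tokens(text).include?(term) }
  Math.log(DOCS.length.to_f / containing)
end

def tf_idf(term, name)
  tf(term, DOCS[name]) * idf(term)
end

puts tf_idf('is', 'a')       # appears in every document
puts tf_idf('ruby', 'a')     # appears in two of three documents
puts tf_idf('google', 'c')   # appears in only one document
```

A word like "is" that occurs everywhere scores zero, which is exactly the improvement over raw term frequency.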

Railsconf: In Praise of Non-fixtured Data - Kevin R. Barnes
Posted by Sam Pierson on Tuesday May 05, 2009 at 07:22PM

Fixtures Suck

When modeling complex business domains (as opposed to 3-model blog software), fixtures quickly become a quagmire. What's the size of your domain? Kevin was working on a project with 180 models. This quickly became unworkable even with only 1 fixture file per model. Fixtures don't scale well. Scenarios are also problematic as now you have to maintain a directory hierarchy of fixtures.

Use Data Generation instead

Factory Girl

# Define

Factory.define :user do |f|
  f.first_name 'John'
  f.last_name  'Doe'
end

# use

user = Factory(:user)

Object Daddy

Reopens your ActiveRecord class and adds generators for each attribute.

# define

class User < ActiveRecord::Base

  generator_for :username, :method => :next_user

  generator_for :email, :start => 'test@domain.com' do |prev|
    user, domain = prev.split('@')
    user.succ + '@' + domain
  end
end

# use

@user = User.generate!
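
The :email generator block relies on String#succ to produce a new unique value from the previous one. This standalone sketch runs the same block body outside Object Daddy to show the sequence it yields; next_email is a hypothetical name, and exactly when Object Daddy calls the block is not shown here.

```ruby
# Hypothetical stand-in for the :email generator block.
next_email = lambda do |prev|
  user, domain = prev.split('@')
  user.succ + '@' + domain
end

emails = ['test@domain.com']
3.times { emails << next_email.call(emails.last) }
puts emails
```

String#succ increments the rightmost alphanumeric character, so "test" becomes "tesu", then "tesv", and so on.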

Others

Machinist

Foundry

Fixjour

Railsconf: Don't mock yourself out - Dave Chelimsky
Posted by Sam Pierson on Tuesday May 05, 2009 at 06:48PM

Martin Fowler says Mocks Aren't Stubs and talks about Classical and Mockist Testing. Dave shows a slightly amusing set of photos about "ists" - Rubyists etc. Ist bin ein red herring. The big issue here is when to use a mock.

Overview of Stubs and Mocks

Terminology: test double - an object standing in for a real object (like a stunt double).

customer = Object.new
logger = Object.new
customer.stub(:name).and_return('Joe Customer')
logger.should_receive(:log)

customer.should_receive(:name).and_return('Joe customer')
# bad - very tightly bound to implementation

customer.stub(:name).and_return('Joe customer')
# also tightly bound to implementation

  • Stubs are often used like mocks, mocks used like stubs.
  • We verify stubs by checking state after an interaction.
  • We tell mocks to verify interactions.
  • Sometimes stubs just make the system run.

When are method stubs helpful?

Isolation from non-determinism: Simulate random value generators or Time.now.

Isolation from external dependencies: e.g. external database or network. He gave an example of an ActiveMerchant test that takes 1.5s to run, and stubbed out the gateway: gateway.stubs(:authorize).returns(ActiveMerchant::Billing::Response.new(true, 'ignore'))

Polymorphic collaborators: e.g. an employee that knows how to pay itself uses a strategy:

payment_strategy = mock()
employee = Employee.new(payment_strategy)
payment_strategy.expects(:pay)
employee.pay

mixins/plugins

When are message expectations helpful?

side effects: background processing

caching: only call a network zipcode lookup once

validator = mock()
zipcode = Zipcode.new("01234", validator)
validator.should_receive(:valid?).with("01234").once
zipcode.valid?
zipcode.valid?
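
For readers without RSpec to hand, the same "called only once" check can be done with a hand-rolled double. Both classes below are hypothetical: RecordingValidator stands in for the mock, and this Zipcode is a guess at the kind of caching implementation the spec is driving out, not code from the talk.

```ruby
# Hand-rolled test double that records calls (no mocking library needed).
class RecordingValidator
  attr_reader :calls

  def initialize
    @calls = []
  end

  def valid?(code)
    @calls << code
    true
  end
end

# Hypothetical Zipcode that caches the validator's answer.
class Zipcode
  def initialize(code, validator)
    @code = code
    @validator = validator
    @valid = nil
  end

  def valid?
    @valid = @validator.valid?(@code) if @valid.nil?
    @valid
  end
end

validator = RecordingValidator.new
zipcode = Zipcode.new('01234', validator)
zipcode.valid?
zipcode.valid?
puts validator.calls.inspect
```

The double's call log plays the role of should_receive(...).once: two valid? calls on the zipcode, one lookup on the validator.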

interface discovery: tool to discover the parts of the system that you haven't really worked out yet. Mock something out that doesn't exist yet, while designing its interface.

Isolation Testing

All of these concepts are Isolation Testing - testing an object in isolation from others. This is a good fit when you have lots of little objects (ravioli code, as opposed to spaghetti code).

Isolation Testing in Rails

Rails is calzone code. Three layers: View, Controller, Model. These 3 layers are not the whole picture: there are also the browser, router and database. Standard Rails testing:

  • Unit tests: Testing in isolation. Test model classes (repositories), model objects, database.
  • Functional tests: 2 or more non-trivial components work together. Test model classes, model objects, database, views, controllers.
  • Integration tests: Test model classes, model objects, database, views, controllers, routing/sessions. This is !DRY

Mocking and stubbing you can do in Rails

Partials in view specs:

before :each do
  template.stub(:render).with(:partial => anything)
end

...

template.should_receive(:render).with(:partial => 'nav')

Conditional branches in controllers: Stub new and save! methods of models.

Dave has a new project stubble on github: You will need to build RSpec locally to use this for now.

stubbing(Registration) do
  # Stubs ActiveRecord finder and save methods on model
end

Chains are a new RSpec feature: user.stub_chain - some people say this is a test no-no, use with caution.

Guidelines, concerns & Common Pitfalls

  • Keep things simple
  • Try to avoid tight coupling
  • Complex setup is a red flag for design issues
  • Don't stub and mock the object that you are testing
  • Concern: impedes refactoring (but some say refactoring is improving design without changing behavior, so tests should not change. This really depends what level you are refactoring at).
  • Concern: false positives
