Sam Pierson's blog
Cool Stuff
- Rubymine (Fuzzy search added 4 days ago)
- Rack
- Metal
- CacheMoney - write-thru caching - overcome replication lag.
- Rails Templates - install plugins, do VCS stuff etc.
- Metric Fu - Code analysis: Flay, Flog, Roodi, reek (code smell) & rcov
- Rails.cache
- Cucumber
- FakeWeb - fake entire websites for testing
- Spike - log analyser
- Ultrasphinx - full text search
- Sliding Stats - rack middleware
- Clearance - authentication
- Sprinkle - provisioning
- Passenger Stack
- Spree - shopping cart setup
- Webrat - DSL for integration tests
- Taps - migrate a database from one server to another
HTTP's Best-Kept Secret: Caching - Ryan Tomayko (Heroku)
About Ryan
- http://tomayko.com
- Sinatra maintainer.
- Rack core team.
- Creator and maintainer of Rack::Cache.
HTTP Caching?
- NOT Rails Caching
- HTTP caching headers in requests: Cache-Control: If-Modified-Since: If-None-Match:
- and responses: Cache-Control: Last-Modified: ETag: Vary:
- This stuff is defined in RFC2616; we won't be going into it that deeply.
Types of Cache
Client cache
- Built into browsers and other types of client.
- 1:1 relationship between cache and client. The cache only serves one client (private cache).
- Bandwidth saved per cache: can't be beaten, since a hit never touches the network.
Shared Proxy Cache
- Set up for an organization
- 1:many relationship between cache and clients. Serves more than one client (shared cache).
- Is closer to the client than the server, therefore saves a lot of bandwidth.
Gateway Cache
- a.k.a. Reverse Proxy Cache
- Situated inside of the origin site
- 1:everyone relationship between cache and clients.
- Reduces bandwidth the least.
Why cache?
- The answer to this has changed over time.
- In Nov 1990 there was 1 guy on the web - Tim Berners-Lee.
- In Feb 1996 the web population was 20M. State of the art connectivity was a 28.8kbps modem. At that speed, loading the current http://yahoo.com (~350k) would take 2:48s. Bandwidth was the largest issue. RFC1945 HTTP 1.0 included the Expires: and Last-Modified: headers.
- In March 1999 RFC2616 HTTP 1.1 was released. Addressed 1996 caching problems.
- Today: we cache so we can scale. Keep your back-ends free from as much work as possible. Push as much work up the stack as possible.
HTTP 1.1 defines 2 caching models
Expiration
- Back-end sets Cache-Control: public, max-age=60
- Gets cached in the gateway cache and the browser cache.
- Public says it is good for many clients.
- Cached for 60s.
Rails example
def show
  expires_in 60.seconds, :public => true
  # stuff
  render ...
end
Sinatra example
headers['Cache-Control'] = 'public, max-age=60'
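The expiration model boils down to comparing a response's age with its max-age. A minimal sketch in plain Ruby, assuming two hypothetical helpers (parse_cache_control and fresh?) and ignoring the other freshness rules in RFC2616 (Expires, s-maxage, the Age header):

```ruby
# Hypothetical helpers for illustration; not part of Rails, Sinatra or Rack::Cache.

# Parse a Cache-Control header value into a hash of directives.
def parse_cache_control(value)
  value.to_s.split(',').each_with_object({}) do |part, directives|
    name, arg = part.strip.split('=', 2)
    next if name.nil? || name.empty?
    directives[name.downcase] = arg.nil? ? true : arg
  end
end

# A stored response is fresh while its age is below max-age.
def fresh?(cache_control, age_in_seconds)
  max_age = parse_cache_control(cache_control)['max-age']
  max_age.is_a?(String) && age_in_seconds < max_age.to_i
end

header = 'public, max-age=60'
puts fresh?(header, 30)   # true: still inside the 60s window
puts fresh?(header, 61)   # false: the cache has to go back to the back-end
```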
Validation (Conditional GET)
- Back-end adds ETag or Last-Modified, e.g. ETag: abcdef012345
- Last-Modified is redundant, basically there for HTTP 1.0 clients.
- On 2nd request, gateway cache realizes it has this page in cache, then sends a GET /foo, Host: foo.com, If-None-Match: abcdef012345 to the back-end.
- If back-end returns a 304 Not Modified, gateway cache returns cached version.
Rails example:
def show
  @foo = Foo.find(params[:id])
  fresh_when :etag => @foo,
             :last_modified => @foo.updated_at.utc
end
Alternative idiom:
def show
  @foo = Foo.find(params[:id])
  modified = @foo.updated_at.utc
  if stale?(:etag => @foo, :last_modified => modified)
    respond_to ...
  end
end
Sinatra example:
get '/foo' do
  @foo = Foo.find(params[:id])
  etag @foo.etag
  erb :foo
end
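To see what the frameworks are doing under the hood, here is a rough sketch of the back-end side of a conditional GET in plain Ruby. The origin method is a hypothetical Rack-style endpoint (env hash in, [status, headers, body] out), and hashing the body with MD5 is just one assumed way to produce an ETag:

```ruby
require 'digest/md5'

# Hypothetical Rack-style endpoint standing in for the back-end.
def origin(env)
  body = '<h1>foo</h1>'
  etag = %("#{Digest::MD5.hexdigest(body)}")
  if env['HTTP_IF_NONE_MATCH'] == etag
    # The validator still matches: send headers only, no body.
    [304, { 'ETag' => etag }, []]
  else
    [200, { 'ETag' => etag, 'Content-Type' => 'text/html' }, [body]]
  end
end

first = origin({ 'PATH_INFO' => '/foo' })
second = origin({ 'PATH_INFO' => '/foo', 'HTTP_IF_NONE_MATCH' => first[1]['ETag'] })
puts first[0]    # 200
puts second[0]   # 304
```

The saving is that the 304 branch can skip rendering entirely, provided the ETag is cheap to compute.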
Combine Expiration & Validation
- Back-end sets Cache-Control: public, max-age=60 and ETag: abcdef012345
- For the first 60 seconds, Cache-Control takes precedence and the cache answers without contacting the back-end
- After 60 seconds, it queries the back-end using the ETag
- Back-end can then send back a 304 Not Modified with a new Cache-Control: public, max-age=60
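The combined flow is easier to follow as code. This is a toy single-URL gateway cache, purely a sketch: ToyGatewayCache, the backend lambda and the integer "now" clock are all assumptions made for illustration, and this is not how Rack::Cache is implemented.

```ruby
# Toy gateway cache for a single URL (hypothetical, for illustration only).
class ToyGatewayCache
  attr_reader :backend_hits

  def initialize(backend)
    @backend = backend   # callable: request headers hash -> [status, headers, body]
    @entry = nil
    @backend_hits = 0
  end

  def get(now)
    # Expiration: while the entry is fresh, the back-end is not contacted at all.
    return @entry[:body] if @entry && now < @entry[:expires_at]

    # Validation: once stale, ask the back-end whether our copy is still good.
    request = @entry ? { 'If-None-Match' => @entry[:etag] } : {}
    @backend_hits += 1
    status, headers, body = @backend.call(request)
    max_age = headers['Cache-Control'][/max-age=(\d+)/, 1].to_i
    body = @entry[:body] if status == 304   # keep cached body, refresh its lifetime
    @entry = { body: body, etag: headers['ETag'], expires_at: now + max_age }
    body
  end
end

backend = lambda do |request|
  headers = { 'Cache-Control' => 'public, max-age=60', 'ETag' => 'abcdef012345' }
  request['If-None-Match'] == 'abcdef012345' ? [304, headers, nil] : [200, headers, 'page']
end

cache = ToyGatewayCache.new(backend)
cache.get(0)    # miss: full 200 from the back-end
cache.get(30)   # fresh: served from the cache
cache.get(90)   # stale: revalidated, back-end answers 304
puts cache.backend_hits
```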
Misc
- Never Generate the Same Response Twice
Recommend using Rack::Cache
gem install rack-cache
config.middleware.use Rack::Cache,
  :verbose => true,
  :metastore => "file:/var/cache/rack/meta",
  :entitystore => "file:/var/cache/rack/body",
  :allow_reload => false,
  :allow_revalidate => false
The client, as well as the server, controls what happens at the cache using Cache-Control. A browser refresh sends Cache-Control: no-cache, which means the gateway cache MUST revalidate the ETag with the back-end before sending a response. This is bad because people can pound your back-end just by hitting refresh. :allow_reload => false disables this.
- High-Performance Caches: Squid, Varnish (Heroku uses this)
- Interesting discussion about ESI at the end.
- Rails by default uses the model's id, class name and updated_at to create an MD5 hash for the ETag.
- Need to start with a seed that covers your release version, otherwise the ETag will not change when you deploy new code. Rails now has a mechanism to handle this.
- 2.3 branch has a new "touch" mechanism too.
- Browser behavior differs and varies quite significantly when using SSL.
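A rough sketch of the ETag idea from the notes above: hash the record's identity and timestamp together with a release seed. etag_for is a hypothetical helper and the key layout is an assumption; Rails' real ETag generation differs in detail.

```ruby
require 'digest/md5'
require 'time'

# Hypothetical helper. release_seed stands for anything that changes on
# every deploy, e.g. a version string.
def etag_for(class_name, id, updated_at, release_seed)
  key = [release_seed, class_name, id, updated_at.utc.iso8601].join('/')
  %("#{Digest::MD5.hexdigest(key)}")
end

updated = Time.utc(2009, 5, 6, 12, 0, 0)
puts etag_for('Foo', 1, updated, 'v1')
puts etag_for('Foo', 1, updated, 'v2')       # new release: different ETag, same record
puts etag_for('Foo', 1, updated + 1, 'v1')   # record touched: different ETag
```

Without the seed, a template change would ship with an unchanged ETag and caches would keep answering 304 for the old markup.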
Slides are online already
Random nuggets from the talk:
The overhead of most requests is calls out of a framework to a DB, FS etc, but because it is called from the framework, that is what gets the blame. This sustains the myth that "<insert your framework of choice> doesn't scale". Solution: put a proxy in front of the server and duplicate the server behind it.
Types of proxy:
- Transparent
- Intercepting
- Caching
- ...
Transparent Cut-Through Proxy = 90% use case
- Transparent Proxy - the user cannot detect that they are behind a proxy
- Cut-Through - forwards on the fly (not store and forward)
The Problem
Flaws of Staging environments:
- Any change in profile of queries invalidates your testing
- Cost
The Solution
- What if you could take your production traffic and fork it to two environments
EventMachine
- EventMachine implements a design pattern known as the reactor pattern
- Will connect to any file descriptor (e.g. a socket)
- Written in C++ for high performance and concurrency without threads
- EM does have a native thread pool used for EM.defer
- http://bit.ly/aiderss-eventmachine excellent PDF to document EM
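For anyone who has not met the reactor pattern: one loop waits on a set of file descriptors and dispatches to a callback when one becomes readable. A minimal sketch using only Ruby's standard IO.select; TinyReactor is a hypothetical class and has nothing to do with EventMachine's actual API or its C++ internals.

```ruby
# Minimal reactor sketch (hypothetical class, not EventMachine's API).
class TinyReactor
  def initialize
    @handlers = {}   # IO object -> callback
  end

  def watch(io, &callback)
    @handlers[io] = callback
  end

  # One turn of the event loop: wait until any watched IO is readable,
  # then dispatch to its callback. Returns the number of events handled.
  def tick(timeout = 1)
    ready, = IO.select(@handlers.keys, nil, nil, timeout)
    return 0 if ready.nil?
    ready.each { |io| @handlers[io].call(io.read_nonblock(4096)) }
    ready.size
  end
end

reader, writer = IO.pipe
received = []
reactor = TinyReactor.new
reactor.watch(reader) { |data| received << data }

writer.write('ping')
reactor.tick
puts received.inspect
```

A pipe stands in for a socket here; anything IO.select can wait on would do.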
EM-Proxy
- http://github.com/igrigorik/em-proxy
- A simple DSL for writing proxy servers.
- The return from on_data and on_response blocks is just passed on/back.
- If you return nil from a block, no data gets forwarded.
- 5% performance hit for large messages
- 20% performance hit if messages are very small; mitigate by putting it behind HAProxy and adding another server.
- No way to send to only 1 back-end server yet (can't implement a load-balancing proxy).
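The hook semantics, and the idea of forking production traffic to a second environment, can be sketched without any networking. ToyProxy below is a hypothetical in-memory stand-in: it borrows the on_data/on_response names from the notes above but is not the em-proxy DSL, and the backend lambdas are made up.

```ruby
# Hypothetical in-memory proxy; illustrates the hook semantics only.
class ToyProxy
  def initialize(backends)
    @backends = backends   # name -> callable taking the request data
    @on_data = ->(data) { data }
    @on_response = ->(_name, response) { response }
  end

  def on_data(&block)
    @on_data = block
  end

  def on_response(&block)
    @on_response = block
  end

  # Returns the responses that get passed back to the client.
  def handle(data)
    forwarded = @on_data.call(data)
    return [] if forwarded.nil?   # nil from on_data: nothing is forwarded
    @backends.map { |name, backend| @on_response.call(name, backend.call(forwarded)) }.compact
  end
end

shadow_log = []
proxy = ToyProxy.new({
  production: ->(req) { "prod:#{req}" },
  staging: ->(req) { "stage:#{req}" }
})

proxy.on_response do |name, response|
  if name == :production
    response                 # passed back to the client
  else
    shadow_log << response   # recorded for comparison...
    nil                      # ...but never sent to the client
  end
end

puts proxy.handle('GET /').inspect
puts shadow_log.inspect
```

Both back-ends see every request, but the client only ever sees production's answer.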
Misc name-dropping
- httperf is really good for replaying traffic against a site
- igrigorik/autoperf - replay nginx logs against your site
- Recommended we look at MySQL proxy - awesome dashboard.
- Nginx does really good things with compression (gzip, ETags etc).
- Mailtrap is a fake SMTP server gem for testing sending email from your Rails app.
- Defensio is a spam filter for blogs. It has an API you can send comments to and it will tell you whether they are spam or not. Returns a 'spam index'.
- Beanstalk is an in-memory distributed message queue. Despite frequent requests, they have not implemented persistence, which is what motivated Ilya to work around them with this proxy server.
The Facebook API Menagerie
Facebook API has many different parts:
Canvas Apps
FB requests the page from your site and inlines the response in the middle of its own page.
- Not available: JavaScript.
- Available: FBJS - a sandboxed limited JS API, FBML - a templating language, the FB session.
iFrame Apps
Your response is put in an IFRAME on the page.
- Available: FB session.
- Not available: FBJS, FBML.
Connect App
Host your app on your own domain. Connect back into FB to e.g. use their authentication & friends model.
- Available: Connect Session
- Not available: FB Session
Pages/Profiles
- No JS
Think of FB as a browser
FB API LOLZ
- REST API - api.facebook.com/restserver.php - awesome URL.
- Facebook does not do GETs, they always POST to your site.
- Connect inside an iFrame: Connect requires the XHTML doctype to work, but iFrames are not supported in XHTML.
SRSLY
Running your App in development
- Create an app
- Set up DynDNS.com and point it at your IP address: http://mysubdomain.dyndns.com
- open ports on your local network
- Add /etc/hosts entry: 0.0.0.0 facebook.dontexist.com
Abstract the API
- Use Facebooker.
- apps.facebook.com/facebooker_tutorial
- Access everything through the session:
if session[:facebook_session]
  @friends = session[:facebook_session].user.friends.collect do |f|
    User.find_by_fb_user_id f.uid
  end
end
Protect your code
e.g. use rack-facebook so everything is not a POST
Others
- Frankie: Facebooker for Sinatra
- rack-facebook: translates FB POST to correct action
How can I make things better?
- Need a test harness for session[:facebook_session]. There is something in facebooker that needs extraction.
- Greg is starting the Facebook API Continuous Integration Suite of Tests - github.com/atduskgreg/FACIST - some JS hooks to green/red light that your tags are actually showing up in the page.
Rails Metal
- Rails Metal is a gateway to the exciting and possibly dangerous world of Rack.
- Replace selected URLs for a speed boost.
- Bypasses the Rails router. You have to do your own routing.
- All Metal endpoints are tried before Rails routing.
Let's say in an alternate universe you are running eBay on Rails. Your most active page is bots hitting your API to get auction status: GET /auctions/1234567.xml
Replace app/controllers/auctions_controller.rb
With app/metal/auctions_api.rb
class AuctionsApi
  def self.call(env)
    url_pattern = %r{/auctions/(\d+)\.xml}
    if m = env['PATH_INFO'].match(url_pattern)
      auction = Auction.find(m[1])
      # This is a Rack return value: [status, headers, body]
      [ 200, { "Content-Type" => "text/xml" }, [auction.to_xml] ]
    else
      # Not our URL: a 404 lets Rails carry on with its normal routing
      [ 404, { "Content-Type" => "text/html" }, ["Not Found"] ]
    end
  end
end
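The "Metal endpoints are tried first, then Rails routing" behaviour can be sketched in plain Ruby. As far as I understand it, a 404 from a Metal endpoint means "not mine, pass it on". The cascade method and both lambdas below are hypothetical stand-ins that mimic that behaviour; they are not Rails internals.

```ruby
# Try each endpoint in order; fall through to the next one on a 404.
def cascade(endpoints, env)
  endpoints.each do |endpoint|
    response = endpoint.call(env)
    return response unless response[0] == 404
  end
  [404, { 'Content-Type' => 'text/plain' }, ['Not Found']]
end

# Stand-in for a Metal endpoint: only answers /auctions/<id>.xml.
auctions_api = lambda do |env|
  if (m = env['PATH_INFO'].match(%r{\A/auctions/(\d+)\.xml\z}))
    [200, { 'Content-Type' => 'text/xml' }, ["<auction><id>#{m[1]}</id></auction>"]]
  else
    [404, { 'Content-Type' => 'text/plain' }, ['Not Found']]
  end
end

# Stand-in for the full Rails stack.
rails_app = ->(_env) { [200, { 'Content-Type' => 'text/html' }, ['rendered by Rails']] }

puts cascade([auctions_api, rails_app], { 'PATH_INFO' => '/auctions/1234567.xml' })[2].first
puts cascade([auctions_api, rails_app], { 'PATH_INFO' => '/users/1' })[2].first
```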
Sinatra
Sinatra is an extremely minimalist web framework. It works in the Rack framework. This is an entire Sinatra application:
require 'rubygems'
require 'sinatra'
get '/hello' do
"Hello, whirled"
end
Run it:
$ ruby hello.rb
# Starts up server on port 4567
$ curl http://localhost:4567/hello
Hello, whirled
Here is the alternate universe auction example with Sinatra:
class AuctionsApi < Sinatra::Application
  get '/auctions/:id.xml' do
    Auction.find(params[:id]).to_xml
  end
end
Most of this talk was a basic Cucumber primer. However these things were new to me:
Multi-line arguments
You can use this in a spec; it adds a block argument with the string in it:
"""
multi-line
string
"""
Tables
This generates a block argument as an array of hashes. ActiveRecord's create can take this as an argument.
Given the following proposals
|email | title |
|aslak.hellesoy@gmail.com | Cucumber|
|bryan@brynary.com | Webrat |
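To make "an array of hashes" concrete, here is roughly what the step definition ends up receiving. table_to_hashes is a hypothetical parser written for illustration; Cucumber does this conversion for you.

```ruby
# Hypothetical parser: the first row supplies the keys, each later row a hash.
def table_to_hashes(text)
  rows = text.lines.map(&:strip).reject(&:empty?).map do |line|
    line.sub(/\A\|/, '').sub(/\|\z/, '').split('|').map(&:strip)
  end
  header, *data = rows
  data.map { |cells| header.zip(cells).to_h }
end

table = <<~TABLE
  |email                    | title   |
  |aslak.hellesoy@gmail.com | Cucumber|
  |bryan@brynary.com        | Webrat  |
TABLE

proposals = table_to_hashes(table)
puts proposals.inspect
```

Each hash has the column headers as keys, which is why the whole array can be handed straight to a model's create.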
Abstract Scenarios
Scenario Outline: Email accepted proposals
Given the following proposals
|email | title |
|aslak.hellesoy@gmail.com | Cucumber|
|bryan@brynary.com | Webrat |
And the <proposal> proposal is approved
When I send proposal emails
Then <email> should <what>
Examples:
| proposal | email | what |
| Cucumber | aslak.hellesoy@gmail.com | get email |
| Cucumber | bryan@brynary.com | not get email |
| Webrat | bryan@brynary.com | get email |
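A Scenario Outline is run once per Examples row, with each <placeholder> replaced by that row's value. A small sketch of the substitution; expand_outline is a hypothetical helper, since Cucumber performs this internally.

```ruby
# Hypothetical expansion: one concrete list of steps per Examples row.
def expand_outline(steps, examples)
  examples.map do |row|
    steps.map { |step| step.gsub(/<(\w+)>/) { row.fetch(Regexp.last_match(1)) } }
  end
end

steps = [
  'And the <proposal> proposal is approved',
  'When I send proposal emails',
  'Then <email> should <what>'
]
examples = [
  { 'proposal' => 'Cucumber', 'email' => 'aslak.hellesoy@gmail.com', 'what' => 'get email' },
  { 'proposal' => 'Cucumber', 'email' => 'bryan@brynary.com', 'what' => 'not get email' }
]

expand_outline(steps, examples).each { |scenario| puts scenario, '' }
```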
Before/After/World
Before do
end
After do |scenario|
end
World do
end
World(MyModule)
Background
Feature: Notification emails
Background:
Tagged Features
Feature: Take over the world
I want it all
@spanish @french @english
Scenario: Take over Europe
Then run:
cucumber -t french doit.feature
# or negative
cucumber -t ~french doit.feature
Range selection:
- Full SHA1
- Partial SHA1 - at least 4 characters and unique
- Branch, remote or tag name
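- Caret parent: master^ is the 1st parent of master, master^^ is the grandparent (1st parent of the 1st parent); master^2 is the 2nd parent of a merge commit
- Tilde spec: master~2 (2 generations back from master, following first parents; same as master^^)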
- Combination: master~2^2
- Blob spec: default:path/to/file
- Relative spec: master@{yesterday} (relative to your machine)
- master@{5} the 5th last value of master (locally)
- [old]..[new] everything reachable from new but not from old
- jes/master..master
- jes/master..c36ae
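The caret, tilde and range specs are easier to keep straight with a toy commit graph. A sketch in plain Ruby: the graph and the helper names are made up for illustration, and real git resolves revisions against its object database rather than a hash.

```ruby
require 'set'

# Toy history: commit => parents (first parent first). Hypothetical data.
#   a <- b <- c <- m    (m is a merge of c and t)
#        b <- t
PARENTS = { 'a' => [], 'b' => ['a'], 'c' => ['b'], 't' => ['b'], 'm' => ['c', 't'] }.freeze

# rev^n: the n-th parent of a commit.
def nth_parent(rev, n = 1)
  PARENTS.fetch(rev)[n - 1]
end

# rev~n: follow the first parent n times.
def nth_ancestor(rev, n)
  n.times { rev = nth_parent(rev) }
  rev
end

# Every commit reachable from rev, including rev itself.
def reachable(rev, seen = Set.new)
  return seen if rev.nil? || seen.include?(rev)
  seen << rev
  PARENTS.fetch(rev).each { |parent| reachable(parent, seen) }
  seen
end

# old..new: reachable from new but not from old.
def range(old, new)
  (reachable(new) - reachable(old)).to_a
end

puts nth_parent('m', 2)        # m^2  -> t
puts nth_ancestor('m', 2)      # m~2  -> b (same as m^^)
puts range('t', 'm').inspect   # t..m -> m and c
```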
Log usage:
- git log origin/master.. or origin/master..HEAD only the commits that are going to go upstream
- git log ..origin/master or HEAD..origin/master everything that origin/master has that you do not
- git log master --not origin/master
- git log --graph gives an ASCII graph of history
Diff:
- git diff HEAD...topic go back to a common ancestor before diffing - gives better results
- git commit --amend modify the last commit
Rebasing:
- Replay the changes in my branch on top of another branch.
- rebase --onto use for transplanting a topic branch.
- To transplant some of a topic branch, create a new branch to refer to the part you don't want then do a rebase --onto.
- git rebase -i <ref> interactively pick/reorder/squash by editing a list/script.
- DO NOT rebase using any commits you have already pushed upstream.
Filter Branch:
- git filter-branch --tree-filter 'rm -f filename' HEAD Remove all instances of a file from every commit.
Subtree merging:
- Alternative to submodules. Looks way complex. Tim Dysinger wrote rake tasks to do this. Google it.
Patch Staging
- git add -p patch staging - interactively stage only some hunks of a file.
Debugging
- Annotation: git blame
- git blame -C <file> even if your line was moved from another file, produce a blame report for it.
git bisect
git bisect start
git bisect bad # assumes HEAD
git bisect good 3acb4
Takes the range you just specified, picks the middle commit and checks it out; you call it good or bad. Wash, rinse, repeat.
git bisect reset # when you are done.
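Underneath, bisect is a binary search over the commit range. A sketch under simplifying assumptions: linear history, commits ordered oldest to newest, the first known good and the last known bad. first_bad_commit and every SHA except 3acb4 are made up for illustration.

```ruby
# Hypothetical sketch of the search that git bisect automates.
def first_bad_commit(commits)
  good = 0
  bad = commits.size - 1
  while bad - good > 1
    middle = (good + bad) / 2
    if yield(commits[middle])   # the block answers "is this commit bad?"
      bad = middle
    else
      good = middle
    end
  end
  commits[bad]
end

commits = %w[3acb4 91f02 5d7e1 c0ffe 7be11 a11ce]
broken_since = 3   # pretend the bug arrived with commits[3]
tested = []
culprit = first_bad_commit(commits) do |sha|
  tested << sha
  commits.index(sha) >= broken_since
end

puts culprit          # c0ffe
puts tested.inspect   # only two commits had to be checked
```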
Customization
- git config --global help.autocorrect 1 - Stops git complaining about mistyped commands (e.g. git com); it runs the closest match instead.
- git config --global color.ui auto
- Configure external merge tool.
- .gitattributes: treat files that match a pattern differently, e.g. to diff binary files: echo '*.png diff=exif' >> .gitattributes, then add a gitconfig line describing the exif diff strategy.
Ilya's slides are already on the web.
A few random notes:
- In 1994-1995 term frequency was state of the art in search engine relevancy.
- State of the art today = TF-IDF = Term Frequency - Inverse Document Frequency
- http://rubyforge.org/projects/gratr/ graph theory gem - gets slow after 1000 nodes but can manage about a million.
- Working with math in Ruby is not the best idea. Use GSL with one of the ruby binding gems.
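For reference, a bare-bones TF-IDF calculation. This uses one common variant (term frequency within the document times log(N / number of documents containing the term)); it is an assumption that this matches the weighting shown in the talk, and the tiny corpus is made up.

```ruby
# Minimal TF-IDF sketch; documents are arrays of words.
def tf_idf(term, doc, corpus)
  tf = doc.count(term).to_f / doc.size
  containing = corpus.count { |d| d.include?(term) }
  return 0.0 if containing.zero?
  tf * Math.log(corpus.size.to_f / containing)
end

corpus = [
  %w[ruby is a language],
  %w[rack is a ruby interface],
  %w[git is a version control system]
]

puts tf_idf('is', corpus[0], corpus)     # appears in every document, so it scores 0.0
puts tf_idf('ruby', corpus[0], corpus)   # appears in two of three documents
puts tf_idf('git', corpus[2], corpus)    # appears in only one document
```

Plain term frequency would rank "is" as highly as "ruby"; the inverse document frequency factor is what pushes common words down.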
Fixtures Suck
When modeling complex business domains, rather than 3-model blog software, fixtures quickly become a quagmire. What's the size of your domain? Kevin was working on a project with 180 models. This quickly became unworkable even with only 1 fixture file per model. Fixtures don't scale well. Scenarios are also problematic, as you now have to maintain a directory hierarchy of fixtures.
Use Data Generation instead
Factory Girl
# Define
Factory.define :user do |f|
f.first_name 'John'
f.last_name 'Doe'
end
# use
user = Factory(:user)
Object Daddy
Reopens your ActiveRecord class and adds generators for each attribute.
# define
class User < ActiveRecord::Base
generator_for :username, :method => :next_user
generator_for :email, :start => 'test@domain.com' do |prev|
user, domain = prev.split('@')
user.succ + '@' + domain
end
end
# use
@user = User.generate!
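The idea shared by all of these libraries is small: named defaults, per-call overrides, and sequences for attributes that must be unique. A hand-rolled sketch; TinyFactory is hypothetical and is not the Factory Girl or Object Daddy API.

```ruby
# Tiny hand-rolled factory that builds attribute hashes.
class TinyFactory
  def initialize
    @definitions = {}
    @counters = Hash.new(0)
  end

  # Register default attributes; a callable default receives a sequence number.
  def define(name, defaults)
    @definitions[name] = defaults
  end

  # Build an attribute hash, letting the caller override any default.
  def build(name, overrides = {})
    n = (@counters[name] += 1)
    attrs = @definitions.fetch(name).map do |key, value|
      [key, value.respond_to?(:call) ? value.call(n) : value]
    end
    attrs.to_h.merge(overrides)
  end
end

factory = TinyFactory.new
factory.define(:user, {
  first_name: 'John',
  last_name: 'Doe',
  email: ->(n) { "test#{n}@domain.com" }
})

puts factory.build(:user).inspect
puts factory.build(:user, { first_name: 'Jane' }).inspect
```

Each test states only the attributes it cares about, which is the property fixtures lose as the domain grows.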
Others
Machinist
Foundry
Fixjour
Martin Fowler says Mocks Aren't Stubs and talks about Classical and Mockist Testing. Dave shows a slightly amusing set of photos about "ists" - Rubyists etc. Ist bin ein red herring. The big issue here is when to use a mock.
Overview of Stubs and Mocks
Terminology: test double - an object standing in for a real object (like a stunt double).
customer = Object.new
logger = Object.new
customer.stub(:name).and_return('Joe Customer')
logger.should_receive(:log)
customer.should_receive(:name).and_return('Joe customer')
# bad - very tightly bound to implementation
customer.stub(:name).and_return('Joe customer')
# also tightly bound to implementation
- Stubs are often used like mocks, mocks used like stubs.
- We verify stubs by checking state after an interaction.
- We tell mocks to verify interactions.
- Sometimes stubs just make the system run.
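The state-versus-interaction distinction shows up clearly with hand-rolled doubles. A sketch in plain Ruby; StubCustomer, MockLogger and greet are hypothetical names, and none of this is the RSpec API.

```ruby
# A stub returns canned answers so the code under test can run.
class StubCustomer
  def name
    'Joe Customer'
  end
end

# A mock records the messages it receives so the test can verify the interaction.
class MockLogger
  attr_reader :messages

  def initialize
    @messages = []
  end

  def log(message)
    @messages << message
  end

  def verify_received!(expected)
    return if @messages.include?(expected)
    raise "expected log(#{expected.inspect}), got #{@messages.inspect}"
  end
end

# Code under test: greets a customer and logs that it did so.
def greet(customer, logger)
  greeting = "Hello, #{customer.name}"
  logger.log("greeted #{customer.name}")
  greeting
end

logger = MockLogger.new
greeting = greet(StubCustomer.new, logger)

puts greeting                                     # state check, fed by the stub
logger.verify_received!('greeted Joe Customer')   # interaction check on the mock
```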
When are method stubs helpful?
Isolation from non-determinism: Simulate random value generators or Time.now.
Isolation from external dependencies: e.g. external database or network. Dave gave an example of an ActiveMerchant test that takes 1.5s to run, and stubbed it out:
gateway.stubs(:authorize).returns(ActiveMerchant::Billing::Response.new(true, 'ignore'))
Polymorphic collaborators: e.g. an employee that knows how to pay itself, using a strategy:
payment_strategy = mock()
employee = Employee.new(payment_strategy)
payment_strategy.expects(:pay)
employee.pay
mixins/plugins
When are message expectations helpful?
side effects: background processing
caching: only call a network zipcode lookup once
validator = mock()
zipcode = Zipcode.new("01234", validator)
validator.should_receive(:valid?).with("01234").once
zipcode.valid?
zipcode.valid?
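For that expectation to pass, Zipcode has to remember the validator's answer. A guess at what such a class could look like, with a hand-rolled counting double in place of the mock; both classes are hypothetical, since the talk only showed the spec.

```ruby
# Hand-rolled double standing in for the network validator.
class CountingValidator
  attr_reader :calls

  def initialize
    @calls = []
  end

  def valid?(code)
    @calls << code
    true
  end
end

# Hypothetical implementation of the Zipcode class from the example.
class Zipcode
  def initialize(code, validator)
    @code = code
    @validator = validator
    @valid = nil
  end

  # Ask the validator at most once, then reuse the cached answer.
  def valid?
    @valid = @validator.valid?(@code) if @valid.nil?
    @valid
  end
end

validator = CountingValidator.new
zipcode = Zipcode.new('01234', validator)
zipcode.valid?
zipcode.valid?
puts validator.calls.inspect   # one call, despite two valid? checks
```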
interface discovery: tool to discover the parts of the system that you haven't really worked out yet. Mock something out that doesn't exist yet, while designing its interface.
Isolation Testing
All of these concepts are Isolation Testing - testing an object in isolation from others. This is a good fit when you have lots of little objects (ravioli code, as opposed to spaghetti code).
Isolation Testing in Rails
Rails is calzone code. Three layers: View, Controller, Model. These 3 layers are not the whole picture: there is also the browser, router and database. Standard Rails testing:
- Unit tests: Testing in isolation. Test model classes (repositories), model objects, database.
- Functional tests: 2 or more non-trivial components work together. Test model classes, model objects, database, views, controllers.
- Integration tests: Test model classes, model objects, database, views, controllers, routing/sessions. This is !DRY
Mocking and stubbing you can do in Rails
Partials in view specs:
before :each do
template.stub(:render).with(:partial => anything)
end
...
template.should_receive(:render).with(:partial => 'nav')
Conditional branches in controllers: Stub new and save! methods of models.
Dave has a new project stubble on github: You will need to build RSpec locally to use this for now.
stubbing(Registration) do
  # Stubs ActiveRecord finder and save methods on model
end
Chains are a new RSpec feature: user.stub_chain - some people say this is a test no-no, use with caution.
Guidelines, concerns & Common Pitfalls
- Keep things simple
- Try to avoid tight coupling
- Complex setup is a red flag for design issues
- Don't stub and mock the object that you are testing
- Concern: impedes refactoring (but some say refactoring is improving design without changing behavior, so tests should not change. This really depends what level you are refactoring at).
- Concern: false positives







