Pivotal Labs


Stop leaking ActiveRecord throughout your application

Robbie Clutton
Monday, May 6, 2013

Extending ActiveRecord::Base leaks a powerful API throughout an application, which can lead to tempting code that breaks good design. Take the classic blog example where you may want to retrieve the latest posts by a given author. You may have seen, or even written, code that pulls the dataset you need straight into the controller or view:

Post.where(author_id: author_id).limit(20).order("created_at DESC").each { ... }

For me this is a design violation as well as breaking the “Law of Demeter”[Edit: Current Pivot Adam Berlin and former Pivot John Barker pointed out that chaining with the same object was not a Demeter violation]. The example above exposes the structure of the schema, which the calling class has no business knowing. It also makes testing using stubs ugly and encourages testing against the database directly. A test would have to chain three methods to stub a return value. It’s brittle, in that it’s susceptible to breaking due to changes outside of the class. For me it also fails from a narrative perspective in that it doesn’t succinctly reveal the intent of this part of the application.

If we were testing this and attempting to use stubs, we’d have to write something like the below. You can see how this is at best cumbersome, but also fragile.

where = stub(:where)
limit = stub(:limit)
order = stub(:order)

Post.stub(:where).with(author_id: author_id) { where }
where.stub(:limit).with(20) { limit }
limit.stub(:order).with("created_at DESC").and_return([post1, post2, post3])

You may be forgiven for thinking you could chain the stubs like below, but the arguments are ignored and this just serves to highlight the breaking of the ‘Law of Demeter’.

Post.stub_chain(:where, :limit, :order).and_return([post1, post2, post3])

I’d much rather see that as a message to the Post class.

def self.latest_for_author id
  where(author_id: id).limit(20).order("created_at DESC")
end

Post.latest_for_author(1)

If there were variations of the limit and perhaps offset, they can be passed as optional parameters or as an options hash:

def self.latest_for_author id, limit = 20, offset = 0
  where(author_id: id).limit(limit).offset(offset).order("created_at DESC")
end

Post.latest_for_author(1)
Post.latest_for_author(1, 20, 0)

or

def self.latest_for_author id, options = {}
  limit = options[:limit] || 20
  offset = options[:offset] || 0
  where(author_id: id).limit(limit).offset(offset).order("created_at DESC")
end

Post.latest_for_author(1, offset: 20)

To get the dataset, the call looks like the following, which I think is more informative than using the ActiveRecord DSL directly.

Post.latest_for_author(author_id).each { ... }

Testing is also easier, as it puts more emphasis on the messages being sent to objects rather than a chain of calls having to be correct.

Post.should_receive(:latest_for_author).with(1).and_return([post1, post2, post3])

There are a few advantages to this refactor:

  • Only the Post class knows about the schema
  • Any changes to the implementation of latest_for_author are encapsulated in one place
  • The method describes the intent more than the implementation
  • Stubbing in the tests is easier as there is one clear dependency
  • Testing the database is encouraged only in the class hitting the database
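
The "one clear dependency" point can be sketched in plain Ruby, without Rails or RSpec. This is only an illustration under stated assumptions: AuthorFeed, FakePost and FakePostRecord are hypothetical names invented here, and the only thing taken from the post is the latest_for_author message.

```ruby
# Hypothetical consumer: it only knows the message `latest_for_author`,
# not the schema or the ActiveRecord query behind it.
class AuthorFeed
  def initialize(post_source)
    @post_source = post_source
  end

  # Returns the titles of the latest posts for the given author.
  def titles_for(author_id)
    @post_source.latest_for_author(author_id).map { |post| post.title }
  end
end

# Hand-rolled stand-ins used only by the test (hypothetical names).
FakePostRecord = Struct.new(:title)

class FakePost
  attr_reader :requested_author_ids

  def initialize(posts)
    @posts = posts
    @requested_author_ids = []
  end

  # Records which author was asked for and returns the canned posts.
  def latest_for_author(author_id)
    @requested_author_ids << author_id
    @posts
  end
end

fake = FakePost.new([FakePostRecord.new("First"), FakePostRecord.new("Second")])
feed = AuthorFeed.new(fake)
titles = feed.titles_for(1)
```

Because the consumer depends on a single message, the fake needs one method, and nothing in the test mentions where, limit or order.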

One further refactor could be done here, and that is to move the query logic out of the Post class once more, but this time into a purpose-built query object:

class LatestPosts
  attr_reader :author_id

  def initialize author_id
    @author_id = author_id
  end

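  # Delegates to each rather than ActiveRecord's find_each, because
  # find_each walks records in primary-key batches and ignores the order.
  def find_each(&block)
    Post.where(author_id: author_id).limit(20).order("created_at DESC").each(&block)
  end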

end

Where using the class looks like:

LatestPosts.new(author_id).find_each { ... }

Bryan Helmkamp discusses query objects in his excellent write-up on fat ActiveRecord models. He rightly points out that once queries live in a single-purpose object, they warrant little attention to unit testing. Now is the right time to use the database to ensure the right data set is being returned and that N+1 queries are not being performed. This means that database testing would only occur within the class actually hitting the database and not in the rest of the application, which has a dependency on the database.

All of these techniques serve to improve the design of an application by preventing responsibilities from leaking out of one class into the rest of the application. I’m not saying that developers shouldn’t use ActiveRecord or even Rails, only that they should use the tools responsibly.


Testing strategies with RSpec, NullDB and Nosql

Robbie Clutton
Sunday, February 3, 2013

I recently posted about a few testing strategies that can be applied with RSpec. One of the patterns I mentioned was using something like NullDB to ensure your unit tests were not hitting the database. I had a few conversations about what I’d written, notably with my colleague Ian Lesperance. We discussed, and I conceded, that it’s preferable to have tests related to one class in one spec file. In particular I had split out the unit-level tests from the database integration tests. So, here are my experiments on how I brought those tests back together while keeping the same integrity of using a database for some tests and forcing the null object pattern on others.

I had some issues having those tests in the same file, but with a little help from another colleague, JT Archie, we managed to figure it out.

Consider this RSpec test:

  describe Widget do
    describe "#highest_selling", :db do
      it "uses the 'highest_selling' scope" do
        ...
      end
    end

    describe "#display_name" do
      it 'concats the widget name and manufacturer' do
        ...
      end
    end
  end

The ‘highest_selling’ method is a scope and has the ‘:db’ tag associated with its block, while the ‘display_name’ test has no tags applied. I wanted this to be the case: no tags means no database, but if you want to hit the database, you need to explicitly call it out.

One trick you might have missed above was no longer needing to do ‘db: true’ in the RSpec tag. With the following setting in the spec helper, you can apply a symbol directly like ‘:db’.

config.treat_symbols_as_metadata_keys_with_true_values = true

Testing with NullDB

To get this working, I had to use the HEAD revision of NullDB:

gem 'activerecord-nulldb-adapter', git: 'git://github.com/nulldb/nulldb.git'

Using NullDB within the same file, we can use the ‘nullify’ and ‘restore’ helpers, but I found it worked best using the ‘around’ configuration. Using ‘before’ and ‘after’ I was having issues with changing the connection adapter during a transaction. This way, it appears to get around that issue.

We run the configuration block around each test that has the ‘type: :model’ tag. RSpec-Rails applies these automatically to any tests in the ‘spec/models’ directory. We look to see if the example has the ‘:db’ tag and if it does, we restore the default connection adapter, and run the example. If the example does not have the ‘:db’ tag applied, we apply the NullDB adapter, run the example and then restore the default adapter.

Within the ‘spec_helper.rb’ file:

  config.around(:each, type: :model) do |example|
    if example.metadata[:db]
      NullDB.restore
      example.run
    else
      NullDB.nullify
      example.run
      NullDB.restore
    end
  end

Testing using stubs

There are other options and with a sizable amount of help from JT, we created a simple way to achieve a similar outcome. Under ActiveRecord there are two methods which actually hit the database, ‘exec’ and ‘exec_query’. These methods can be stubbed out much like any method on any object in an application codebase.

In the ‘spec_helper’ file, we replace the NullDB configuration with the following. We again check for the ‘db’ tag and if it’s not there we stub ‘exec’ and ‘exec_query’.

  config.around(:each, type: :model) do |example|
    unless example.metadata[:db]
      ActiveRecord::Base.connection.stub(:exec).
        and_raise("You're not allowed to do that")
      ActiveRecord::Base.connection.stub(:exec_query).
        and_raise("You're not allowed to do that")
    end
    example.run
  end

Testing using Nosql

We took this concept one step further and created a Gem that wasn’t RSpec specific. We couldn’t believe our luck when RubyGems showed there was no Gem called ‘nosql’, so with that problem solved we created the Nosql gem. When included in a test suite, any call to the database will raise an exception.

With the around configuration block Nosql is disabled and enabled accordingly.

  config.around(:each, type: :model) do |example|
    if example.metadata[:db]
      Nosql::Connection.disable!
      example.run
    else
      Nosql::Connection.enable!
      example.run
      Nosql::Connection.disable!
    end
  end

All three of these options force unit tests to not hit the database. Database calls will either be ignored (NullDB) or will raise an error (the stubs and Nosql). This should result in decreased execution time for tests, as it will encourage the developer to stub out those interactions with the database.


Standup 09/30/2011: Bulk inserts

Tyler Schultz
Friday, September 30, 2011

Ask for Help

“Our delayed job consumes 2G of memory creating ~20k ActiveRecords in a loop!”

It doesn’t answer why your job is using so much memory, but check out activerecord-import.


ActiveRecord callbacks, autosave, before this and that, etc.

Danny Burkes
Tuesday, June 28, 2011

The Ugly Truth

On a recent project, we had an ActiveRecord model that declared some relationships and callbacks like so:

belongs_to :credit_card
before_create :build_credit_card

The intent was that build_credit_card would build the associated CreditCard instance, and ActiveRecord’s default :autosave feature on the belongs_to would save it.

What we discovered was that no CreditCard object was being persisted. We confirmed that :autosave is on by default for belongs_to relationships, so we couldn’t immediately understand why the new CreditCard wasn’t being created.

Googling proved futile, so we dove right into the ActiveRecord source, and boy did we have a good laugh about 10 minutes later.

What we found was that the :autosave option works by simply declaring a before_save callback- that makes perfect sense.

In our case, however, we were building the object to be autosaved in a before_create callback, which ActiveRecord runs after the before_save callbacks (cf. the callback ordering docs).

So our first problem was that we needed to move the call to build_credit_card from a before_create callback to a before_save :on => :create callback.

Did you catch that? There is a difference between before_create and before_save :on => :create. A big difference.

While I understand the how and why of this, the semantics don’t make it obvious. So beware!

Now with our declarations changed to

belongs_to :credit_card
before_save :build_credit_card, :on => :create

We ran our tests again, and, still, no love. Ahhh, we’ve still got an ordering problem. In addition to the ordering semantics detailed in the docs, ActiveRecord also runs callbacks within a single group in the order in which they are declared. So, even though we changed the call to build_credit_card to occur in a before_save, it was still occurring after the :autosave before_save callback, because of the declaration order.

Finally, we changed our declarations to

before_save :build_credit_card, :on => :create
belongs_to :credit_card

and our tests were happy.
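
To make the two ordering rules concrete, here is a toy model in plain Ruby. It is not ActiveRecord's implementation: ToyRecord and its subclasses are invented for illustration, and autosave_credit_card is a stand-in for the callback that :autosave registers. The sketch only mimics the behaviour described above, namely that before_save callbacks run before before_create callbacks, and that callbacks within one group run in declaration order.

```ruby
# Toy callback runner (hypothetical, for illustration only).
class ToyRecord
  class << self
    # Each subclass gets its own callback lists.
    def callbacks
      @callbacks ||= { before_save: [], before_create: [] }
    end

    def before_save(name)
      callbacks[:before_save] << name
    end

    def before_create(name)
      callbacks[:before_create] << name
    end
  end

  attr_reader :log

  def initialize
    @log = []
    @card_built = false
  end

  def build_credit_card
    @card_built = true
    @log << "built card"
  end

  # Stand-in for the before_save callback that :autosave declares.
  def autosave_credit_card
    @log << (@card_built ? "saved card" : "nothing to save")
  end

  # Runs all before_save callbacks, then all before_create callbacks.
  def create
    chain = self.class.callbacks[:before_save] + self.class.callbacks[:before_create]
    chain.each { |name| send(name) }
    log
  end
end

class BuiltInBeforeCreate < ToyRecord
  before_save :autosave_credit_card   # what belongs_to effectively declares
  before_create :build_credit_card
end

class BuiltAfterAssociation < ToyRecord
  before_save :autosave_credit_card
  before_save :build_credit_card
end

class BuiltBeforeAssociation < ToyRecord
  before_save :build_credit_card
  before_save :autosave_credit_card
end

first_attempt  = BuiltInBeforeCreate.new.create
second_attempt = BuiltAfterAssociation.new.create
final_attempt  = BuiltBeforeAssociation.new.create
```

Only the last ordering builds the card before the autosave stand-in looks for it, which matches the three attempts described in the post.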

Takeaways

  • When using autosave with any ActiveRecord association, be very careful of callback ordering if you are building or modifying the inverse objects using ActiveRecord callbacks.

  • before_create isn’t ever the same thing as before_save :on => :create, even if it sounds like it should be.


Standup 5/28/2010: rails new :inverse_of association option automatically creates backreferences

David Stevenson
Friday, May 28, 2010

Interesting Things

  • has_many and belongs_to associations can now automatically create back references to each other, thanks to a backport of :inverse_of from Rails 3 to Rails 2.3.6. This allows us to keep our object graphs more correct and avoid situations where we have 2 copies of the same object because the object graph is walked in reverse. Here’s how to use it:
class Parent < ActiveRecord::Base
  has_one :child, :inverse_of => :parent
  accepts_nested_attributes_for :child
end

class Child < ActiveRecord::Base
  belongs_to :parent
  validates_presence_of :parent
end

Basic Ruby Webapp Performance Tuning (Rails or Sinatra)

Alex Chaffee
Wednesday, April 28, 2010

My company launched our app, Cohuman, a few weeks ago. The rush of finishing features, fixing bugs, and responding to user feedback has subsided a bit, and it’s time to go back and give the little baby a tune-up. I find that a good development process will ebb and flow, and as long as you don’t let something slide for too long, it’s perfectly acceptable to let bugs, or performance issues, or development chores pile up for a bit and then attack them concertedly for an entire day or two. A bug-fest or chore-fest or tuning-fest can actually increase efficiency as you get in a rhythm… and it feels really good at the end of the day when you see all the bugs you slayed or all the milliseconds you shaved.

In this article I’d like to describe some of my techniques. I make no claim of originality or great expertise; I just want to share what I know, and hear (in comments) what other people have learned. I’m using Sinatra and ActiveRecord, but not Rails; hopefully this discussion will help people no matter what framework they’re using.

Metrics and Logs

The first step, and often the most overlooked, is to gather metrics. Without knowing how it’s working now, how are you going to know what to improve? And how are you going to know whether you made things better or worse? Frequently I’ll make a change that I’m sure will improve performance, only to discover that it’s made no change, or helped in one place but hurt in another.

Where to begin? We’re using New Relic for live performance monitoring, so my decision of what to optimize was easy: I went to their Web Transactions panel and looked at the Most Time Consuming and Slowest Average Response Time reports. If you don’t have a flashing signpost like that, it’s easy enough to decide on a path to work on: either go with user reports, or click around your app and see what feels slow, or choose the most popular request (which is usually the home page).

I always pick a single path to work on, from request to controller to database to view, and work on the slowest parts. This demands more metrics! It’s a common mistake to jump in and start tuning the database when the view is actually taking twice as long. What’s the use of cutting the database access from 400 to 200 msec when the view is taking 1200 msec to render?

I also like to grab a copy of the production DB and bring it to my development machine so I can be sure I’m profiling real cases, and not being fooled by artifacts of generated data. We’re lucky that our app is currently small enough to do this; when the app gets bigger we’ll have to write a script that grabs only selected users’ data as a slice of the whole enchilada. (Note that there are some privacy concerns here: we are careful to only log in locally using our own accounts, and only to gather statistics in aggregate, not to look at details of user-entered data unless it’s to diagnose a specific user-reported issue or bug.)

Lots of in-app metrics tools exist (e.g. ruby-prof, benchmark), but I prefer the simple approach: I rolled my own Marker class that spits out basic msec timing information to the logs. In single-request performance tuning, what matters is relative timing between sections of code, so any objections to this technique on grounds of accuracy or detail are outweighed by its advantages: it’s simple, it shows where your bottlenecks are, and it divides the logs into sections so you can get a sense of who’s making what calls.

class Marker
  def self.mark(msg, logger = ActiveRecord::Base.logger)
    start = Time.now
    logger.info("#{start} --> starting #{msg} from #{caller[2]}:#{caller[1]}")
    result = yield
    finish = Time.now
    logger.info("#{finish} --< finished #{msg} --- #{"%2.3f sec" % (finish - start)}")
    result
  end
end

Usage is simple: pick a block you’re interested in and wrap it in Marker.mark("foo") do...end. You can then scan the logs using “less” (or a text editor) and search for the name you gave the block. Marking your controller and your view is a natural place to start; later you can insert marks inside interesting blocks of domain code. In Sinatra, you can do something like this:

get '/foo/:id' do
  foo = Marker.mark("loading foo") do
    Foo.find(params[:id])
  end
  Marker.mark("rendering foo") do
    FooWidget.new(:foo => foo).to_s # Erector
  end
end
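
To try the idea outside a web app, here is a minimal self-contained variant. It is a sketch, not the code above: it takes the logger as a required argument so it runs without ActiveRecord, drops the caller information, and uses a monotonic clock instead of Time.now (an assumption of this sketch). The StringIO-backed logger is only there so the output can be inspected.

```ruby
require "logger"
require "stringio"

# Variant of the Marker idea with an explicit logger and a monotonic clock.
class Marker
  def self.mark(msg, logger)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    logger.info("--> starting #{msg}")
    result = yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    logger.info("--< finished #{msg} --- #{'%2.3f sec' % elapsed}")
    result # hand the block's value back to the caller
  end
end

output = StringIO.new
logger = Logger.new(output)

answer = Marker.mark("adding numbers", logger) { 1 + 2 }
log_lines = output.string.lines
```

The block's return value passes straight through, so wrapping existing code in a mark does not change what it returns.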

I’ve also got a nice little Rack middleware component that marks the time spent inside each request. Note here that you can put lots of fun information in the name that can be helpful for debugging.

class Marking
  def initialize(app)
    @app = app
  end

  def call(env)
    response = nil
    Marker.mark("#{env['REQUEST_METHOD']} #{env['SCRIPT_NAME']}#{env['PATH_INFO']}") do
      response = @app.call(env)
    end
    response
  end
end

Figuring out where a particular log message (especially a DB query) is coming from is essential. It’s important not to make assumptions. If you think you know where the call is coming from, put in a stack trace to make sure, and rerun the request to confirm. That’s why Marker outputs caller: caller[0] is the code that names the mark, so you already know where that is; caller[1] is the line that called it, and caller[2] is the line that called caller[1]. If that’s not enough context, drop in a logger.info(caller.join("\n\t")) so you can scan the entire stack trace back up to the application code that you understand.

I’ve found that while ActiveRecord (2.3.5) tries to show where a result is coming from, it doesn’t always get it right, especially if you’re using plugins or gems that insert themselves into the call chain. So I monkey-patched AR to be a little smarter about its tracing:

module ActiveRecord
  module ConnectionAdapters
    class AbstractAdapter
      # strip library file pathnames from logged stack traces
      def log_info(sql, name, ms)
        if @logger && @logger.debug?
          c = caller.detect{|line| line !~ /(activerecord|active_support|__DELEGATION__|vendor|new_?relic)/i}
          c.gsub!("#{File.expand_path(File.dirname(RAILS_ROOT))}/", '') if defined?(RAILS_ROOT)
          name = '%s (%.1fms) %s' % [name || 'SQL', ms, c]
          @logger.debug(format_log_entry(name, sql.squeeze(' ')))
        end
      end
    end
  end
end

All this leads to log entries that look like this:

Thu Apr 15 11:09:17 -0700 2010 --> starting GET /app from /Users/cohumancomputer27inmac/dev/cohuman/lib/query_caching.rb:15:in `call':/Library/Ruby/Gems/1.8/gems/activerecord-2.3.5/lib/active_record/connection_adapters/abstract/query_cache.rb:34:in `cache'
  User Load (0.8ms) domain/user.rb:362:in `authenticate_from_login_token'   SELECT * FROM "users" WHERE ("users"."login_token" = E'abc123xyz') LIMIT 1
Thu Apr 15 11:09:17 -0700 2010 --> starting rendering ApplicationPage from /Users/cohumancomputer27inmac/dev/cohuman/controllers/app_controller.rb:4:in `GET /app':/Library/Ruby/Gems/1.8/gems/sinatra-0.9.4/lib/sinatra/base.rb:779:in `call'  Project Load (13.4ms) domain/user.rb:146:in `projects'   SELECT "projects".* FROM "projects" INNER JOIN "memberships" ON "projects".id = "memberships".project_id WHERE (("memberships".user_id = 2))
  User Load (3.2ms) domain/user.rb:135:in `coprojectmates'   SELECT "users".* FROM "users" INNER JOIN "memberships" ON memberships.user_id = users.id WHERE (memberships.project_id in (4,129,122,1,66,82,102,684,533,139,3,155,624,106,90,394,399,153) AND memberships.user_id != 2)   Email Load (2.1ms) domain/user.rb:135:in `coprojectmates'   SELECT "emails".* FROM "emails" WHERE ("emails".user_id IN (1,3,5,7,8,11,12,9,6,14,15,16,17,18,22,27,26,35,45,32,30,79,37,109,80,504,507,508,39,165,521,725,727,729,730,731,734,735,736,105,28,58,240,381,51,40,36,785,834,839,844,847,850,842,840,841,843,889))
  User Load (11.7ms) domain/user.rb:128:in `cohumans'   SELECT "users".* FROM "users" INNER JOIN "cohumanities" ON "users".id = "cohumanities".cohuman_id WHERE (("cohumanities".actor_id = 2))
  Email Load (19.4ms) domain/user.rb:128:in `cohumans'   SELECT "emails".* FROM "emails" WHERE ("emails".user_id IN (1,6,108,8,22,35,509,852,853,854,862,864,866,3,895,896,897,929,930,931,30,944,165,827,976,977,978,735,1024,2003,2004,59))
  SQL (0.5ms) domain/user.rb:177:in `temporary?'   SELECT count(*) AS count_all FROM "emails" WHERE ("emails".user_id = 2)
  Email Load (0.3ms) domain/user.rb:173:in `verified?'   SELECT * FROM "emails" WHERE ("emails".user_id = 2)
Thu Apr 15 11:09:17 -0700 2010 --< finished rendering ApplicationPage --- 0.307 sec
Thu Apr 15 11:09:17 -0700 2010 --< finished GET /app --- 0.311 sec

I know it can look daunting, but when scanning logs, it’s important to keep a clear head. Let’s examine this little burst of gibberish and try to make sense of it.

Line 1 says “--> starting GET /app” which means that the user has made a GET request for our main URL. We can skip ahead (search for “--< finished GET /app”) and see that the entire request took 0.311 seconds. This isn’t bad, but it could be better.

Line 3 says “--> starting rendering ApplicationPage” which means that all the other queries are happening from inside the rendering view code.

Note that the database queries are only taking 49.3 msec out of 311 msec, which means 84% of the time is spent either processing DB results or rendering them. This request is probably not a good candidate for DB-level tuning.

(How’d I add up all those scary milliseconds without an abacus? Piped my log text into this bad boy:

  ruby -e 'x = 0; STDIN.each do |line| if line =~ /\(([0-9.]*)ms\)/; then x += $1.to_f; end; end; puts x'

)
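
The same tally can be written as a small method, which is easier to test. This is a sketch rather than the command used above: total_query_ms is a made-up name and the sample lines are abbreviated stand-ins for real log output. One deliberate difference is that String#scan collects every "(…ms)" timing on a line, whereas a single =~ match only sees the first.

```ruby
# Sums every "(12.3ms)" timing found in the given log lines.
# String#scan returns all matches, so two timings that share one
# physical line are both counted.
def total_query_ms(lines)
  lines.sum do |line|
    line.scan(/\((\d+(?:\.\d+)?)ms\)/).flatten.sum { |ms| ms.to_f }
  end
end

# Abbreviated, made-up sample lines in the style of the log above.
sample = [
  "  User Load (0.8ms) domain/user.rb:362   SELECT * FROM users LIMIT 1",
  "  User Load (3.2ms) ...   Email Load (2.1ms) ...",
  "--< finished GET /app --- 0.311 sec"
]

total = total_query_ms(sample)
```

Lines without a millisecond timing, such as the "finished" marker, contribute nothing to the total.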

Indexes

Most (if not all) databases add an index for the primary key of a table. But a quick scan of the database logs will show many fields that are used in queries, and chances are you haven’t added indexes for them. (In fact, you probably shouldn’t add an index for a field until it shows up in the logs, since indexing slows down writes and takes up extra disk space. Not a lot, but it might add up.) In the above example, look at the User Load — every time a user hits the site we check to see if he’s logged in by querying the database for his login cookie. Adding an index for the “login_token” field in the users table sped up this query by a factor of 10. (Yes, that violates my “don’t fix what ain’t slow” dictum, since going from 10 ms to 1 ms isn’t really fixing much, but I figure it adds up over time since it happens on every single app request.)

Avoidance

The only perfect program is the one with zero lines of code. And the fastest code is that which is not run.

Sometimes you can optimize a section of code by removing unnecessary calls from your app layer. One nice trick these days is to move stuff behind an Ajax call. In Cohuman, we do this with some of our tabs: if you switch to a tab, and it hasn’t been loaded yet, it shows a spinny and starts an Ajax call to load it in. As long as we can keep each Ajax call under a second in length, the user-perceived delay is negligible.

Query Caching

ActiveRecord maintains a query cache, so if you run the same query (and I mean the same SQL), it won’t hit the database again. But if you’re not using Rails, query caching is disabled by default. So I wrote yet another Rack middleware so I don’t have to remember to wrap all my controllers in an ActiveRecord::Base.cache do block:

# a Rack middleware component that enables ActiveRecord query caching
# To use, put "use QueryCaching" in your Sinatra app.

class QueryCaching
  def initialize(app)
    @app = app
  end

  def call(env)
    if is_static_file?(env)
      @app.call(env)
    else
      response = nil
      ActiveRecord::Base.cache do
        response = @app.call(env)
      end
      response
    end
  end

  def is_static_file?(env)
     # if the path ends with a dot-extension (e.g. 'foo.jpg') then we assume
     # it's a static file and don't enable the query cache. (This will only
     # work for some application URL schemes, naturally.)
    env['PATH_INFO'] =~ /\/[^\/]*\.[^\/.]+$/
  end

end

Note that this is a query cache, not an object cache (see below).
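
Returning to the is_static_file? check in the middleware: the behaviour of that kind of pattern is easy to verify in isolation. The sketch below assumes the intended rule is "the last path segment contains a dot followed by an extension". STATIC_FILE_PATTERN and static_file_path? are names made up for this example.

```ruby
# Assumed pattern: a final path segment that contains a dot followed
# by an extension with no further dots or slashes.
STATIC_FILE_PATTERN = %r{/[^/]*\.[^/.]+$}

# Returns true or false rather than a match index.
def static_file_path?(path)
  !!(path =~ STATIC_FILE_PATTERN)
end

results = {
  "/images/foo.jpg" => static_file_path?("/images/foo.jpg"),
  "/app"            => static_file_path?("/app"),
  "/users/42"       => static_file_path?("/users/42"),
  "/v1.2/users"     => static_file_path?("/v1.2/users")
}
```

A dot in an earlier segment, as in "/v1.2/users", does not count, because the pattern is anchored to the last segment of the path.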

Query Tuning

Once you’ve identified some troublesome queries, you need to decide how to optimize them. You’ve basically got two choices here; which to choose should be obvious from the logs. Are there many low-latency queries, or a few high-latency queries? High-latency queries are an obvious target, and you should do your best (with indexes and SQL) to cut them down to size, but don’t let them distract you. There are two hidden costs to low-latency queries:

They actually take longer than they say they do – the AR log line only displays the time for the database connector to return the raw data. It doesn’t show the time to create AR instances, build association “classes” (which takes an annoyingly long time, since all their methods are built on the fly for each instance), and run post-load initialization code. (I just did a little experiment loading ~3000 of our User objects, which have a fair number of associations; SELECT * FROM users took 21 msec but User.all took 547 msec. That’s about 25x as long!)

They stack up, and I’m not talking pancakes – chances are you’ve got a lot of webapp processes hitting a single database (or a small number of slaves). As traffic increases, the queries will stack up like airplanes requesting permission to land. At a certain point you’ll hit a cliff (sorry for the mixed metaphor — it’s not fun to imagine a plane hitting a cliff) and per-request latency will rise dramatically. Lowering the number of queries per web request will, um, raise the ceiling? Lengthen the runway? Lower the cliff? Anyway, it’ll make this problem, uh, less worse. It’s kind of counterintuitive, but the limiting factor for modern webapps is really the number of queries, not the amount of data returned by each query.

ActiveRecord associations (like has_many and belongs_to) are great for getting an app up and running, but as you peruse your logs you’ll notice some things they’re doing that aren’t very efficient. Our app loads a lot of objects, each of which has lots of associated objects, some of which associate to other objects. If we’re displaying a list of Users, and each user has associated Emails (via has_many :emails), and we want to render a list of users and their email addresses, we’ll probably see one query that loads all users, and then one query for each user loading his or her emails.

Adding an :include to the declaration is a good way to reduce these from N+1 to 2, but it doesn’t always work. I have never been able to comprehend AR’s alien logic, so my logs are often littered with queries despite my best efforts fiddling with the association declaration. Furthermore, AR is quite naive about object graphs: for example, user.emails.first.user will make an extra query and return a different user than the one you started with, even though they have the same id and you loaded the emails via :include.

So I’ve gotten good performance boosts by moving away from ActiveRecord and doing some nested queries by hand. Not by writing literal SQL, but by doing one query, extracting the necessary ids, and then doing the next query, and saving or plugging in values directly. This led naturally to Treasury (see below).
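
The "one query, extract the ids, then one more query" approach can be sketched without a database. Everything here is a stand-in: FakeDb, its methods and the sample rows are invented for illustration, and the query counter exists only to compare the two strategies.

```ruby
# In-memory stand-ins for the users and emails tables (made-up data).
USERS  = [{ id: 1, name: "Ann" }, { id: 2, name: "Bob" }]
EMAILS = [
  { user_id: 1, address: "ann@example.com" },
  { user_id: 1, address: "ann@work.example.com" },
  { user_id: 2, address: "bob@example.com" }
]

# Counts how many "queries" were issued.
class FakeDb
  attr_reader :query_count

  def initialize
    @query_count = 0
  end

  def all_users
    @query_count += 1
    USERS
  end

  def emails_for_user_ids(ids)
    @query_count += 1
    EMAILS.select { |email| ids.include?(email[:user_id]) }
  end
end

# N+1: one query for the users, then one more per user.
def addresses_n_plus_one(db)
  db.all_users.each_with_object({}) do |user, result|
    result[user[:name]] = db.emails_for_user_ids([user[:id]]).map { |e| e[:address] }
  end
end

# Two queries: load users, collect ids, load all emails once, group in Ruby.
def addresses_batched(db)
  users = db.all_users
  by_user = db.emails_for_user_ids(users.map { |u| u[:id] }).group_by { |e| e[:user_id] }
  users.each_with_object({}) do |user, result|
    result[user[:name]] = by_user.fetch(user[:id], []).map { |e| e[:address] }
  end
end

slow_db = FakeDb.new
fast_db = FakeDb.new
slow = addresses_n_plus_one(slow_db)
fast = addresses_batched(fast_db)
```

Both strategies produce the same hash, but the batched version issues two queries regardless of how many users there are.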

Sometimes, of course, writing SQL is unavoidable; fortunately, AR allows it, and there are many people who are much better at that than I, so I won’t embarrass myself by discussing it further here.

Object Caching and the Repository Pattern

During my first pass at tuning Cohuman several months ago, I took a little time to write a library that implements a Repository Pattern. The Treasury is a work in progress that sits in front of ActiveRecord (and eventually, other ORMs) and caches object instances as they pass through. If you then request an object via the Treasury, it will check in its cache and return a pointer to the existing object instead of making a query; if you specify a list of ids, then it will only query for the ones it doesn’t yet have. (There are other features I won’t go into here, including a DSL for building queries… expect an upcoming article to officially introduce Treasury to the world.)
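
The caching behaviour described here can be sketched in a few lines. This is not Treasury's API: ObjectCache, fetch and the loader block are hypothetical names, and the loader stands in for whatever ORM query would load records by id.

```ruby
# Hypothetical per-request object cache keyed by id.
class ObjectCache
  attr_reader :loaded_batches

  # `loader` is any block that takes an array of ids and returns
  # objects responding to #id (a stand-in for the ORM query).
  def initialize(&loader)
    @loader = loader
    @objects = {}
    @loaded_batches = []
  end

  # Returns objects for the given ids, querying only for ids not yet cached.
  def fetch(ids)
    missing = ids.uniq - @objects.keys
    unless missing.empty?
      @loaded_batches << missing
      @loader.call(missing).each { |object| @objects[object.id] = object }
    end
    ids.map { |id| @objects[id] }
  end
end

Record = Struct.new(:id, :name)
cache = ObjectCache.new { |ids| ids.map { |id| Record.new(id, "user #{id}") } }

first  = cache.fetch([1, 2])
second = cache.fetch([2, 3])
```

The second fetch only loads id 3, and both calls hand back the very same Ruby object for id 2, which is the property that avoids duplicate copies of one record.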

I’ve heard that DataMapper has an object cache, but I haven’t yet dug into the details of how it works, so I don’t know if Treasury is redundant with it, or if it would make sense to plug in DM behind it. (I’ve also heard it solves the N+1 query problem gracefully. Anyone want to proselytize DM in the comments?)

All the caches I’ve mentioned only persist within a single request. This is probably a good thing, since allowing instances to persist between requests would open a can of data integrity, thread safety, and multi-host worms. But I can’t shake this vision I have of a sort of in-process memcache for Ruby objects, where multiple processes communicate changes to each other via TCP wormholes… Does anyone else share this vision, or am I doomed to wander the Ruby blog desert, mumbling incoherently at strangers?


Standup 4/28/2010: Webrat threading errors & new RubyMine version

David Stevenson
Wednesday, April 28, 2010

Ask for Help

“We keep getting webrat thread exceptions running our integration specs with the rails integration runner: Thread tried to join itself. The error message varies with different versions of ruby 1.8.6 vs 1.8.7.”

Anyone had this problem or know why?

“How do I skin an iphone mobile site to be the correct width so it’s not 980px wide?”

<meta name="viewport" content="width=device-width" />

“We’re trying to deploy some nginx configuration changes to EngineYard Cloud, what’s the right way to do that?”

We’ve tried building custom chef recipes to solve this problem, but they run after nginx has already restarted, so they are a poor solution. A better solution might be to check configuration files into the application and symlink them into the nginx configuration directory using a before_symlink.rb hook in the /deploy directory.

“We’ve got a has_many association where some of the child records are originally saved in an invalid state. When we later load the parent and ask it if it’s valid, it returns true even with validates_associated. How can we get the desired validation behavior?”

Turns out that unloaded associations are not validated. Solution: load the association before calling .valid? on the parent. In general, you should also not create invalid objects, instead using a state variable to put them into a “draft” or “incomplete” state where they are still valid but not complete. Then remove that state and you’ll see the errors required to finish that object.

Interesting Things

  • When RubyMine 2.0.1 won’t run your focused specs, try attaching rspec 1.2.9 to it rather than 1.3.x. It fixed this issue for one of our teams.
  • RubyMine 2.0.2 came out today: it can finally run focused contexts?! It also includes Bundler support!
  • We tried out Unicorn on EngineYard Cloud: so far so good. It’s still “experimental” but seems to work.
Pivotal Labs

Standup 4/1/2010: “update_attribute is almost never the right thing to use”

Pivotal Labs
Thursday, April 1, 2010

Ask for Help

“mp4s do not play during download in chrome”

Talks at Pivotal are recorded and published in various formats. Our talks page has an embedded viewer so that people can watch the video without downloading. We also offer a downloadable version in mp4. Most browsers will play the video as it is downloading. Google Chrome does not. Are we doing something wrong?

Pivotal Talks

Interesting Things

  • Beware: has_many associated objects are saved before has_one associated objects.

  • update_attribute of foreign_key value on belongs_to association does not save…when object was created by Factory Girl?

This is pretty specific, but perhaps not enough to be useful. The team tried to do an update_attribute on an object generated by Factory Girl, changing the belongs_to column value. However, no database update would occur. They later resolved this by doing a reload on the object before the update_attribute.

“Update_attribute is almost never the right thing to do” –anon


UTC vs Ruby, ActiveRecord, Sinatra, Heroku and Postgres

Alex Chaffee
Friday, January 22, 2010

Now that I’m starting to use DelayedJob to perform jobs in the future in my Heroku Sinatra app, it’s important that they happen at the scheduled time. But unless you pay attention, you’ll find that times get mysteriously changed (in my case, since I’m in San Francisco in the wintertime, by +/-8 hours), which means that some conversion to or from UTC is being attempted, but it’s only working halfway.

Trying to keep a handle on which libraries are attempting, and which are failing, to convert times is a losing battle, so I’m trying to do the right thing and save all my times in the database in UTC, and convert them to and from the user’s local time as close to the UI as possible. Unfortunately, a variety of gotchas in Ruby and ActiveRecord and PostgreSQL makes this trickier than it should be. Here’s a little catalog of my workarounds.


You must set both Time.zone = "UTC" and ActiveRecord::Base.default_timezone = :utc. Since I’m using Sinatra, not Rails, this stuff goes either in main (i.e. not inside any class) right after require 'active_record', or in a configure block in your app, depending on your preference.


When ActiveRecord creates queries (which are used for both reading and writing, mind you) it converts to UTC only those times that are instances of ActiveSupport’s own TimeWithZone class. It will not convert regular Ruby Time objects, even though Time objects are perfectly aware of their time zones, and AR is perfectly aware that you’d prefer they be written as UTC (due to the default_timezone setting). This is clearly a bug IMHO, but the Rails core marked the bug as “will not fix”, so w/e. Here’s a monkey patch, courtesy of Peter Marklund:

  module ActiveRecord
    module ConnectionAdapters # :nodoc:
      module Quoting
        # Convert dates and times to UTC so that the following two will be equivalent:
        # Event.all(:conditions => ["start_time > ?", Time.zone.now])
        # Event.all(:conditions => ["start_time > ?", Time.now])
        def quoted_date(value)
          value.respond_to?(:utc) ? value.utc.to_s(:db) : value.to_s(:db)
        end
      end
    end
  end
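To see what the patch buys you, here is a plain-Ruby stand-in that needs neither ActiveRecord nor ActiveSupport. It is a sketch, not the patch itself: quoted_date_sketch and DB_FORMAT are hypothetical names, strftime stands in for ActiveSupport’s to_s(:db), and it uses Time#getutc rather than Time#utc, because Time#utc converts the receiver in place while getutc returns a new object.

```ruby
require 'date'

# Hypothetical stand-in for the patched quoted_date, using only the stdlib.
DB_FORMAT = '%Y-%m-%d %H:%M:%S'

def quoted_date_sketch(value)
  if value.respond_to?(:getutc)
    # Time-like values are converted to UTC before being formatted.
    # getutc returns a new Time, so the caller's object is left untouched.
    value.getutc.strftime(DB_FORMAT)
  else
    # Values with no notion of UTC (e.g. Date) are written as-is.
    value.to_s
  end
end

# Time.new with an explicit offset needs Ruby 1.9.2 or later.
local = Time.new(2010, 1, 22, 9, 30, 0, "-08:00")  # 9:30 in San Francisco
utc   = Time.utc(2010, 1, 22, 17, 30, 0)           # the same instant in UTC

quoted_date_sketch(local)  # "2010-01-22 17:30:00"
quoted_date_sketch(utc)    # "2010-01-22 17:30:00"
```

Both calls produce the same string, which is the equivalence the comment in the patch describes.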

When outputting timestamps to a UI (either inside HTML or in a JSON API) you’ll probably want to use Time#strftime. Beware: on Mac OS X under Ruby 1.8, the %z (lowercase z) selector will emit the local time zone, not the zone of the Time object you’ve called strftime on. The solution is to use either %Z (capital Z) or just a plain Z, which stands for Zulu Time. The latter is OK if you know you’re using UTC, which, if you’ve followed my advice, you probably are. This is a pretty annoying issue: %z’s hour offsets are much safer than %Z’s three-letter codes, which can be ambiguous and in any case require an extra conversion to a time offset, so you may as well just emit the offset.

Here are some methods on Time you may want to use that work around this %z issue:

  # Note: do NOT call this file 'time.rb' :-D

  require 'time'

  class Time
    def full_date_and_time
      strftime('%Y-%m-%d %H:%M:%S %Z')
    end

    def iso8601
      # The literal "Z" means "Zulu time" (UTC), so convert to UTC before formatting.
      getutc.strftime('%Y-%m-%dT%H:%M:%SZ')
    end
  end

That iso8601 method comes in really handy when you’re using the excellent timeago jQuery plugin by Ryan McGeary (@rmm5t).
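If you would still rather emit a numeric offset than a three-letter code, one option is to build it from Time#utc_offset instead of relying on %z. A minimal sketch under that assumption: utc_offset_string and timestamp_with_offset are hypothetical helper names, not part of any library, and the approach is untested on the Ruby 1.8 / Mac OS X combination described above.

```ruby
# Hypothetical helpers: format a UTC offset without using strftime's %z.

# Returns the offset of +time+ from UTC as "+HHMM" or "-HHMM".
def utc_offset_string(time)
  offset = time.utc_offset                     # seconds east of UTC (negative = west)
  sign = offset < 0 ? '-' : '+'
  hours, remainder = offset.abs.divmod(3600)
  minutes = remainder / 60
  format('%s%02d%02d', sign, hours, minutes)
end

# Returns a timestamp followed by the numeric offset of the Time object itself.
def timestamp_with_offset(time)
  time.strftime('%Y-%m-%d %H:%M:%S ') + utc_offset_string(time)
end

# Time.new with an explicit offset needs Ruby 1.9.2 or later.
pacific = Time.new(2010, 1, 22, 12, 0, 0, "-08:00")
utc_offset_string(pacific)                                 # "-0800"
timestamp_with_offset(Time.utc(2010, 1, 22, 12, 0, 0))     # "2010-01-22 12:00:00 +0000"
```

Because the offset comes from the Time object rather than from the process’s local zone, the output follows the object you formatted.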


By default PostgreSQL saves timestamps sans time zone, which means that ActiveRecord interprets them as being in the default_timezone. If you want to be extra clear and save them with time zone, you’ll have to change the Postgres adapter’s type mapping. ActiveRecord doesn’t let you configure this but here’s a monkey patch, courtesy of Chirag Patel (with a couple of mods):

    require 'active_record/connection_adapters/postgresql_adapter'
    class ActiveRecord::ConnectionAdapters::PostgreSQLAdapter < ActiveRecord::ConnectionAdapters::AbstractAdapter
      def native_database_types
        {
          :primary_key => "serial primary key".freeze,
          :string      => { :name => "character varying", :limit => 255 },
          :text        => { :name => "text" },
          :integer     => { :name => "integer" },
          :float       => { :name => "float" },
          :decimal     => { :name => "decimal" },
          :datetime    => { :name => "timestamp with time zone" },
          :timestamp   => { :name => "timestamp with time zone" },
          :time        => { :name => "time" },
          :date        => { :name => "date" },
          :binary      => { :name => "bytea" },
          :boolean     => { :name => "boolean" }
        }
      end
    end

It turned out that I didn’t need this, so I ended up commenting it out. It may be that storing timestamps with time zones will cause a hiccup with some other random DB code, so watch out. If you do use it, and you’ve already got some data, make sure to write a migration that changes the types of all extant datetime and timestamp fields, and maybe a migration that shifts the times too.


That’s all I’ve got for right now. I’m sure some more problems will come up on March 14, 2010…

Mike Grafton

Standup 10/6/2009

Mike Grafton
Tuesday, October 6, 2009

Help

Why is upgrading to Ruby 1.8.7 so painful?

More specifically, a Pivot was wondering why there seem to be so many ways to install Ruby and Rubygems on a Mac. Gems end up installed in a lot of different places depending on which version of Ruby you have installed, and the specifics of how you installed it. The conversation turned into one about RVM and Yehuda Katz’ Bundler, two technologies that appear destined to make it much easier to combine a version of Ruby with a set of gems under a particular project.


What is that technology that allows for more complex condition hashes in ActiveRecord?

This must be ActiveRecord::Extensions, which allows for an expanded syntax in the conditions hash of AR finders. A debate was had as to whether hashes and arrays could possibly comprise a reasonable DSL for complex query logic, but surprisingly, the final word on the subject was not reached during standup.


We are using curl to talk to a Mongrel/Rack server that is running some specs. That server is emitting dots (just as any RSpec process would), but we cannot get those dots to show up in real time on the client. The only way we’ve been able to force a flush is with a newline character, but that gives us an ugly vertical column of dots. Any suggested hacks for this?


The Bay Area Chef Meetup Group is meeting on 10/14 in Mountain View. If you’re into Chef (and here at Pivotal we use it extensively), you might want to check it out.
