Ken Mayer
TDD Action Caching in Rails 3
Posted by Ken Mayer on Wednesday March 28, 2012 at 08:10PM

On my current project, we needed to prove that an action cache was working as expected. Alas, the blogosphere had either out-of-date or unhelpful information. So, after many experiments, we came up with an RSpec test that does what we want. It seems ugly to me, and I hope there's a better way. The names have been changed to protect the guilty. Any resemblance to actual classes and methods is purely coincidental.

Ken Mayer
SF Standup - Feb 13, 2012 - The Sound of One Hand
Posted by Ken Mayer on Monday February 13, 2012 at 09:00AM

Cries for help

Capybara is reporting "stale elements" for no good reason.

Any experience or war stories upgrading from jQuery 1.6 to 1.7, especially with the whole event delegation API change? [crickets] See interestings...

Interestings

Have a really big spec file that takes a long time to run? http://rubygems.org/gems/parallel_split_test, from our own Michael Grosser, will run a big spec file in smaller chunks.

URL helpers cache results for better performance, but they seem to be losing custom options (in this case a locale setting).

In jQuery 1.6, triggering an event in an unattached DOM node would propagate the event up into the document, even though it was never attached to it. jQuery 1.7 (maybe 1.7.1) fixes this behavior.

Glenn Jahnke
Performance stories - Writing two stories instead of one
Posted by Glenn Jahnke on Monday July 25, 2011 at 09:58PM

TL;DR

Estimating a performance story is a total bike-shed conversation. Instead:

  1. Write a figure-out-what-to-do story which generates the knowledge to estimate #2 (with repeatable benchmarks)
  2. Now estimate and execute the actual story around doing the optimization (and compare benchmarks)
  3. Repeat until your slowness is gone
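
One way to make the benchmarks in step 1 repeatable is a small script that lives alongside the story. This is only a sketch: `median_runtime`, `build_report_slow`, and `build_report_fast` are hypothetical names standing in for whatever code the story actually targets.

```ruby
require 'benchmark'

# Run the block several times and return the middle wall-clock time in seconds.
# Taking the middle of several runs dampens one-off spikes (GC, disk, etc.).
def median_runtime(runs = 5)
  times = Array.new(runs) { Benchmark.realtime { yield } }
  times.sort[runs / 2]
end

# Hypothetical "before" code: builds a new string on every iteration.
def build_report_slow(n)
  result = ""
  n.times { |i| result += i.to_s }
  result
end

# Hypothetical "after" code: same output, built in one pass.
def build_report_fast(n)
  (0...n).map(&:to_s).join
end

before = median_runtime { build_report_slow(5_000) }
after  = median_runtime { build_report_fast(5_000) }
puts "before: %.4fs, after: %.4fs" % [before, after]
```

Recording the "before" number in the first story gives the second story something concrete to be estimated and accepted against.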

Ken Mayer
Standup 2010-11-16
Posted by Ken Mayer on Tuesday November 16, 2010 at 09:35AM

Interesting Things

  • Project Sprouts from Luke Bayes: "an open-source, cross-platform project generation and configuration tool for ActionScript 2, ActionScript 3, Adobe AIR Flash and Flex projects"

"Project Sprouts was originally designed (as AsProject) to solve a specific set of problems and was later entirely refactored to become a modular set of libraries that are built on top of Ruby, RubyGems and Rake." -- Luke Bayes

"So how will Sencha monetize? The company plans to sell its tools, like Sencha Animator, at a premium. It’ll also offer premium support plans." -- TechCrunch

Helps

"Does anyone have experience with (slower) performance on EC2 compared to Heroku?"

Some suggestions:

  • Test network lag
  • The small instance, the default, is just too wimpy to run as an application server
  • Make sure your database instance is big enough, and has enough memory
  • Any experience with RDS?

"Can you run cucumber with its own database instance?"

Responses:

  • By default, Cuke creates its own environment, but piggy-backs on the test database
...
cucumber:
  <<: *TEST
...
  • rake db:test:prepare is wired to :test and won't support a :cucumber in the database.yml without extra rake tasks that have continuing maintenance costs
  • I've used the parallel_tests gem in the past, which has managed to retrofit db:test:prepare to work with multiple test databases, but it did require an extra step after each migration (and it didn't play well with postgres text indexes).
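
For reference, here is roughly how that `<<: *TEST` merge resolves when the YAML is loaded. This is a sketch with made-up adapter and database names; overriding `database` for the cucumber environment is an assumption about what a separate instance might look like, and it does nothing about the `db:test:prepare` limitation mentioned above.

```ruby
require 'yaml'

# Hypothetical database.yml fragment; adapter and database names are made up.
config_text = <<~YAML
  test: &TEST
    adapter: postgresql
    database: myapp_test
  cucumber:
    <<: *TEST
    database: myapp_cucumber
YAML

# Newer Psych versions refuse aliases in YAML.load, so prefer unsafe_load
# when it exists. Only do this with files you trust, such as your own config.
config = if YAML.respond_to?(:unsafe_load)
           YAML.unsafe_load(config_text)
         else
           YAML.load(config_text)
         end

# cucumber inherits every key from test, then overrides the database name.
puts config['cucumber'].inspect
```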

Sam Coward
Identify memory abusing partials with GC stats
Posted by Sam Coward on Monday September 13, 2010 at 07:01PM

Recently we had to investigate a page that had very poor performance. The project used New Relic, but neither it, the application’s logs, nor the render traces gave clear results. The database wasn’t the bottleneck, but there were well over 100 partials in the render trace. These partials seemed to perform well, except that sometimes the same partial would take hundreds to thousands of milliseconds.

As it turns out, the cause of these random delays was actually garbage collection triggered by abusive memory consumption within many of the partials - millions of objects and heap growth in excess of 100 MB. We discovered this by using the memory and GC statistics features of Ruby Enterprise Edition. You can get an idea of how much memory is being used and garbage collected during rendering by wrapping parts of your page in a block passed to this helper:

def gc_profile
  raise "Dude, you left GC profiling on!" unless Rails.env.development?
  allocated_objects_before = ObjectSpace.allocated_objects
  GC.enable_stats
  GC.clear_stats

  yield if block_given?

  growth = GC.growth
  collections = GC.collections
  time = GC.time
  mallocs = GC.num_allocations
  allocated_objects = ObjectSpace.allocated_objects - allocated_objects_before
  GC.disable_stats

  concat content_tag :span, "GC growth: #{growth}b, collections: #{collections}, time #{time / 1000000.0}sec, #{mallocs} mallocs, #{allocated_objects} objects created."
end

Because there were so many partials on the page, moving the helper around to identify the most egregious memory abusers got tiring, so I wrote a monkeypatch for the venerable Rack::Bug plug-in. It shows memory consumption and garbage collection statistics for each template in the Rack::Bug template trace, as well as on the memory panel.

Example template trace

Memory panel example

First, install Rack::Bug and make sure it's working properly, then add the code below in an initializer. Keep in mind this was written to work with Ruby Enterprise Edition 1.8.7 2010.02. Various 1.8 patches and Ruby 1.9 also provide some GC statistics reporting tools which you could probably customize this patch to use instead.

if Rails.env.development?
  require 'rack/bug'

  Rack::Bug::TemplatesPanel::Trace.class_eval do
    alias_method :old_start, :start
    def start(template_name)
      old_start(template_name)
      @initial_allocated_objects = ObjectSpace.allocated_objects
      @initial_allocated_size = GC.allocated_size
      @initial_num_allocations = GC.num_allocations
      @initial_gc_count = GC.collections
      @initial_gc_time = GC.time
    end

    alias_method :old_finished, :finished
    def finished(template_name)
      @current.allocated_objects = ObjectSpace.allocated_objects - @initial_allocated_objects
      @current.allocated_size = GC.allocated_size - @initial_allocated_size
      @current.num_allocations = GC.num_allocations - @initial_num_allocations
      @current.gc_count = GC.collections - @initial_gc_count
      @current.gc_time = (GC.time - @initial_gc_time)/1000.0
      old_finished(template_name)
    end
  end

  Rack::Bug::TemplatesPanel::Rendering.class_eval do
    attr_accessor :allocated_objects
    attr_accessor :allocated_size
    attr_accessor :num_allocations
    attr_accessor :gc_count
    attr_accessor :gc_time

    def memory_summary
      %{<strong>%.2fms</strong><small>in</small><strong>%d</strong><small>GCs</small>
      <strong>%d</strong><small>new objects</small>
      <strong>%d</strong><small>bytes in</small><strong>%d</strong><small>mallocs</small>} % [gc_time, gc_count, allocated_objects, allocated_size, num_allocations]
    end

    def html
      %{<li>
          <p>#{name} (#{time_summary}) [#{memory_summary}]</p>
          #{children_html}
        </li>}
    end
  end

  Rack::Bug::MemoryPanel.class_eval do
    alias_method :old_before, :before
    def before(env)
      old_before(env)
      GC.enable_stats
      GC.clear_stats
      @initial_allocated_objects = ObjectSpace.allocated_objects
      @initial_allocated_size = GC.allocated_size
      @initial_num_allocations = GC.num_allocations
    end

    alias_method :old_after, :after
    def after(env, status, headers, body)
      old_after(env, status, headers, body)
      @gc_count = GC.collections
      @gc_time = GC.time / 1000.0
      @allocated_objects = ObjectSpace.allocated_objects - @initial_allocated_objects
      @allocated_size = GC.allocated_size - @initial_allocated_size
      @num_allocations = GC.num_allocations - @initial_num_allocations
    end

    def heading
      "#{@memory_increase} KB &#916;, #{@total_memory} KB total, %.2fms in #{@gc_count} GCs" % @gc_time
    end

    def has_content?
      true
    end

    def name
      "memory_panel"
    end

    def content
      %{<style>#memory_panel dd { font-size: large; }</style>
        <h3>Garbage Collection Stats</h3>
        <dl>
          <dt>Garbage collection runs</dt>
          <dd>%d</dd>
          <dt>Time spent in GC</dt>
          <dd>%.2fms</dd>
          <dt>Objects created</dt>
          <dd>%d</dd>
          <dt>Bytes allocated</dt>
          <dd>%d</dd>
          <dt>Allocation calls</dt>
          <dd>%d</dd>
      </dl>} % [@gc_count, @gc_time, @allocated_objects, @allocated_size, @num_allocations]
    end
  end
end
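
On a stock MRI without the REE patches, similar numbers can be approximated with the built-in `GC.stat`, `GC.count`, and `GC::Profiler`. The following is a minimal sketch, assuming Ruby 2.2 or later (where the `:total_allocated_objects` key exists); `gc_stats_for` is a hypothetical name and is not part of Rack::Bug or the patch above.

```ruby
# Measure object allocations and GC activity around a block.
# Assumes MRI 2.2+; gc_stats_for is a hypothetical helper name.
def gc_stats_for
  GC::Profiler.enable
  GC::Profiler.clear
  objects_before = GC.stat(:total_allocated_objects)
  runs_before    = GC.count

  yield

  {
    :allocated_objects => GC.stat(:total_allocated_objects) - objects_before,
    :gc_runs           => GC.count - runs_before,
    :gc_time_sec       => GC::Profiler.total_time # seconds spent in profiled GC runs
  }
ensure
  GC::Profiler.disable
end

stats = gc_stats_for { Array.new(10_000) { |i| "item #{i}" } }
puts "#{stats[:allocated_objects]} objects, #{stats[:gc_runs]} GCs, " \
     "#{stats[:gc_time_sec]}s in GC"
```

The same before/after deltas could be dropped into the `start` and `finished` hooks in place of the REE-only calls, though that has not been tried here.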

Colin Shield
Marshal.dump vs YAML::dump
Posted by Colin Shield on Sunday August 16, 2009 at 03:24PM

We find ourselves on a project with a very large dataset: more than 2 million items. This dataset changes frequently, and the changes need to be transported to their respective servers, ready to be served out to clients. We decided to use a queuing architecture to distribute the data: objects are serialized and pushed to a queue. The size of the dataset requires us to optimize as much as possible. There are only so many hours in a day and there is a lot of data to transport.

A question was raised in standup as to which serialization method was faster: YAML::dump or Marshal.dump. It seemed appropriate to write a quick script and work out which would suit our particular situation. The objects we are serializing are simple hashes, so I wrote something representative of our situation in order to present a nice clear decision. Here's some code:

require 'yaml'
obj = {:a => "hello", :b => "goodbye", :c => "new string", :d => {:da => 1, :db => 2}, :e => 1}
start = Time.now
(0..10000).each do
  ser_obj = YAML::dump(obj)
  new_obj = YAML::load(ser_obj)
end
puts "YAML::dump time"
puts Time.now - start
start = Time.now
(0..10000).each do
  ser_obj = Marshal.dump(obj)
  new_obj = Marshal.load(ser_obj)
end
puts "Marshal.dump time"
puts Time.now - start

I think we all knew how the results would look. It was nice to see that for our particular case there was a clear winner:

YAML::dump time
5.397909
Marshal.dump time
0.280292

Seems fairly cut and dried to me. I personally prefer YAML for test result comparison. Maybe we'll put something in our spec_helper to use YAML for testing and Marshal for production.
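
The spec_helper idea could look something like the sketch below. It relies on `YAML` and `Marshal` both responding to `dump` and `load`; `serializer_for` and the environment names are hypothetical, and a real project would more likely read the environment from its own configuration.

```ruby
require 'yaml'

# Hypothetical helper: readable YAML under test, fast Marshal everywhere else.
def serializer_for(env)
  env == 'test' ? YAML : Marshal
end

obj = {:a => "hello", :b => "goodbye", :d => {:da => 1, :db => 2}, :e => 1}

test_payload = serializer_for('test').dump(obj)        # human-readable text
prod_payload = serializer_for('production').dump(obj)  # compact binary

puts test_payload
puts serializer_for('production').load(prod_payload) == obj
```

One caveat with this design: the tests would then exercise a different serializer than production does, so a round-trip check against Marshal is still worth keeping somewhere.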

We had an odd bug last week where we got different results after eager loading an association than after loading it directly.

There are apparently two issues with :has_one :through, one of which also applies to :has_many :through.

So given:

class Person
  has_many :friendships
  has_one :best_friend, :through => :friendships, :conditions => "friendships.best = 1"
end

If you do a Person.find(:all, :include => :best_friend), the best_friend that gets preloaded is not necessarily one that satisfies "friendships.best = 1".

This is due to a bug in the association preloading code, which doesn't pass down the finder options, so any :conditions or :order are completely ignored. This problem is easy to fix with a one-line change, but the fix then exposes another problem.

This problem applies to both :has_many :through and :has_one :through associations. The problem is that the :through association is loaded separately from the :has_one or :has_many association. So it first loads :friendships, and then when it tries to load :best_friend, it doesn't have the table it needs for the :conditions and explodes.

Our current workaround is basically to put the conditions on the :through association, although sometimes you need to create a new association just for that, which is certainly not ideal, especially if you plan on accessing the :through model after it has been loaded.

The way to fix it in Rails is unfortunately a rewrite of how the :through associations are eager loaded.

You can see the lighthouse ticket here

There are also a couple of messages on the Rails Core group.

Jim Kingdon
New York Standup 10/1/2008
Posted by Jim Kingdon on Wednesday October 01, 2008 at 01:29PM
  • There is a ruby meetup tonight at Outside In, at 7pm.

  • For load testing, Load Testing With Log Replay from igvita.com has a review of several common tools.

Jim Kingdon
New York Standup 9/30/2008
Posted by Jim Kingdon on Tuesday September 30, 2008 at 07:02PM
  • What's the best way to import a million records into a postgres database via ActiveRecord (which is needed to implement some application-specific logic)? We anticipate waiting a second (or so) between inserts to avoid slowing down the production database (which is under load, almost entirely reads). If there is any ActiveRecord feature which helps batch together inserts, no one knew about it. As for generally how long this will take (estimates range from 9 to 27 hours), and what the load on the production database will be, we planned on answering that with a trial run of a small number of these records.

  • We're thinking of having capistrano deploy to two demo servers, one particularly aimed at showing to prospective users of our application, and the other mostly for story acceptance. The former would be hosted at a hosting company; the latter an internally run machine. Several people reported they have done this on their projects, and the problems were minor, mostly having to do with whether the deployed location (/u/apps/whatever or some such) is different on the two machines (the solution would be to use the capistrano variables, but tracking down all the places that need to do that could be an issue).

  • Erector tip of the day: in a Rails project, you can put a file (named edit.rb or edit.html.rb) in your view directory, and Rails/Erector will find the template implicitly (as it would for ERB, HAML, etc). It is not necessary to explicitly call render from your controller method.
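
For the import question in the first item above, the pause-between-inserts idea can be sketched in plain Ruby. `throttled_import`, the batch size, and the pause length are all hypothetical; the block is where the ActiveRecord saves with the application-specific logic would go.

```ruby
# Feed records to the block in small batches, sleeping between batches so
# the production database gets room to breathe. Returns the number handled.
def throttled_import(records, batch_size: 100, pause: 1.0)
  imported = 0
  records.each_slice(batch_size) do |batch|
    yield batch          # e.g. create each record through ActiveRecord here
    imported += batch.size
    sleep(pause)
  end
  imported
end

# Trial run on a small sample, with no pause, just to show the batching.
batch_sizes = []
count = throttled_import((1..250).to_a, batch_size: 100, pause: 0) do |batch|
  batch_sizes << batch.size
end
puts "imported #{count} records in batches of #{batch_sizes.inspect}"
```

Timing a trial run like this against a small sample is also a cheap way to turn the 9-to-27-hour guess into a measured estimate.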

Jim Kingdon
Standup 08/29/2008
Posted by Jim Kingdon on Friday August 29, 2008 at 04:52PM
  • Using multiple buckets for Amazon S3. One of our sites has a lot of images (perhaps 30+ photos per page, different for each page and user) and got significant benefits from using four buckets instead of one. Multiple buckets allow browsers to fetch several images in parallel. Increasing it beyond four probably wouldn't help, as browsers have a limit on how many parallel requests they will send.

  • Amazon S3 now has a copy command. This could be useful, for example, if you have a lot of data in a single bucket and want to move it to multiple buckets. Copy is faster than downloading and re-uploading all that data. The ruby S3 gem, however, only lets you copy within one bucket, so you'll need to bypass the S3 gem.

  • We wrote a script to dump a local SQL database and copy it up to a remote server (for example, a demo or production server). This is in contrast with a script we wrote some months ago which copies from demo to a local workstation (for test data, reproducing data-driven bugs, etc). The push to remote feature was for a situation in which there was a bunch of data to be generated (based on some XML input files) and we could afford to bog down a workstation for half an hour, but not an overloaded (and perhaps underpowered) server.

  • Deprec is a set of capistrano recipes for setting up a remote server (in conjunction with deploying an application), for example creating accounts, ssh keys, init scripts, logrotate, etc.

  • Capistrano 2.3 has weird sudo issues (deleting old releases or something). Recommend Capistrano 2.5.
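
The multiple-bucket idea from the first item above can be sketched as follows. The bucket names and the URL shape are made up for illustration; the one design point is that the bucket is derived from the image key, so a given image always maps to the same hostname and stays cacheable by the browser.

```ruby
require 'zlib'

# Hypothetical bucket names; the notes above only say that four were used.
BUCKETS = %w[myapp-images-0 myapp-images-1 myapp-images-2 myapp-images-3].freeze

# Pick a bucket deterministically from the key, spreading keys across buckets.
def bucket_for(key)
  BUCKETS[Zlib.crc32(key) % BUCKETS.size]
end

# Build a URL for the image; the hostname pattern here is an assumption.
def image_url(key)
  "http://#{bucket_for(key)}.s3.amazonaws.com/#{key}"
end

%w[photos/1.jpg photos/2.jpg photos/3.jpg].each { |key| puts image_url(key) }
```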
