On my current project, we needed to prove that an action cache was working as expected. Alas, the blogosphere had either out-of-date or unhelpful information. So, after many experiments, we came up with an RSpec test that does what we want. It seems ugly to me, and I hope there's a better way. The names have been changed to protect the guilty. Any resemblances to actual classes and methods are purely coincidental.
Cries for help
Capybara is reporting "stale elements" for no good reason.
Any experience or war stories from upgrading jQuery 1.6 to 1.7, especially around the event delegation API change? [crickets] See Interestings...
Interestings
Have a really big spec file that takes a long time to run? http://rubygems.org/gems/parallel_split_test, from our own Michael Grosser, will run a big spec file in smaller chunks.
URL helpers cache results for better performance, but they seem to be losing custom options (in this case a locales setting).
In jQuery 1.6, triggering an event in an unattached DOM node would propagate the event up into the document, even though it was never attached to it. jQuery 1.7 (maybe 1.7.1) fixes this behavior.
TL;DR
Estimating performance stories is a total bike-shed conversation. Instead:
- Write a figure-out-what-to-do story which generates the knowledge needed to estimate the optimization story (with repeatable benchmarks)
- Now estimate and execute the actual story around doing the optimization (and compare benchmarks)
- Repeat until your slowness is gone
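A "repeatable benchmark" for the investigation story can be as small as the sketch below, which uses Ruby's standard Benchmark module. The names slow_operation and best_of are hypothetical stand-ins, not anything from a real project; the idea is only to get a number you can record before the optimization story and compare against afterwards.

```ruby
require 'benchmark'

# Hypothetical stand-in for the slow code path under investigation.
def slow_operation
  (1..50_000).map { |i| i.to_s }.join.length
end

# Run the block several times and keep the fastest run, which is less
# sensitive to background noise than a single measurement.
def best_of(runs)
  Array.new(runs) { Benchmark.realtime { yield } }.min
end

baseline = best_of(5) { slow_operation }
puts "baseline: %.4fs" % baseline
```

Record the baseline alongside the story, then rerun the same script after each optimization pass.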
Interesting Things
- Project Sprouts from Luke Bayes "an open-source, cross-platform project generation and configuration tool for ActionScript 2, ActionScript 3, Adobe AIR Flash and Flex projects"
"Project Sprouts was originally designed (as AsProject) to solve a specific set of problems and was later entirely refactored to become a modular set of libraries that are built on top of Ruby, RubyGems and Rake." -- Luke Bayes
- Sencha Touch 1.0 Ships - Now Free!
"So how will Sencha monetize? The company plans to sell its tools, like Sencha Animator, at a premium. It’ll also offer premium support plans." -- TechCrunch
Helps
"Does anyone have experience with (slower) performance on EC2 compared to Heroku?"
Some suggestions:
- Test network lag
- The small instance, the default, is just too wimpy to run as an application server
- Make sure your database instance is big enough, and has enough memory
- Any experience with RDS?
"Can you run cucumber with its own database instance?"
Responses:
- By default, Cuke creates its own environment, but piggy-backs on the test database
  ...
  cucumber:
    <<: *TEST
  ...
- rake db:test:prepare is wired to :test and won't support a :cucumber entry in database.yml without extra rake tasks, which carry continuing maintenance costs
- I've used the parallel_tests gem in the past, which has managed to retrofit db:test:prepare to work with multiple test databases, but it did require an extra step after each migration (and it didn't play well with postgres text indexes).
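The database.yml fragment quoted in the responses relies on a YAML anchor and merge key: the cucumber entry copies every setting from whatever entry carries the TEST anchor. The sketch below shows that effect using Ruby's bundled YAML library. The adapter and database names are made up for illustration, and it assumes the anchor is declared on the test entry.

```ruby
require 'yaml'

# Hypothetical database.yml contents; only the anchor/alias pair
# (&TEST and *TEST) comes from the fragment quoted above.
DATABASE_YML = <<-EOS
test: &TEST
  adapter: postgresql
  database: myapp_test
cucumber:
  <<: *TEST
EOS

# Newer versions of the YAML library refuse aliases in YAML.load by
# default, so use unsafe_load where it exists.
config = if YAML.respond_to?(:unsafe_load)
  YAML.unsafe_load(DATABASE_YML)
else
  YAML.load(DATABASE_YML)
end

puts config["cucumber"]["database"]
```

Because both entries end up with the same settings, Cucumber runs against the same database as the rest of the test suite, which is the piggy-backing described above.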
Recently we had to investigate a page that had very poor performance. The project used New Relic, but neither that, the application’s logs, nor the render traces gave clear results. The database wasn’t the bottleneck, but there were well over 100 partials in the render trace. These partials usually performed well, except that sometimes the same partial would take hundreds to thousands of milliseconds.
As it turns out, the cause of these random delays was garbage collection triggered by abusive memory consumption within many of the partials: millions of objects and heap growth in excess of 100 MB. We discovered this by using the memory and GC statistics features of Ruby Enterprise Edition. You can get an idea of how much memory is being used and garbage collected during rendering by wrapping parts of your page in a block passed to this helper:
def gc_profile
  raise "Dude, you left GC profiling on!" unless Rails.env.development?
  allocated_objects_before = ObjectSpace.allocated_objects
  GC.enable_stats
  GC.clear_stats
  yield if block_given?
  growth = GC.growth
  collections = GC.collections
  time = GC.time
  mallocs = GC.num_allocations
  allocated_objects = ObjectSpace.allocated_objects - allocated_objects_before
  GC.disable_stats
  concat content_tag :span, "GC growth: #{growth}b, collections: #{collections}, time #{time / 1000000.0}sec, #{mallocs} mallocs, #{allocated_objects} objects created."
end
Because there were so many partials on the page, moving the helper around to identify the most egregious memory abusers got tiring, so I wrote a monkeypatch for the venerable Rack::Bug plug-in. The patch shows memory consumption and garbage collection statistics for each template in the Rack::Bug template trace, as well as on the memory panel.


First, install Rack::Bug and make sure it's working properly, then add the code below in an initializer. Keep in mind this was written to work with Ruby Enterprise Edition 1.8.7 2010.02. Various 1.8 patches and Ruby 1.9 also provide some GC statistics reporting tools which you could probably customize this patch to use instead.
if Rails.env.development?
  require 'rack/bug'

  Rack::Bug::TemplatesPanel::Trace.class_eval do
    alias_method :old_start, :start
    def start(template_name)
      old_start(template_name)
      @initial_allocated_objects = ObjectSpace.allocated_objects
      @initial_allocated_size = GC.allocated_size
      @initial_num_allocations = GC.num_allocations
      @initial_gc_count = GC.collections
      @initial_gc_time = GC.time
    end

    alias_method :old_finished, :finished
    def finished(template_name)
      @current.allocated_objects = ObjectSpace.allocated_objects - @initial_allocated_objects
      @current.allocated_size = GC.allocated_size - @initial_allocated_size
      @current.num_allocations = GC.num_allocations - @initial_num_allocations
      @current.gc_count = GC.collections - @initial_gc_count
      @current.gc_time = (GC.time - @initial_gc_time)/1000.0
      old_finished(template_name)
    end
  end

  Rack::Bug::TemplatesPanel::Rendering.class_eval do
    attr_accessor :allocated_objects
    attr_accessor :allocated_size
    attr_accessor :num_allocations
    attr_accessor :gc_count
    attr_accessor :gc_time

    def memory_summary
      %{<strong>%.2fms</strong><small>in</small><strong>%d</strong><small>GCs</small>
        <strong>%d</strong><small>new objects</small>
        <strong>%d</strong><small>bytes in</small><strong>%d</strong><small>mallocs</small>} % [gc_time, gc_count, allocated_objects, allocated_size, num_allocations]
    end

    def html
      %{<li>
        <p>#{name} (#{time_summary}) [#{memory_summary}]</p>
        #{children_html}
      </li>}
    end
  end

  Rack::Bug::MemoryPanel.class_eval do
    alias_method :old_before, :before
    def before(env)
      old_before(env)
      GC.enable_stats
      GC.clear_stats
      @initial_allocated_objects = ObjectSpace.allocated_objects
      @initial_allocated_size = GC.allocated_size
      @initial_num_allocations = GC.num_allocations
    end

    alias_method :old_after, :after
    def after(env, status, headers, body)
      old_after(env, status, headers, body)
      @gc_count = GC.collections
      @gc_time = GC.time / 1000.0
      @allocated_objects = ObjectSpace.allocated_objects - @initial_allocated_objects
      @allocated_size = GC.allocated_size - @initial_allocated_size
      @num_allocations = GC.num_allocations - @initial_num_allocations
    end

    def heading
      "#{@memory_increase} KB Δ, #{@total_memory} KB total, %.2fms in #{@gc_count} GCs" % @gc_time
    end

    def has_content?
      true
    end

    def name
      "memory_panel"
    end

    def content
      %{<style>#memory_panel dd { font-size: large; }</style>
        <h3>Garbage Collection Stats</h3>
        <dl>
          <dt>Garbage collection runs</dt>
          <dd>%d</dd>
          <dt>Time spent in GC</dt>
          <dd>%.2fms</dd>
          <dt>Objects created</dt>
          <dd>%d</dd>
          <dt>Bytes allocated</dt>
          <dd>%d</dd>
          <dt>Allocation calls</dt>
          <dd>%d</dd>
        </dl>} % [@gc_count, @gc_time, @allocated_objects, @allocated_size, @num_allocations]
    end
  end
end
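On a Ruby without REE's GC extensions, roughly comparable numbers are available from core APIs. The following is a minimal sketch, not a drop-in replacement for the patch above. It assumes MRI 2.2 or later, where GC.stat accepts a key such as :total_allocated_objects, and gc_delta is a hypothetical helper name that is not part of Rack::Bug.

```ruby
# Measure allocations and GC runs around a block using only core Ruby
# (GC.stat and GC.count) instead of the REE-specific GC.enable_stats API.
def gc_delta
  objects_before = GC.stat(:total_allocated_objects)
  runs_before = GC.count
  result = yield
  {
    :result => result,
    :allocated_objects => GC.stat(:total_allocated_objects) - objects_before,
    :gc_runs => GC.count - runs_before
  }
end

# Stand-in for rendering a memory-hungry partial.
stats = gc_delta { Array.new(10_000) { |i| "row #{i}" } }
puts "#{stats[:allocated_objects]} objects created, #{stats[:gc_runs]} GC runs"
```

The same before-and-after subtraction could be wired into the start and finished hooks in place of the REE calls, though that variant is untested here.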
We find ourselves with a project with a very large dataset, more than 2 million items. This dataset changes frequently. The changes need to be transported to their respective servers ready to be served out to clients. We decided to use a queuing architecture to distribute data. Objects are serialized and pushed to a queue. The large size of the dataset requires us to optimize as much as possible. There are only so many hours in a day and there is a lot of data to transport. A question was raised in standup as to what was the fastest serialization method: YAML::dump or Marshal.dump. It seemed appropriate to write a quick script and work out which would be appropriate for our particular situation. The objects we are serializing are simple hashes. I thought I'd write something that was representative of our situation in order to present a nice clear decision. Here's some code:
require 'yaml'

obj = {:a => "hello", :b => "goodbye", :c => "new string", :d => {:da => 1, :db => 2}, :e => 1}

start = Time.now
(0..10000).each do
  ser_obj = YAML::dump(obj)
  new_obj = YAML::load(ser_obj)
end
puts "YAML::dump time"
puts Time.now - start

start = Time.now
(0..10000).each do
  ser_obj = Marshal.dump(obj)
  new_obj = Marshal.load(ser_obj)
end
puts "Marshal.dump time"
p Time.now - start
I think we all knew how the results would look. It was nice to see that for our particular case there was a clear winner.
YAML::dump time
5.397909
Marshal.dump time
0.280292
Seems fairly cut and dried to me. I personally prefer YAML for test result comparison. Maybe we'll put something in our spec_helper to use YAML for testing and Marshal for production.
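The idea of using YAML in tests and Marshal in production could be sketched as a small wrapper that picks the format once. PayloadCodec and its format argument are hypothetical names, not code from the project. The sketch assumes Ruby 2.6 or later, where YAML.safe_load takes the permitted_classes keyword.

```ruby
require 'yaml'

# Hypothetical wrapper: chooses the serialization format once, so specs can
# use readable YAML while production uses the faster Marshal.
class PayloadCodec
  def initialize(format)
    @format = format
  end

  def dump(obj)
    @format == :yaml ? YAML.dump(obj) : Marshal.dump(obj)
  end

  def load(data)
    if @format == :yaml
      # Symbol keys must be explicitly permitted by YAML.safe_load.
      YAML.safe_load(data, permitted_classes: [Symbol])
    else
      Marshal.load(data)
    end
  end
end

obj = {:a => "hello", :d => {:da => 1, :db => 2}, :e => 1}
[:yaml, :marshal].each do |fmt|
  codec = PayloadCodec.new(fmt)
  puts "#{fmt}: round trip ok? #{codec.load(codec.dump(obj)) == obj}"
end
```

In a spec_helper, the format could be chosen once at load time. Note that Marshal.load should only ever be given data from a trusted queue.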
We had an odd bug last week where we got different results depending on whether we eager loaded an association or loaded it directly.
There are apparently two issues with :has_one :through, one of which also applies to :has_many :through.
So given:
class Person < ActiveRecord::Base
  has_many :friendships
  has_one :best_friend, :through => :friendships, :conditions => "friendships.best = 1"
end
If you do a Person.find(:all, :include => :best_friend), the best_friend that gets preloaded is not necessarily one that has "friendships.best = 1".
This is due to a bug in the association preloading code that doesn't pass down the finder options, so any :conditions or :order are completely ignored. This problem is easy to fix, just a one line change, but it then exposes another problem.
This problem applies to both :has_many :through and :has_one :through associations. The problem is that the :through association is loaded separately from the :has_one or :has_many association. So it first loads :friendships, and then when it tries to load :best_friend, it doesn't have the table it needs for the :conditions and explodes.
Our current workaround is basically to put the conditions on the :through association, although sometimes you need to create a new association just for that, which is certainly not ideal, especially if you plan on accessing the :through model after it has been loaded.
The way to fix it in Rails is unfortunately a rewrite of how the :through associations are eager loaded.
There is a ruby meetup tonight at Outside In, at 7pm.
For load testing, "Load Testing With Log Replay" from igvita.com has a review of several common tools.
What's the best way to import a million records into a postgres database via ActiveRecord (which is needed to implement some application-specific logic)? We anticipate waiting a second (or so) between inserts to avoid slowing down the production database (which is under load, almost entirely reads). If there is any ActiveRecord feature which helps batch together inserts, no one knew about it. As for how long this will take overall (estimates range from 9 to 27 hours), and what the load on the production database will be, we plan to answer that with a trial run of a small number of these records.
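The pacing part of that plan can be sketched in plain Ruby without any ActiveRecord support. The helper below is hypothetical (import_in_batches is not an ActiveRecord or project method) and it does not batch at the SQL level; it only groups records and pauses between groups, so the batch size and pause can be tuned during the trial run.

```ruby
# Hypothetical helper: yields records in fixed-size batches and pauses
# between batches so a loaded production database gets room to breathe.
# Returns the number of batches processed.
def import_in_batches(records, batch_size, pause_seconds)
  batches = 0
  records.each_slice(batch_size) do |batch|
    yield batch
    batches += 1
    sleep(pause_seconds) if pause_seconds > 0
  end
  batches
end

imported = []
count = import_in_batches((1..10).to_a, 4, 0) do |batch|
  # In a real import, each record would be saved through ActiveRecord here.
  imported.concat(batch)
end
puts "#{imported.size} records in #{count} batches"
```

Timing a trial run of a few batches gives a per-batch cost that can be multiplied out to estimate the full import.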
We're thinking of having capistrano deploy to two demo servers, one particularly aimed at showing to prospective users of our application, and the other mostly for story acceptance. The former would be hosted at a hosting company; the latter an internally run machine. Several people reported they have done this on their projects, and the problems were minor, mostly having to do with whether the deployed location (/u/apps/whatever or some such) is different on the two machines (the solution would be to use the capistrano variables, but tracking down all the places that need to do that could be an issue).
Erector tip of the day: in a Rails project, you can put a file (named edit.rb or edit.html.rb) in your view directory, and Rails/Erector will find the template implicitly (as it would for ERB, HAML, etc). It is not necessary to explicitly call render from your controller method.
Using multiple buckets for Amazon S3. One of our sites has a lot of images (perhaps 30+ photos per page, different for each page and user) and got significant benefits from using four buckets instead of one. Multiple buckets allows browsers to fetch several images in parallel. Increasing it beyond four probably wouldn't help, as browsers have a limit on how many parallel requests they will send.
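One way to spread images across buckets is to hash each image key, so a given image always resolves to the same host and stays cacheable. This is a sketch of that idea rather than a description of how our site does it. The host names and the bucket_host_for and image_url helpers are hypothetical.

```ruby
require 'zlib'

# Hypothetical bucket host names; the real ones are not given here.
BUCKET_HOSTS = %w[
  images0.example.com
  images1.example.com
  images2.example.com
  images3.example.com
].freeze

# Hash the image key so the same image always maps to the same bucket,
# which keeps browser and proxy caches effective.
def bucket_host_for(image_key)
  BUCKET_HOSTS[Zlib.crc32(image_key) % BUCKET_HOSTS.size]
end

def image_url(image_key)
  "http://#{bucket_host_for(image_key)}/#{image_key}"
end

puts image_url("users/42/photo_1.jpg")
```

Picking a bucket at random per request would also parallelize downloads, but the same image would then appear under several URLs and be fetched more than once.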
Amazon S3 now has a copy command. This could be useful, for example, if you have a lot of data in a single bucket and want to move it to multiple buckets. Copy is faster than downloading and re-uploading all that data. The ruby S3 gem, however, only lets you copy within a single bucket, so you'll need to bypass the gem to copy between buckets.
We wrote a script to dump a local SQL database and copy it up to a remote server (for example, a demo or production server). This is in contrast with a script we wrote some months ago which copies from demo to a local workstation (for test data, reproducing data-driven bugs, etc). The push to remote feature was for a situation in which there was a bunch of data to be generated (based on some XML input files) and we could afford to bog down a workstation for half an hour, but not an overloaded (and perhaps underpowered) server.
Deprec is a set of capistrano recipes for setting up a remote server (in conjunction with deploying an application), for example creating accounts, ssh keys, init scripts, logrotate, etc.
Capistrano 2.3 has weird sudo issues (when deleting old releases, or something like that). We recommend Capistrano 2.5.
