Last week, I was tasked with diving into Pivotal’s allocations application to figure out why it was operating so slowly and hopefully make it a bit better. The application was written as a side project about 4 years ago, and clearly showed its age. It’s not every day I run into an application that uses RJS! Anyways, I was able to use an incremental refactoring-based approach to improve the speed by about 80% or so. Edward and Josh Knowles suggested that I write up a bit about what I saw and how I improved it, in hopes that other engineers can make use of these performance tuning and refactoring concepts on other projects.
So, without further ado…
Set up – Measurement measurement measurement
Any time I look into a performance feature, I try to focus on getting a clear piece of functionality to optimize (in this case, the main project matrix page) and measure the heck out of its performance both on production and locally. So, the first place to go was NewRelic. NewRelic told me a couple of things – first off, there were a lot of database queries hogging most of the time. Secondly, most of those database queries involved an ActiveRecord object called ‘PersonRange’.
Adding some manual benchmarking (it’s easy! Just add a ‘stamp’ function and a before/after filter to generate a report of your timestamps) told me that many of the database queries were actually happening during view processing – a big no-no. I had a direction of investigation.
So why did the view take so long?
Like a lot of Rails projects, the view on allocations relies on a series of partials to generate a large matrix. All of these partials are looped – and with the main matrix page looping over projects, each project looping over weeks, each week looping over allocation tiles, you can imagine how the numbers add up quickly.
My first line of improvement was to streamline the innermost partial as much as possible. First off, I replaced the partial with a helper method. In general, looping over a partial is slower than calling the partial as a collection, which is in turn slower than using a helper method to generate markup. The innermost partial was very simple and easily lent itself to being a helper method. For good measure, I turned the ‘loop over allocation tiles’ code snippet into a helper method as well.
When I did this, I naturally started looking at the parameters this method needed. One strangeness – a random lookup hash named ‘roles’ was passed to this method/partial. The partial then looked up an the person’s role from this hash. The lookup hash itself was generated through a DB query generated by a helper method in the next outer partial (project_week_cell), so it generated a DB query per project per week.
On a parallel note, we also needed to look up people’s locations on a week by week basis, and there were some on the fly DB calls happening in the view layer for this as well.
So where did role and location come from? Lo and behold, both of these properties were methods of a single PersonRange object.
My direction was clear. In the controller layer, I made one set of queries to figure out the relevant person range for each person for each week. This cache was used for all location and role questions from that point forward. The cache was plopped into the main ProjectAllocationMatrix object along with the other preexisting caches, of which there were many. Boom, 50% speed improvement.
Before I could tackle a bigger refactor, I needed to simplify and organize the code a bit. The codebase had some obvious things to organize, that wouldn’t affect the performance much but would clarify flows and responsibilities.
- There were some helper methods that didn’t have anything to do with view logic. The easy refactor for these kinds of methods is to figure out which of their arguments is the ‘important object’, and move the method into becoming an instance method of that object.
- I found a few repeated patterns embedded in the views – an expression of ‘billable + unbillable + overhead’ that took three separate collections and added them together in this specific order. These three collections were only used to be added together in this way. An easy refactor consolidated them into one collection method. In the process, I simplified the calculations from three separate selects to a single sort.
- One of the caches stored a person’s last unique set of initials. This cache was then postprocessed to generate an abbreviated name for that person. It seemed more useful to make this cache store the entire abbreviated name, to reduce postprocessing.
- Some of the caches were exposed so that other code (mostly view logic) used them directly as a hash, knowing exactly how the cache was organized. I reduced the cohesion between the caches and the view layer, adding access methods to the matrix object that held the caches.
Taking another passthrough for performance
Even after these changes, I was still seeing some database calls in the view layer. I decided to track them down and get rid of them.
- One of them was a ‘find location by id’ lookup. As it turned out, locations were an association from PersonRange, so making the person range cache join with location in the query was a no-brainer and removed all those queries.
- The time calculating allocations for a week was a bit long. Looking at this code, it was running a SQL query then using Ruby to first group allocations by project ID, then sort by person for each project. I couldn’t imagine that doing an individual sort for each project was terribly efficient. I decided to do the sort first, then select the projects. This shaved a few milliseconds – no big deal.
- Some of the matrix caches weren’t all that useful any more. There was a ‘locations by people’ cache that was used a bit. The person ranges cache could do everything this cache cared about and more – it also knew if a person’s location changed from week to week! I killed the other cache and converted to using the better information.
- The next closest partial was ‘project_week_cell’, which was surrounded by a loop to render individual copies of this partial. I switched it to a collection partial.
- I also found out about a code branch that stored abbreviated names on a person by person basis in the database rather than calculating it for each matrix. I incorporated it and removed another cache calculation from the matrix.
- Around this time, I realized I could tune the ‘resource_target’ cache to cache what we actually cared about, target counts. Shifting the cache shaved a bit of time off of matrix generation in general.
All of these changes chopped another 25% or so off of the load time.
Now the basic matrix page render was in fairly decent shape, and I moved onto the other mandate – making the drag/drop operate more smoothly. The mechanism was basically that it would perform the allocation change, recalculate the matrix for the changed projects, and then render RJS that refreshed the project rows for these changed projects.
When I looked at performance, I discovered that most of the time was spent generating two copies of the allocation matrix. The first matrix was the one mentioned in the controller (just calculating for those two projects); the second matrix was a full matrix calculated to generate refreshed billable percentages.
The interesting thing was that both matrices took almost the same time as each other. Restricting projects did not matter for performance.
As a first pass, I decided that the ‘restricted projects’ matrix was useless. If we used the bigger matrix for everything, we only had to calculate everything once and we would have everything we needed. I made this conversion and I cut server time in roughly half.
We have to refresh large chunks of HTML? Really?
I was now ready to do another performance pass (beyond the minor improvements from having smaller markup).
So why do we need to recalculate the whole matrix, anyways?
I returned to some of the oddities of the first drag/drop pass.
First oddity – making an allocation matrix with only 2 projects of interest was just about as expensive as making an allocation matrix with all projects. I discovered that the reason was buried in the caches – there was a low level shared cache of allocation information used both to calculate project allocations and unallocated people.
Getting rid of this cache and making these two calculations retrieve just the information they needed was the way to go. When I did this, the full matrix stayed at about the same performance level it was before. The smaller matrix, however, became much faster.
This led to the question of whether I could get away with only using the smaller matrix. The answer appeared to be ‘yes’, provided I figured out how to keep the billable percentages over all projects up to date.
So where to, now?
Every refactor leads to a new refactoring idea. Even though my week was finished, I saw plenty of other things that could be tightened up. Among my random thoughts:
- YUI drag/drop requires element IDs to identify elements, and jQuery uses selectors instead. I know which technology I’d rather use. Guess which one allocations currently uses, and how many HTML IDs needed to be added to the markup because of this.
- Some of the HTML classes were extremely duplicated because they were on the wrong elements. For example, I would see a billable project row with a whole bunch of allocation cells, each with the class ‘in_billable_project’. Why not give that class to the project row as a whole?
- I ended up with drag/drop returning chunks of HTML markup. Chunks of basic allocation data would be a lot more lightweight, and managing cells on the client side would open up some nice possibilities for visual effects.