I just had a discussion with a co-Pivot about the resentment that many teams develop about Continuous Integration – especially when the release process requires a green tag from CI, and a broken build is standing in the way.
As anyone who has worked with me will attest, I’m hardcore on CI and consider any team which leaves a build red for longer than a workday to be sorely lacking in discipline.
OK, OK, there are always extenuating circumstances, but I still believe that most resentment of CI stems from underlying antipatterns and smells, rather than problems with CI itself. For example:
- “The Customer Has To See This Feature RIGHT NOW”: Frequent releases are a great thing, but if you cannot wait for a green build to deploy, you have some deeper problems. Often, this is because a team doesn’t manage customer expectations well. The customer should understand that CI is a critical part of the Agile process which ensures that only reliable, quality releases get pushed to staging or production. Any problem which is preventing a green build should be fixed before the release is deployed. If the customer is not willing to allow you that time and flexibility, perhaps they are too addicted to new features, and the entire team needs to have a heart-to-heart about Code Debt in the next retrospective.
- “It Works For Me, But Fails On CI”: The important question is which environment is more like production – your development environment or CI? If you are developing on Windows or Mac, and your production box is some other flavor of *nix, then your CI box should be as close as possible to production. Ideally, you should be able to log on to the CI instance and debug the failing test there. Often, the real culprit is a misconfigured CI box. If it is hard to keep your CI environment in sync with production, then perhaps you should look into automation (because you KNOW you or your sysadmin will probably forget to do the same thing when you push to production, right?). If the problem is that your development environment legitimately differs from production, then CI just saved you some stress on the next deploy.
- “Intermittent” Failures: Same deal as the prior point. CI runs your tests much more than you do. For web apps, it hopefully runs them in more browsers than you do. In my experience, many “intermittent” bugs are real bugs which are just very hard to isolate. It could be an AJAX bug that only happens when the site is run remotely, not via localhost. It could be a performance problem which only shows up on a slower system, not your fastest-on-the-market dev box. It could be a dependency on an external resource that happens to be unavailable sometimes, such as a web service, remote storage, etc. Again, just being aware of these issues puts you ahead of the game. For browser bugs, dig in and find out WHY it is failing intermittently. It may be a real bug. For intermittent outages of external resources, you may just have to live with it, but you don’t have to live with the intermittent failures in CI. Mock out the resource or disable the tests in the CI environment. Yes, this is OK, especially if you leave them enabled in your development environment. Another option is to automatically repeat these tests a few times with a delay, and only fail the entire build if they fail repeatedly. Big services like Amazon or Google might drop a request occasionally, but still respond to a subsequent request.
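The repeat-with-delay idea above can be sketched in a few lines of Ruby; the helper name and interface here are illustrative, not from any particular test framework:

```ruby
# Run a flaky block up to `attempts` times, sleeping `delay` seconds between
# tries; re-raise the error only if every attempt fails.
def with_retries(attempts = 3, delay = 0)
  attempts.times do |i|
    begin
      return yield
    rescue StandardError
      raise if i == attempts - 1
      sleep delay
    end
  end
end

# A check that only succeeds on the third call still passes overall:
calls = 0
result = with_retries(3) { calls += 1; raise "flaky" if calls < 3; :ok }
# result => :ok, after three attempts
```

In a real suite you would wrap only the tests that hit external resources, so a genuinely broken test still fails fast.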
- Slow Test Suites: This is an insidious problem, because once your suite is slow, it is often a monumental effort to make it fast again. It is much better to be proactive, and monitor any slow-running tests like a hawk, relentlessly mocking out slow resources or replacing broad functional tests with faster, more targeted unit tests. You can also always split your tests into different suites, running your fastest tests continuously, and the entire slow deploy-test suite only nightly or periodically. As long as your customer isn’t addicted to immediate features, it should be fine to only deploy from nightly builds.
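Splitting the suite can be as simple as two Rake test tasks; the task names and directory layout below are assumptions for illustration, not from the original post:

```ruby
require 'rake'
require 'rake/testtask'

# Fast unit tests: run these continuously, on every check-in.
Rake::TestTask.new(:fast) do |t|
  t.pattern = 'test/unit/**/*_test.rb'
end

# Slow functional/integration tests: run these nightly or periodically.
Rake::TestTask.new(:slow) do |t|
  t.pattern = 'test/integration/**/*_test.rb'
end
```

CI can then run `rake fast` on every commit and `rake slow` on a schedule.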
- The Failing Test That “Doesn’t Matter”: This is my pet peeve. Whenever I break CI, I fix it ASAP. If I ignore a “minor” broken test, the next thing I check in may be a major FUBAR which gets past my local tests for some reason (see prior points). Some who know me might even say it is LIKELY to be a major FUBAR. The point is, I don’t trust myself or my local box, I trust CI. Now, if ANOTHER developer breaks the build, and tries to tell me they are not going to fix it because it is a “minor” problem, that really chaps my hide. They are ripping huge holes in my nice safety net, forcing me to expend much more time and attention on the tests that I run on my local environment, and causing me more stress and work in general. Stop making excuses, and fix the damn build NOW, or comment out the failing test.
Now, I’m sure that all of the above points can be debated or shown to be inapplicable in a specific situation. Plus, if you are dealing with imperfect CI and development tools (which is always the case), you will have some degree of pain which is directly attributable to CI. It would be great to hear about some of these situations in the comments.
Bottom Line: Integration is always one of the most painful parts of software development. Doing integration with high quality and low risk is even harder. Most developers who have been on a non-Agile project of any significant size have experienced days-long integration hell and ulcer-inducing all-night production deployments. Continuous Integration doesn’t make that pain and stress go away, but it does break it down into small, bite-sized pieces that can be easily handled on a daily basis. All for the low, low cost of being proactive and disciplined, which makes you a better developer anyway.
Ask for Help
It was suggested that perhaps this is a timing issue. Maybe some required JS for the form hasn’t loaded before Selenium fires the event.
One workaround would be to test only the form submission called by on-click instead of the click itself.
- If you need to add inflections in Rails, make sure you do it before your initializer block in environment.rb. If you put it after the initializer, it will cause your routes to be re-parsed, which will slow down app/test startup significantly. For example:
ActiveSupport::Inflector.inflections do |inflect|
  inflect.irregular "criterion", "criteria"
  inflect.irregular "Criterion", "Criteria"
end

Rails::Initializer.run do |config|
  # ... the rest of your initializer ...
end
There is some strangeness with Rails’ handling of invalid dates outside the range of a Time object – for example, February 31, 1900. This involves Rails turning the value into a Time object, and then back into a Date.
Multiple people have found an interesting edge-case bug in MySQL which results in inconsistent row ordering. It has been reproduced in versions 5.1a and 5.1b, but we haven’t yet navigated the MySQL bug system to find/report it. Here are the query conditions to reproduce it, which could be common in test scenarios for some projects:
* LIMIT <= 5
* ORDER BY id DESC
* > 5 rows in table
* WHERE clause uses an indexed column (not a compound index with id)
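A concrete query matching those conditions might look like this (the table and column names are hypothetical, purely for illustration):

```sql
-- Assumes a single-column index on `status` (not a compound index with `id`)
SELECT *
FROM items
WHERE status = 'active'  -- indexed column in the WHERE clause
ORDER BY id DESC
LIMIT 5;                 -- LIMIT <= 5, with more than 5 matching rows in the table
```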
We had an excellent tech talk on Vertebra from Ezra Zygmuntowicz and the folks at EngineYard. If you’ve ever been a sysadmin responsible for many boxes, you’ll appreciate the awesome potential of Vertebra…
NetBeans and symlinks: a team was having problems with symlinks in SVN disappearing after a client edited the project in NetBeans. No, he was not using Windows…
Here are some tips for working with acts_as_paranoid (We have helpers for some of these in our common libraries):
Add :scope for validates_uniqueness_of
validates_uniqueness_of :user_id, :scope => :deleted_at
Add :conditions for named_scope with :joins. For example (the scope name here is illustrative):

named_scope :with_active_user,
  :select     => "users.*",
  :joins      => :paranoid_relation,
  :conditions => "paranoid_relations.deleted_at IS NULL"
Ask for Help
This question came up at standup. I’ve put some thought into it, so I thought I’d throw it out for discussion and see what other people think.
At Pivotal, we use svn:externals and the Third-Party Branch Pattern (from the SCM Patterns book) to manage easy, reliable cross-project updates of common plugins which live elsewhere (RubyForge, etc.), without having to manually update every project or be at the mercy of the RubyForge svn repo going down or being slow. When everything was on SVN, this worked well; we had some rake tasks which made it easy to update the branch from the latest trunk version in the external vendor repository. However, more and more projects, like Desert, are moving to GitHub.
One option is to just check the entire Git repo into Subversion. This is labor-intensive, though. It also loses some of the benefits of the Third-Party Branch pattern, such as being able to easily preserve local patches while still pulling in changes from a new vendor version. We could probably update our Rake tasks to handle Git as well as SVN.
Another option is to switch all our projects to Git, but we aren’t quite ready for that across the board.
Plus, this is a problem even if you move your parent project to Git. For example, there is no easy Git equivalent to having a svn:external which automatically updates to the latest version of an external repository, which is a useful approach for Continuous Integration or automatically pulling the latest Edge Rails into your project on every update.
Here are some links from people working on related issues, but please comment if you have any ideas or something that works well for you:
Ask for Help
We upgraded to Rails 2.1 and Polonium, and now our RSpec specs are not running on CI, but run fine if you simply use rake on the command line.
Check to see if you are using the rake extensions in the Pivotal core bundle.
Using setTimeout() to wait for DOM to update in JsUnit does not work.
Using setTimeout in tests is not going to do what you want, unless you mock setTimeout. Basically, setTimeout schedules a callback to run later, and that callback generally won’t fire until after the current test has already finished, so it cannot affect the current test.
- There was an edge version of Rails (this project froze Rails at some point in the past) that had a bug with namespaced routes: a POST that should have gone to MyController#create was instead routed to MyController#index. The fix is to freeze to current Edge, or use …
Ask for Help
“Any way to turn off a validation in a re-opened class?”
No way to do this. Best to refactor by pulling the validation out of the class and then mixing it in, or not, at a different level.
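The mixin idea can be sketched in plain Ruby; with Rails you would move the validates_* call into the module instead. All names below are illustrative:

```ruby
# Pull the "validation" out of the base class into a module...
module NamePresenceCheck
  def valid?
    !name.nil? && !name.to_s.empty?
  end
end

class Person
  attr_accessor :name
end

# ...and mix it in only where you actually want it enforced.
class StrictPerson < Person
  include NamePresenceCheck
end
```

Person never gets the check at all, while StrictPerson does, so there is nothing to “turn off” in a re-opened class.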
- Submitting a form with both Y!Slow and Firebug enabled will cause the result to be pulled from cache instead of hitting your server. The workaround is to disable Y!Slow.
- Some of our customers are requesting targeting Firefox 3, which has some rendering differences from FF2. So we’re adding FF3 to the system image with a new icon and the correct trick to let it run side-by-side with FF2.
- On a related note, rumor is that Facebook is dropping support for IE6. So is 37Signals.
- Interesting issues with the Globalize plugin & Rails 2.1:
- The currency formatting code does not work at all anymore – it always uses a ‘$’
- Their work for localizing templates, which involves injecting a fully-qualified path, breaks #assert_template. The workaround is to comment out the path injection code, but this only works if you don’t have localized templates.
- EngineYard’s eycap gem version 0.3.6 now has better support for deploying from SVN tags
- Deploying from a tag used to do a rm -rf (because the URL was different), which takes a long time on EY’s GFS disks for large file trees. This was causing one customer’s deploy to take ~20 minutes.
- The fix was to change eycap to use svn switch; the deploy now takes ~1 minute.
Ask for Help
“Anyone seen/solved an issue with random font size increase using Firebug 1.1 or later?”
The issue is that at some point a page will render with much larger fonts, and CSS inspection won’t tell you why. The workaround is to launch a browser with Firebug disabled and run it side by side with the same page in a window with Firebug enabled (restarting the second window whenever the problem occurs).
We’re seeing this with Firefox 2 and 3, and any Firebug later than 1.05 (which doesn’t run on FF3). This might be an issue with IFRAMEs, but we’re not sure. No data found on this via Google searches or the Firebug group. We will post there.
Model#update_all is your friend if you’re not yet on Rails 2.1. Instead of calling Model#update_attribute for each attribute, #update_all saves directly to the database, bypassing validations and updating only the columns you specify. In Rails 2.1, with partial model updates, you may not need this. But if you haven’t yet upgraded your application, then give #update_all a try.
Update: fixed per comment. Model#update_attribute does not validate either. Thanks for the catch!
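For example, a bulk update in the Rails 2.x argument style (the model and column names here are hypothetical):

```ruby
# Issues a single UPDATE statement: no per-record instantiation,
# no validations, no callbacks.
Post.update_all(
  ["featured = ?", true],       # SET clause
  ["author_id = ?", author.id]  # WHERE conditions
)
```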