Pivotal Labs

Designing an API in Hell

Andrew Bruce
Sunday, May 19, 2013

Minitest, Ruby’s built-in testing library, has some great out-of-the-box features. One of these is test parallelization. Parallel testing is often added after a suite gets slow enough to hurt. That can be achieved using the parallel_tests gem, which takes advantage of today’s multi-core processors, or using custom solutions for dividing chunks of a suite across several machines. Arguably, test speed should be dealt with by making code design changes, but that’s another story: what interests me most about minitest’s parallelization is the constraints it places upon the design of stateful systems when TDDing from scratch.

You can turn on parallelization for a particular test case:

describe Server do
  parallelize_me!
end

or for all tests:

require 'minitest/hell'

As the name implies, the latter approach turns up the test pain level to 11, but it’s the kind of pain that can have positive effects. For ‘fun’, I started to use minitest’s parallelization on a side project, which has a stateful API backed by a relational database. Here are some of the decisions that were forced out by using parallel test examples.

Commitment to fast tests

I thought that running tests in parallel from the start of a project would make me lazy, causing me to neglect slow tests because they’d be running at the same time as others. Surprisingly, the opposite happened: the need to constantly rerun the whole suite to iron out nondeterministic conflicts encouraged me to fix slow tests early. I ended up being able to run the entire suite several times within a matter of seconds in order to check the tests’ ability to run in parallel.

Side note: this is an early-stage project, with very few tests! It will be interesting to see how the suite’s run time grows as the number of tests increases.

Avoiding test duplication

Since the unique constraints of my database could be hit by tests with the same fixture data running at the same time, I was encouraged to use more intention-revealing test data for each example, avoiding the foo and bar values that commonly litter a suite and make tests harder to read.

For IDs, I used Ruby’s SecureRandom library, which provides GUID and hex generation. I sometimes used hex generation when the user-supplied unique display name of something didn’t matter to the test.
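As a rough sketch of that idea: SecureRandom.uuid and SecureRandom.hex are the standard library calls, while the unique_display_name helper and its prefixes are hypothetical names invented for illustration.

```ruby
require 'securerandom'

# Hypothetical helper: a display name that is unique per call, for tests
# that do not care what the name actually is.
def unique_display_name(prefix = 'project')
  "#{prefix}-#{SecureRandom.hex(4)}" # hex(4) yields 8 hex characters
end

item_id = SecureRandom.uuid         # random (version 4) UUID for a resource ID
name    = unique_display_name('ci') # "ci-" followed by 8 hex characters
```

When the name does matter to the example, an intention-revealing literal is still the better choice; the random suffix is only for the cases where it doesn't.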

Client-side ID generation

Although not strictly forced out by parallel tests, parallel testing got me thinking about how best to interact with the backend, which has a single database being served by multiple concurrent requests (just like any web server).

Using GUIDs instead of autoincrementing IDs can be a smart decision where it’s an option (i.e. you don’t need human-friendly URLs), because your database server doesn’t need to worry as much about ensuring uniqueness: the GUID algorithm effectively guarantees it.

TDDing my API from scratch, without external requirements, encouraged me to use GUIDs to simplify the design and to avoid bottlenecks at the database layer. POST requests canonically return the new URL of the resource you’re creating in the Location header of the response. So to test that a thing really got persisted I’d need to:

  1. POST to /items with a representation of the resource
  2. Grab the Location header of the response to get the new URL
  3. GET /items/:newid and ensure the response body matched the representation I sent

This seemed a very laborious process for storing some data. Much less work is:

  1. PUT to /items/:newid
  2. GET /items/:newid and ensure the response body matched the representation I sent

Since GUIDs can be treated as unique, it didn’t make much sense for the server to generate them.
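Here is a minimal sketch of the shorter flow. ItemsApi is a hypothetical in-memory stand-in for the real HTTP API, so that the example runs without a server; the status codes it returns are an assumption about how such an API might respond.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for the API (not the project's real code).
class ItemsApi
  def initialize
    @items = {}
  end

  # PUT /items/:id stores the representation under the ID the client chose.
  # Returns 201 when the resource is new and 204 when it is replaced.
  def put(path, body)
    created = !@items.key?(path)
    @items[path] = body
    created ? 201 : 204
  end

  # GET /items/:id returns [status, body].
  def get(path)
    @items.key?(path) ? [200, @items[path]] : [404, nil]
  end
end

api            = ItemsApi.new
path           = "/items/#{SecureRandom.uuid}" # the client, not the server, picks the ID
representation = '{"name":"nightly build"}'

api.put(path, representation)
status, body = api.get(path)

# The example already knows the URL, so there is no Location header to parse.
raise 'round trip failed' unless status == 200 && body == representation
```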

Positive effect: the app would now cope with a distributed database system on the back-end, despite starting out on a technology that’s thought of as difficult to scale horizontally (SQLite).

Avoiding database resets

It’s common practice to wipe the whole database when starting a new example, or to run each example in a transaction and roll it back when each example finishes. I wanted real black-box tests, so I didn’t want to use transactions. Yet, deleting the whole database at the start of an example didn’t play nice with minitest’s parallelization, since data that one example required would be deleted by another.

The usual approach when using parallel_tests is to create a database for each process. However, since minitest doesn’t manage databases (nor should it) I chose to keep a single database and find a different solution.

I chose to sandbox all of the tests by creating new entities each time, and only checking for output that indicated that particular entity had been worked on. The product I’m working on is a Continuous Integration server, so I’d be creating CI projects (you might also know them as jobs) and expecting them to appear in an XML feed. The tests had to be OK with other data being present, since the other tests could be working too.

This approach precluded tests that checked that the number of records had increased by one, because in an otherwise acceptable “green” test situation they’d occasionally increase by more than one (another test added a record too), stay the same (another test deleted a record) or decrease (more than one had been deleted).
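To illustrate, here is a sketch under some assumptions: ProjectFeed is a hypothetical stand-in for the shared database and its XML feed, and the threads play the part of other examples running at the same time.

```ruby
require 'securerandom'

# Hypothetical stand-in for the single database that every example writes to.
class ProjectFeed
  def initialize
    @names = []
    @lock  = Mutex.new
  end

  def add(name)
    @lock.synchronize { @names << name }
  end

  def names
    @lock.synchronize { @names.dup }
  end
end

feed = ProjectFeed.new

# Simulate other examples adding their own projects concurrently.
others = 5.times.map do
  Thread.new { feed.add("other-#{SecureRandom.hex(4)}") }
end

mine = "deploy-pipeline-#{SecureRandom.hex(4)}"
feed.add(mine)
others.each(&:join)

# Fragile: asserting that feed.names.size went up by one depends on what
# the other examples did in the meantime.
# Sandboxed: only look for the entity this example created.
raise 'project missing from feed' unless feed.names.include?(mine)
```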

Constraints are fun

While I wouldn’t recommend going rogue like this on a client project, playing with constraints like truly parallel tests can get you thinking about your normal testing procedure. Some of the above decisions allowed for a much faster test execution time, and always having the assumption that other processes could be working on the database forced out some interesting techniques. Some of the techniques I had to avoid due to parallelization would normally necessitate different workarounds with their own drawbacks. For example, if you always assume the count of an ActiveRecord class will go up by exactly one, you require exclusive use of the database. If instead you scope your queries to a parent entity, that restriction goes away.
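A plain-Ruby sketch of the scoping idea follows. Build and build_count_for are hypothetical names; in a Rails app the records would be ActiveRecord rows and the scoping would go through an association on the parent.

```ruby
# Hypothetical build records standing in for database rows.
Build = Struct.new(:project_id, :number)

# Count only the builds that belong to one parent project.
def build_count_for(builds, project_id)
  builds.count { |build| build.project_id == project_id }
end

builds = [Build.new('someone-elses-project', 1)] # left behind by another example
before = build_count_for(builds, 'my-project')

builds << Build.new('my-project', 1)             # the behavior under test
builds << Build.new('someone-elses-project', 2)  # another example interferes

# The global count went up by two, but the scoped count went up by exactly one.
after = build_count_for(builds, 'my-project')
raise 'scoped count should change by one' unless after - before == 1
```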

Hell isn’t so bad after all.

Smelling with your ears: TDD techniques to influence your design

Andrew Bruce
Sunday, May 12, 2013

Test Driven Development can be a hard sell. The first pitch is often designed to entice the buyer with safety features, like:

  • “How will you ensure that those bugs don’t creep back in?”
  • “Wouldn’t it be nice to know that one change doesn’t break another?”

In conversations between practiced test drivers, though, design topics tend to pop up:

  • “What is this test telling us about the design of our code?”
  • “Why is this test boring to write?”
  • “Why is this test so slow?”

Then there are really exciting questions, when getting close to a design breakthrough, like:

  • “Is this test telling us we’re lacking polymorphism in our design?”
  • “I’m tired of constructing this thing. How can we group this set of arguments into an object with a name?”

One distinguishing factor between these types of questions is the level of trust in TDD. Someone with little trust might be predisposed to abandon testing before implementation, instead choosing to test afterwards, or not at all. To such a person yet to be sold on the benefits of TDD, the safety questions make more immediate sense, while design questions are often met with blank stares. However, the safety concerns are easily brushed off: “It’s a prototype.” “My team is so smart we don’t need tests.” “We need to move fast, so we’ll worry about tests later.”

Explaining the basic advantages of TDD doesn’t always work as a sales pitch, because those explanations don’t reveal why testing can be difficult, much less why testing sometimes ought to be difficult. Take someone who has never let the design of their code be influenced by tests: they dislike testing for being difficult or boring. Encountering resistance in the TDD process, they choose to forgo the safety advantages of testing, and the design advantages haven’t been made clear.

As you may have gathered, I’m more excited by the design aspect of TDD and related tools than by the safety aspect. I’d like to think that if we sold how TDD can improve the design of code that’s yet to be written, we’d have an easier time tricking our friends into writing code with regression protection.

Learning to listen

There is much talk about “listening to the tests” among TDD practitioners. The listening analogy is apt. Like listening with our ears, the ability to understand what a test tells us about code quality can improve with practice. It’s a subtle concept to grasp, and one I frequently find is not well understood by otherwise experienced developers. This is unfortunate, because it’s a crucial part of getting rapid feedback on the quality of production code. By quality, I’m referring primarily to the ability to cope with changing requirements, as opposed to good coverage of features and edge-cases.

If you can’t hear what your tests are trying to say, there are tools for cranking up the volume. Below are a couple of my favorites. They’re not intended as hard-and-fast rules, but as exercises to try out when you’re frustrated with a test or wondering why it’s getting difficult to test something.

If you haven’t already, you should read about known test smells and their solutions, because we can apparently smell with our TDD ears.

Use your testing framework’s convenience helpers sparingly

In the RSpec world, this often comes down to writing readable examples without using ‘subject’, ‘let’ or ‘before’. It turns out that straightforward assignment is usually OK.

As this Thoughtbot post argues, the let helper effectively introduces Mystery Guests (implicit, hidden fixtures), and overuse results in slow and fragile tests.

I like to avoid lets, subjects and other test helpers for another reason: if I can’t stand to repeat myself in examples, I think about how the code that uses my code will feel. A boring, repetitive test setup might be telling me that my code has too many dependencies. If I’m frantically stuffing things into the database and stubbing out web service requests just to allow myself to construct an object, perhaps the object’s scope is too broad.

If you come across a test that is apparently repetitive, consider tidying the implementation of the system under test before the test itself. You may find that the noise in the test can be dramatically reduced with some production code tweaks.

Avoid stubbing methods to return values

I owe this one to Greg Moeck, who introduced something like it at the San Francisco eXtreme Tuesday Club.

First, a reminder of the definition of stubs versus mocks (to paraphrase Gerard Meszaros):

  1. A stub is a test double that allows you to control the indirect inputs of the system under test.
  2. A mock is a test double that allows you to test the indirect outputs of the system under test.

If you return a value from a stubbed method, you force your production code to depend on a blocking, synchronous call. If you could otherwise send a message and not expect an immediate response, you permit your design to (now or in the future) be asynchronous.

Further to that, if you instead use a mock to expect an output to the collaborator you were previously stubbing, you can more cleanly divide your testing into inputs and outputs of the system under test. It’s the difference between:

it "ensures user is authentic before performing the action" do
  user = stub('user')
  authenticator = stub('authenticator')
  authenticator.stub(:authentic_user?).with(user) { true }
  action = Action.new(user)
  action.perform
  expect(action).to be_complete
end

and:

it "ensures the user is authentic when action is requested" do
  user = stub('user')
  authenticator = mock('authenticator') # assume the player of this role knows who to tell when authentication succeeds or fails
  authenticator.should_receive(:authenticate_user).with(user)
  action = Action.new(user)
  action.perform
end

it "performs an action once a user has been authenticated" do
  action = Action.new(stub('unauthenticated user'))
  authenticated_user = stub('user')
  action.user_successfully_authenticated(authenticated_user)
  expect(action).to be_complete
end

The code that passes the second set of examples is in better shape for when you need to queue requests to the authenticator and complete the action asynchronously. It uses a “tell, don’t ask” style. The fact that an explicit message is sent to the system under test (‘user_successfully_authenticated’) makes it clear to the reader that the request for authentication and the triggering of the action are separate bits of work. It’s someone else’s business whether I get told about the successful authentication, and how many steps are taken before I’m told.
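For illustration, here is one possible shape for production code in that style. It is a sketch, not the original implementation: passing the authenticator to the constructor is an assumption (the examples above don't show how the action finds its collaborator), and QueuedAuthenticator is a hypothetical collaborator that approves every user.

```ruby
# Sketch of an Action in a "tell, don't ask" style.
class Action
  def initialize(user, authenticator)
    @user          = user
    @authenticator = authenticator
    @complete      = false
  end

  # Tell the authenticator to do its job; don't wait for an answer.
  def perform
    @authenticator.authenticate_user(@user)
  end

  # Sent by whoever learns that authentication succeeded.
  def user_successfully_authenticated(_user)
    @complete = true
  end

  def complete?
    @complete
  end
end

# Hypothetical authenticator that queues requests and answers later.
# For simplicity it treats every queued user as authentic.
class QueuedAuthenticator
  attr_writer :listener

  def initialize
    @pending = []
  end

  def authenticate_user(user)
    @pending << user
  end

  # Work through the queue and tell the listener about each success.
  def process_queue
    @pending.shift(@pending.size).each do |user|
      @listener.user_successfully_authenticated(user)
    end
  end
end

authenticator = QueuedAuthenticator.new
action = Action.new(:alice, authenticator)
authenticator.listener = action

action.perform              # the request is queued; nothing is returned to wait on
authenticator.process_queue # some time later, the action is told about the success
```

Because Action never asks the authenticator a question, swapping the queue for an immediate or remote authenticator wouldn't require changing Action.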

There are several more techniques I’d like to tell you about, but this post is getting a bit long. Maybe next time. Happy listening!

Procrastination, considered.

Andrew Bruce
Sunday, May 5, 2013

Last week I blogged about a new project for aiding in the hunt for test pollution, Scrubber. This is a personal side project that I began recently. It’s in the very early stages, like many of my other pet projects. It’s something I’ve worked on entirely alone, though I’d also love to collaborate with others and/or pair on it.

I was going to blog this week about the latest improvements I’d made to the code, and how it was going to improve the end-user’s experience despite being a mere refactor and a minor user interface tweak. I got bored after writing the first few paragraphs, and instead got distracted by the code, refactoring and tweaking it more than was necessary. A couple of hours in, I’d totally gold-plated the code in a way that I might have deemed unacceptable whilst on client time. This made me a little angry with myself: why was I procrastinating from blogging, and was I a bad developer who’d lost the ability to work alone?

Something we’re encouraged to do at Pivotal Labs is reflect on our behavior. Thinking about my boredom and procrastination a little, I realized there was something more interesting going on. The process went a bit like this:

  1. Start blogging about the new features. Paste the visual output of the program into the blog post. Realize that the output didn’t meet the requirement of improving the user experience.
  2. Start to implement the missing parts of the feature. Spend hours enjoying the freedom to refactor, delete and re-implement with no time limit, far more than when pairing on client time.

When we pair, we have a safety net to catch us when we get distracted by neat language features or elegant implementation tricks. Our partner will often remind us that we’ve spent, say, two hours making no discernible feature changes and no improvement to maintainability of the code. When soloing, however, especially on pet projects, we are free to choose a goal for the activity. Sometimes we decide to perform code katas, but other times we don’t consciously choose a goal: the goal could become apparent during the activity itself.

So it seemed that I’d accidentally found validation in what I was doing, and a potentially more interesting blog post at the same time. I’d managed to get some programming exercise in instead of banging out a not-quite-earth-shattering new feature. Specifically, I:

  1. Thought like a user until it became obvious that changes were still needed. Switched to developer mode by accident.
  2. Allowed myself to try out several approaches to the Presenter pattern, applying them to a trivial example that would not need a presenter in ‘real world’ programming.
  3. Used the Introduce Null Object refactor in a situation that didn’t call for it. Yet, it felt good and might even prove useful for the project later.
  4. Practiced several coding techniques that might become useful in the work environment.
  5. Temporarily inverted my work-based insecurities about performance and timeliness, providing stress relief as an unexpected benefit of my brain switching itself off from the task at hand.

Reflection is a useful technique for improving well-being. Your mileage may vary. If you’d have preferred to read about the changes made to Scrubber this week, you can read the commit history!

Fighting test pollution with an RSpec custom ordering strategy

Andrew Bruce
Thursday, April 25, 2013

Test pollution manifests itself as seemingly false negatives or false positives in a test suite. It occurs when some shared state is unintentionally modified, or unintentionally read and used in a test.

When test pollution builds up, it can mean that a project’s build fails unpredictably, which can stop a whole team from shipping code regularly. This is an expensive way to not build software.

Rager Party (Standup 7/31/12)

Andrew Bruce
Tuesday, July 31, 2012

Helps

  • New Mail Rack

There’s a new mail rack near the lockers.

  • Capybara inconsistent about what page it’s on

We have a spec that does something along the lines of

visit '/#foo'
visit '/#bar'
save_and_open_page
page.should have_content("bar")

The save_and_open_page gives the bar page, but the assertion fails because it looks for "bar" in the contents of the foo page rather than the bar page.

One suggestion was to tell capybara to resync.

  • Webviews for iOS app – slow?

One team was worried about the speed of Webviews versus the speed of e.g. mobile Safari. It was suggested to try it and see, benchmark even.

Interestings

  • Mad at your Rails logs? Try Lograge

https://github.com/roidrage/lograge uses the amazing notification system in Rails 3 – which itself is worth a read – to make more useful logs for staging & production environments. Looks like it came out of the TravisCI project. And if you’re logging to Splunk, this is more like what you want.

  • Jasmine busted after Firefox upgrade?

Upgrade the selenium-webdriver gem to 2.25.0 and happiness will prevail.

SF Standup – March 1st, 2012: so who’s deployed Carrierwave?

Andrew Bruce
Thursday, March 1, 2012

Requests for help

“How do I avoid compiling gems on production servers?”

Apparently there’s a project that Chef uses that packages dependencies into an installer.

Also, FPM is potentially a good solution.

Interesting

Survey: who’s actually deployed Carrierwave?

2 people said they had.

SF Standup – February 28th, 2012: what does i18n stand for?

Andrew Bruce
Tuesday, February 28, 2012

Requests for help

“simple_form i18n labels for namespaced models i.e. Foo::Bar”

The requester wasn’t around, but we assumed the namespacing causes problems. No suggestions except “don’t namespace models”…

“On Ruby 1.8.7, Jasmine is timing out on startup after 60s. The project has a lot of fixtures and tests. Mongrel seems to block itself and then waits. Any ideas?”

A lot of blank faces.

Andrew Bruce
San Francisco