10,000 Tests and Counting

I played a “Yeah” sound effect in Campfire a few weeks ago in celebration of checking in our 10,000th test. It was a milestone worth celebrating with both Crème Brûlée Bread Pudding and a chocolate chip cookie. Stepping back a few years, I had to fight policy battles just to allot any development time to testing, or even to check tests into CVS alongside the production code.

Some good things about 10,000 tests and counting:

  • We have pretty good confidence that we can catch breaking changes throughout the app. CI and a suite of much slower QA Acceptance tests add to that confidence.
  • We can run the entire suite of 10,000 RSpec examples in about 8 minutes on the newest MacBook Pros with 16GB of RAM and four cores plus hyper-threading.
  • Finding old, crufty areas of the codebase that aren’t tested is a rare surprise rather than a common experience.
  • Even our large “god” classes are generally well tested.
  • We’re constantly thinking about ways to speed up the overall run, or at least keep it under the 10-minute rule of thumb. This tends to lead to good refactoring efforts to decouple slow tests from their slow dependencies (a rough sketch of the idea follows this list).
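
Our suite is RSpec, so the code below isn’t what we actually run, but the refactoring idea translates to any stack. Here’s a minimal sketch in Spock (the framework I cover later in this post): the class under test takes an injected collaborator, so the spec can stub out the slow database lookup instead of loading real records. Every name here (OrderRepository, MonthlyRevenue) is hypothetical.

  import spock.lang.Specification

  // Hypothetical collaborator that would normally hit the database.
  interface OrderRepository {
    List<BigDecimal> totalsForMonth(int year, int month)
  }

  // Hypothetical class under test: pure calculation, data access injected.
  class MonthlyRevenue {
    OrderRepository repository

    BigDecimal totalFor(int year, int month) {
      repository.totalsForMonth(year, month).sum() ?: 0
    }
  }

  class MonthlyRevenueSpec extends Specification {

    def "sums order totals without touching the database"() {
      given: "a stubbed repository in place of the slow real one"
      def repository = Stub(OrderRepository) {
        totalsForMonth(2012, 3) >> [100.00, 250.50]
      }
      def revenue = new MonthlyRevenue(repository: repository)

      expect:
      revenue.totalFor(2012, 3) == 350.50
    }
  }

In the Rails suite the equivalent move is pulling logic out of ActiveRecord models into plain Ruby objects so that most examples never have to touch the database at all.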

Some not so good things:

  • Many of the ‘unit’ tests are really light integration tests, since they depend on database-backed objects (Rails ActiveRecord models in our case).
  • Some of our ‘god’ classes have 3000+ lines of tests and take 2-3 minutes to run on their own.
  • We have to rely on approaches like parallel test runners to distribute our unit test runs.
  • If it doesn’t look like a change will impact anything outside the new code, we sometimes skip running the full suite locally and let the CI server catch any issues.
  • Individual specs that use ActiveRecord often take 5-8 seconds just to spin up, which is painfully long for a fast TDD cycle.
  • Our full acceptance test suite still isn’t consistent enough to run on CI, so we lean even more heavily on the indirect integration testing in our unit test suite.
  • We’d like to use tools like Guard or autotest, but we haven’t been able to make them work with such a large number of tests.

Even with all the cons of a really large test suite, I love that we have it and run it all day long.

Spock Intro Tutorial

I gave a presentation on Spock, a very nice BDD framework for Groovy, a few months back to our Groovy Users Group in Sacramento. After using it on a real-world Grails project over the last few months, it has grown on me and become my go-to testing framework for Groovy/Grails or Java projects. A typical specification looks something like this:

  def "a pager should calculate total pages, current page, and offset"() {
    when: "count, rows and page number"
    def pager = new Pager(count, rows, page)

    then: "should return correct total pages, the current page, and the offset"
    pager.totalPages == totalPages
    pager.currentPage == currentPage
    pager.offset == offset

    where: "you have a number of different scenarios"
    count | rows | page | totalPages | currentPage | offset
    100   | 10   |   1  |    10      |     1       |   0
    950   | 100  |   5  |    10      |     5       |   400
    72    | 20   |   3  |    4       |     3       |   40
  }

If that passed your 5-second test, take a look at a fuller introductory tutorial I put together.

A Gentle Introduction to Spock

And if you want to try executing real code, the project has a nice browser-based environment at Meet Spock.

Faulty Hopes for UI Testing Tools

Michael Feathers wrote a tough post recently on UI testing tools.

The fact of the matter is that UI based testing should be used for UIs: that’s it. You should not be testing your full application end-to-end through a UI Testing tool. First of all, that sort of testing couples some of the most important tests in your system to one of the most volatile parts of it.

Michael Feathers

I understand the frustration he’s speaking from, but I’ve long realized that you don’t really want to test entirely through the front end of the application. It can be useful with legacy systems to get a minimal test harness in place. With the right testers I’ve even seen some of the Mercury products used effectively for UI testing, though it probably required a 3:1 ratio of QA to developers, and the tests were not customer tests but were written more as a regression suite.

My use of functional testing tools like Selenium has generally been for a smattering of end-to-end tests and some customer tests. I’ve never really tried to achieve a high level of coverage with these tests. Obviously, testing the business logic through something like FitNesse is more effective, and BDD-style tests tend to make better customer tests.
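
To make that concrete, here’s a rough sketch of the kind of FitNesse decision-table fixture I have in mind, written in Groovy against Slim’s conventions (setters for the input columns, a method for the column ending in “?”). The shipping-discount domain and every name in it are made up for illustration.

  // The FitNesse wiki page would hold a table like this, which testers and
  // analysts can read and extend without touching the code:
  //
  //   |shipping discount|
  //   |order total|preferred customer|discount percent?|
  //   |500        |false             |0                |
  //   |1500       |false             |5                |
  //   |1500       |true              |10               |
  //
  // Slim feeds each input column through a setter and compares the
  // "discount percent?" cell against the return value of discountPercent().
  class ShippingDiscount {
    int orderTotal
    boolean preferredCustomer

    int discountPercent() {
      if (orderTotal >= 1000) {
        return preferredCustomer ? 10 : 5
      }
      0
    }
  }

The business rule lives in one readable table, and a tool like Selenium only has to cover the thin UI sitting on top of it.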

I’ve actually been saddened over the past five years or so to see that tools like FitNesse haven’t seen more adoption. I think the BDD movement has put more of a focus on developing customer-style tests, but in general, where automated end-to-end test suites are in place, they’re often overly reliant on something like Selenium.

Developer Expectations

I came across a note of mine from last year on my baseline expectations for developers:

  • All code is checked into source control on an hourly basis or at most daily.
  • Every project has an automated build (Maven, Ant).
  • All projects are set up in continuous integration (Hudson).
  • All code follows the current Java/Groovy coding standards.
  • Unit test coverage of new code must meet a 70% target. TDD is preferred.
  • Code reviews or regular pair programming are required.
  • Code should meet a standard of low cyclomatic complexity through refactoring and design.
  • Some level of functional, integration, and acceptance tests should be performed.
  • High value documentation is maintained.

Business Users As Developers

Getting DSLs to be business readable is far less effort than business writable, but yields most of the benefits.

Martin Fowler

Software transparency to business experts is a great goal. I’ve met plenty of sophisticated business users who could at least write some basic SQL and whip together lots of nice reports in Excel. Those same users, when presented with Java code, lean back in their chairs and wait for the developer to show them a screenshot. Getting a DSL they can actually read and give feedback on means higher quality software.

I can’t claim to have reached this goal despite some attempts. So far the closest was a FitNesse implementation validating business rules in some vendor software. The testers really took to creating scenarios after it clicked for them, but the business analysts still found it a bit too rough around the edges. Baby steps.

Fowler nails the point of DSLs from a business perspective. It sounds great to talk about business users writing the rules for the software. Every rules engine vendor makes this claim. In practice I’ve never seen it happen. Developers end up writing the business rules in the syntax of the particular engine.
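
For contrast, here’s a minimal sketch in Groovy of the kind of business-readable rule I mean. The discount domain and every class name are invented for illustration; the interesting part is that the rule block near the bottom reads close to plain English while still being executable, so a business expert can at least review it and give feedback.

  // Hypothetical rule DSL: a business expert won't write this plumbing,
  // but they can read and critique the rule defined near the bottom.
  class Rule {
    String name
    private Closure condition
    private Closure action

    void when(Closure c) { condition = c }
    void then(Closure c) { action = c }

    void applyTo(target) {
      if (run(condition, target)) {
        run(action, target)
      }
    }

    private static run(Closure c, target) {
      def body = c.rehydrate(target, c.owner, c.thisObject)
      body.resolveStrategy = Closure.DELEGATE_FIRST
      body()
    }
  }

  class RuleBook {
    List<Rule> rules = []

    void rule(String name, Closure body) {
      def r = new Rule(name: name)
      body.delegate = r
      body.resolveStrategy = Closure.DELEGATE_FIRST
      body()
      rules << r
    }

    void applyTo(target) { rules.each { it.applyTo(target) } }
  }

  // Hypothetical domain object the rules run against.
  class Order {
    BigDecimal total
    boolean preferredCustomer
    int discountPercent = 0
  }

  // This is the part a business expert can read and give feedback on.
  def book = new RuleBook()
  book.rule('preferred customer discount') {
    when { total > 1000.00 && preferredCustomer }
    then { discountPercent = 10 }
  }

  def order = new Order(total: 1500.00, preferredCustomer: true)
  book.applyTo(order)
  assert order.discountPercent == 10

The plumbing (Rule, RuleBook) stays firmly with the developers; the readable piece is just the rule block itself, and that is the part worth putting in front of a business expert.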

Creating readable DSLs is great if you can communicate with the business experts. And even if you fall short, the developers and testers get clear, concise code out of the exercise.