Simple Test Metrics in Your Rails App, and What They Mean

Posted by Chad Pytel

Oct 22

There are two low-barrier-to-entry ways to get some quick metrics about your application’s test code and the coverage it provides. Of course there are others, but today we’re just going to focus on the two that are easiest to run, and on what they mean: rake stats and rcov.

The first tool available to us comes built into Rails, and that’s rake stats.

rake stats

If you haven’t used it before, rake stats, when run, outputs a quick summary of the lines of code, lines of test code, number of classes, number of methods, the ratio of methods to classes, and the ratio of lines of code per method.
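To make those columns concrete, here is a minimal sketch of the kind of counting involved. It is a rough approximation written for illustration, not Rails’ actual implementation; rough_stats and SAMPLE are hypothetical names, and the rules (skip blank and comment lines, count lines starting with class and def) are assumptions.

```ruby
# Hypothetical helper: approximates per-file counting for illustration only.
# Assumption: blank lines and comment-only lines are excluded from LOC.
def rough_stats(source)
  stats = { lines: 0, loc: 0, classes: 0, methods: 0 }

  source.each_line do |line|
    stats[:lines] += 1
    stripped = line.strip
    next if stripped.empty? || stripped.start_with?("#")

    stats[:loc] += 1
    stats[:classes] += 1 if stripped.match?(/\Aclass\s/)
    stats[:methods] += 1 if stripped.match?(/\Adef\s/)
  end

  stats
end

# A made-up model, used only as sample input.
SAMPLE = <<~RUBY
  # A tiny model
  class Store
    def name
      "Corner Shop"
    end

    def open?
      true
    end
  end
RUBY

stats = rough_stats(SAMPLE)
puts stats.inspect
```

For this sample, the comment and the blank line count toward Lines but not toward LOC, which is why the two columns differ in the real output too.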

Let’s take a look at the output from the application Joe, Mike, Micah, and I just built for the Rails Rumble, Where’s the Milk At?.

+----------------------+-------+-------+---------+---------+-----+-------+
| Name                 | Lines |   LOC | Classes | Methods | M/C | LOC/M |
+----------------------+-------+-------+---------+---------+-----+-------+
| Controllers          |   176 |   149 |      10 |      18 |   1 |     6 |
| Helpers              |    38 |    35 |       0 |       4 |   0 |     6 |
| Models               |   183 |   147 |       5 |      20 |   4 |     5 |
| Libraries            |     0 |     0 |       0 |       0 |   0 |     0 |
| Integration tests    |     0 |     0 |       0 |       0 |   0 |     0 |
| Functional tests     |   855 |   686 |       9 |       3 |   0 |   226 |
| Unit tests           |   684 |   568 |       7 |       0 |   0 |     0 |
+----------------------+-------+-------+---------+---------+-----+-------+
| Total                |  1936 |  1585 |      31 |      45 |   1 |    33 |
+----------------------+-------+-------+---------+---------+-----+-------+
  Code LOC: 331     Test LOC: 1254     Code to Test Ratio: 1:3.8

OK, when looking at the output from rake stats, the most important bits of information, and the ones you should look at first, are all in the final summary line, in this case:

Code LOC: 331     Test LOC: 1254     Code to Test Ratio: 1:3.8

A Code to Test Ratio of 1 to 3.8 is somewhat ridiculous. It’s incredibly high, and when you see something like this, it’s important to ask why. Prompting that question is pretty much the entire usefulness of the output of rake stats as a metric. Here are some guidelines I’ve devised, based on the experience of looking at a bunch of applications I consider “well tested” and “poorly tested”.

There are a few other nice things in the output from rake stats that are helpful for a bird’s-eye view of the application. For example, you can tell that we didn’t write integration tests, and that our application has 5 models and 10 controllers.

Let’s investigate why we have the 1:3.8 ratio in Where’s the Milk At?. Going in, before doing any actual investigation, I had some initial hunches as to why the application has the ratio it does. Those are:

Given a rapid development schedule of 48 hours, we basically didn’t have any opportunity to refactor tests

Refactoring tests, just like refactoring code, is an essential part of real TDD. Without taking this step, it’d only be natural for our tests to be repetitive and for the lines of test code to grow. It’s difficult to present a brief example, but here are some typical things that you’ll want to look for in your tests that would be candidates for refactoring.

Upon inspection of the Where’s the Milk At? test code, I actually found very few, if any, instances of any of the above. In fact, I found that we made extensive use of the macros shoulda provides, we wrote our own application-specific macros, such as should_have_map and should_display, and we made good use of shared contexts.

So, I put this aside as a possible cause, but now that I’ve started to review the test code, I’ve started to develop some new ideas about our code to test ratio that I’ll come back to later on.

Our shoulda_macros are being counted as LOTC

We used several helpful shoulda test macros to speed up development. My initial suspicion was that these macros were being counted as lines of test code. After investigating, I was able to determine that rake stats only looks in test/unit, test/functional, and test/integration, so this isn’t the case. I put this aside for now, and pocket the info about how rake stats works internally for possible future use some time down the road.
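The effect of that directory list can be sketched in plain Ruby. This is an illustration under stated assumptions, not the code of the rake task itself: TEST_DIRECTORIES, non_blank_lines, and test_loc are hypothetical names, and the two files are made-up stand-ins for a unit test and a shoulda macro file.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for the directories the stats task walks for tests.
TEST_DIRECTORIES = %w[test/unit test/functional test/integration].freeze

# Counts lines that are not blank (a simplification of LOC).
def non_blank_lines(path)
  File.readlines(path).count { |line| !line.strip.empty? }
end

# Sums the non-blank lines of every .rb file under the listed directories.
def test_loc(root)
  TEST_DIRECTORIES.sum do |dir|
    Dir.glob(File.join(root, dir, "**", "*.rb")).sum { |file| non_blank_lines(file) }
  end
end

counted = nil
Dir.mktmpdir do |root|
  files = {
    "test/unit/store_test.rb"     => "class StoreTest\nend\n",
    "test/shoulda_macros/maps.rb" => "def should_have_map\nend\n"
  }
  files.each do |relative_path, body|
    path = File.join(root, relative_path)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, body)
  end

  counted = test_loc(root)
end

puts counted
```

Only the two lines under test/unit are counted here; the macro file sits outside the listed directories, so it contributes nothing to the total.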

We have several complex named scopes

The last of my initial assumptions about our ratio (the astute reader will notice I’m 0 for 2 now) is that we have several complex named scopes that are only 1 to 3 lines of code, but have many more lines of test code. Upon inspection, this is clearly the case. Let’s take a look at an example.

We have a named scope which returns all of the Purchases that were made in a specific set of stores. Here’s what it looks like:

named_scope :in_stores, lambda {|stores|
  { :conditions => ['purchases.store_id IN(?)', stores] }
}

And here is the accompanying test (this was pure TDD: the tests were written a little bit at a time, before the named scope was actually written).

  context "looking for purchases in stores" do
    setup do
      @stores = [Factory(:store), Factory(:store)]

      @in_store_purchases = []
      @stores.each do |store|
        2.times do
          @in_store_purchases << Factory(:purchase, :store => store)
        end
      end

      Factory(:purchase) # purchase at another store

      @result = Purchase.in_stores(@stores)
    end

    should "not return any purchases for other stores" do
      assert_all @result do |purchase|
        @stores.include?(purchase.store)
      end
    end

    should "return every purchase for the specified stores" do
      assert_all @in_store_purchases do |purchase|
        @result.include?(purchase)
      end
    end
  end

You can see that for our 3-line named_scope, we have 23 lines of test code. That’s a ratio of roughly 1:8, and this is an example of one of the simpler named scopes in the application (assert_all is an assertion we wrote).

Additionally, we could make this ratio slightly worse (or better, depending on how you’re looking at it) by putting the named scope all on one line, instead of 3.

There are quite a few of these finders and accompanying tests, and I feel confident after investigating that this is one of the reasons for the ratio.
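Since assert_all is not part of shoulda, here is a minimal sketch of how such an assertion could be written. It is an illustrative guess based only on how assert_all is called in the test above, not the code from the application; AssertionFailed is a hypothetical error class used to keep the example free of any test framework.

```ruby
# Hypothetical reconstruction of a custom assertion like assert_all.
# It fails unless the block returns a truthy value for every element.
class AssertionFailed < StandardError; end

def assert_all(collection)
  failures = collection.reject { |element| yield(element) }
  return true if failures.empty?

  raise AssertionFailed, "block was false for: #{failures.inspect}"
end

# Every element satisfies the block, so the assertion passes.
passed = assert_all([2, 4, 6]) { |n| n.even? }

# One element does not, so the assertion raises and we capture the message.
message =
  begin
    assert_all([2, 3, 4]) { |n| n.even? }
  rescue AssertionFailed => e
    e.message
  end

# An empty collection passes trivially.
vacuous = assert_all([]) { |n| n.even? }

puts passed
puts message
puts vacuous
```

Note that an assertion of this shape passes on an empty collection, which is one reason the test above checks both directions: that nothing from another store is returned, and that every expected purchase is present.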

Other causes

In reviewing the test code, I started to notice a few other things that contribute to the ratio.

Take the following test, for example:

  logged_in_user_context do
    context "with at least one purchase" do
      setup do
        @purchases = paginate([Factory(:purchase)])
        @store     = @purchases.last.store

        @user.     stubs(:purchases).returns(@purchases)
        @purchases.stubs(:latest).   returns(@purchases)
        @purchases.stubs(:paginate). returns(@purchases)
      end

      context "on GET to index" do
        setup do
          get :index
        end

        before_should "find the user's purchases" do
          @user.expects(:purchases).with().returns(@purchases)
        end

        before_should "find the latest purchases" do
          @purchases.expects(:latest).with().returns(@purchases)
        end

        before_should "paginate the purchases" do
          @purchases.expects(:paginate).returns(@purchases)
        end

        # ... remaining tests omitted
      end
    end
  end

When you use stubbing in tests, it’s best practice to write the stubs and then write expectations for what you’ve stubbed. We’re doing this in the above code by putting the stubs in the setup (3 lines of test code) and then using shoulda’s before_should to declare the expectations (9 lines of test code). That’s 12 lines of test code for what is ultimately 1 line of code.

Now, there isn’t anything necessarily wrong with this; again, we’re only investigating causes of the ratio here. But it’s something to note, and perhaps to consider either for test refactoring or for incorporating into your test framework somehow.

Finally, I also noticed a lot of tests like this:

should "crown the best store" do
  assert_select 'a', "#{assigns(:stores)[0].name}" do
    assert_select 'span[class=crown]'
  end
end

should "rerender the purchase form" do
  assert_select_rjs :replace, 'new_purchase' do
    assert_select '#purchase_store_id[value=?]', @store.id
    assert_match @focus_quantity, @response.body
  end
end

should "remove the purchase from the list" do
  assert_match /new Effect.Fade\("#{dom_id(@purchase)}"/, 
               @response.body
end

In short, we’re testing the views, markup, JavaScript (some of it), and RJS, as we should be. And we’re doing it quite extensively: there are 45 calls to assert_select and assert_select_rjs in the functional tests. However, rake stats doesn’t count the lines in the views. If you consider that most of the calls to assert_select and its ilk will be surrounded by a should and an end, that’s 3 lines of test code each, while the view code they exercise isn’t showing up at all as lines of code in our rake stats.

If we modify the rake stats task to include the views (which we can’t seriously do without taking other things into account, like JavaScript, but bear with me here), here is the new output of rake stats:

+----------------------+-------+-------+---------+---------+-----+-------+
| Name                 | Lines |   LOC | Classes | Methods | M/C | LOC/M |
+----------------------+-------+-------+---------+---------+-----+-------+
| Controllers          |   176 |   149 |      10 |      18 |   1 |     6 |
| Helpers              |    38 |    35 |       0 |       4 |   0 |     6 |
| Models               |   183 |   147 |       5 |      20 |   4 |     5 |
| Views                |   605 |   545 |       0 |       0 |   0 |     0 |
| Libraries            |     0 |     0 |       0 |       0 |   0 |     0 |
| Integration tests    |     0 |     0 |       0 |       0 |   0 |     0 |
| Functional tests     |   852 |   683 |       9 |       3 |   0 |   225 |
| Unit tests           |   684 |   568 |       7 |       0 |   0 |     0 |
+----------------------+-------+-------+---------+---------+-----+-------+
| Total                |  2538 |  2127 |      31 |      45 |   1 |    45 |
+----------------------+-------+-------+---------+---------+-----+-------+
  Code LOC: 876     Test LOC: 1251     Code to Test Ratio: 1:1.4
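The ratio in the summary line is simply test LOC divided by code LOC, rounded to one decimal place. Here is a small sketch using the totals from the two runs above; code_to_test_ratio is a hypothetical helper written for illustration, not part of Rails.

```ruby
# Hypothetical helper: formats the ratio the way the summary line shows it.
def code_to_test_ratio(code_loc, test_loc)
  "1:#{(test_loc.to_f / code_loc).round(1)}"
end

# Controllers + helpers + models: 149 + 35 + 147 = 331 lines of code.
without_views = code_to_test_ratio(331, 1254)

# The same code plus 545 LOC of views: 331 + 545 = 876 lines of code.
with_views = code_to_test_ratio(876, 1251)

puts without_views  # 1:3.8
puts with_views     # 1:1.4
```

The test code barely changed between the two runs; the ratio dropped because the denominator more than doubled once the views were counted as code.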

I’ve spent a lot of time talking about rake stats, but here’s the rub: it’s worthless for telling you the really important metric, which is how good your test code is. Or, said differently, how much coverage your tests provide for your actual code. You really only want to use rake stats for a high-level assessment of your code, and as one tool in the arsenal you’ll use when investigating how to improve your tests.

The guidelines I outlined above are basically the extent of how you should use rake stats for judging your test code. And as I’ve illustrated here, your assumptions about your test code, and even my guidelines, may be wrong or flexible.

In fact, based on what I’ve uncovered about the view LOC and the stub/expectations, I may begin to reevaluate my 1:2 guideline.

The second tool you can get up and running with easily, and one that is even more valuable than rake stats, is rcov.

rcov

rcov executes your tests and does the best job it can of telling which lines of code were executed by them. The theory is that if a line of code is executed, then there was a test for it. rcov provides C0 coverage, so it cannot tell whether both branches of a conditional were hit; a line being executed at all means that the line counts as covered (see a full definition of C0 and the other types of test coverage measures here).
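The limitation of line-level coverage is easy to reproduce. The sketch below does not use rcov; it swaps in the Coverage module from Ruby’s standard library as a stand-in, since it also reports per-line execution counts. The price_label method and the temporary file are made up for the example.

```ruby
require "coverage"
require "tempfile"

# A made-up method whose conditional lives entirely on one line.
source = <<~RUBY
  def price_label(price)
    price > 100 ? "expensive" : "cheap"
  end
RUBY

file = Tempfile.new(["price_label", ".rb"])
file.write(source)
file.close

# Coverage only tracks files loaded after it has been started.
Coverage.start
load file.path
price_label(5) # only the "cheap" branch ever runs

# Look the file up by name, in case the reported path was normalized.
_path, line_counts = Coverage.result.find do |path, _counts|
  File.basename(path) == File.basename(file.path)
end
file.unlink

# Lines that cannot be executed (such as a bare `end`) are reported as nil.
executable = line_counts.compact
percent = 100.0 * executable.count { |count| count > 0 } / executable.size

puts percent
```

Every executable line ran at least once, so the line coverage comes out as 100%, even though nothing ever exercised the "expensive" branch.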

You should get the latest rcov from GitHub; it crashes less. In order to easily run rcov on your Rails app, you can use this rake task, which is included in our plugin that provides standard tasks, limerick_rake, which is in turn included in our Rails application template, Suspenders.

Running rcov on Where’s the Milk At? provides the following information:

+----------------------------------------------------+-------+-------+--------+
|                  File                              | Lines |  LOC  |  COV   |
+----------------------------------------------------+-------+-------+--------+
|app/controllers/application.rb                      |    14 |    11 | 100.0% |
|app/controllers/confirmations_controller.rb         |     3 |     3 | 100.0% |
|app/controllers/items_controller.rb                 |    15 |    11 | 100.0% |
|app/controllers/openid_controller.rb                |    27 |    25 | 100.0% |
|app/controllers/passwords_controller.rb             |     3 |     3 | 100.0% |
|app/controllers/purchases_controller.rb             |    48 |    40 | 100.0% |
|app/controllers/sessions_controller.rb              |     7 |     6 | 100.0% |
|app/controllers/stores_controller.rb                |    21 |    18 | 100.0% |
|app/controllers/users_controller.rb                 |    28 |    23 | 100.0% |
|app/helpers/application_helper.rb                   |    38 |    35 | 100.0% |
|app/models/item.rb                                  |    22 |    17 | 100.0% |
|app/models/purchase.rb                              |    55 |    43 | 100.0% |
|app/models/quantity.rb                              |    28 |    27 | 100.0% |
|app/models/store.rb                                 |    10 |     7 | 100.0% |
|app/models/user.rb                                  |    63 |    49 | 100.0% |
|app/models/user_mailer.rb                           |     5 |     4 | 100.0% |
+----------------------------------------------------+-------+-------+--------+
|Total                                               |   387 |   322 | 100.0% |
+----------------------------------------------------+-------+-------+--------+
100.0%   16 file(s)   387 Lines   322 LOC

This shows us that, according to rcov, 100% of the lines of code in our application were executed when our tests were run. This is great, but as with most things, it isn’t the whole story and should be taken with a grain of salt. Here are some guidelines and principles you should take into consideration for rcov.

The most important lesson we can take away from rcov is that it’s not perfect, but it provides a good benchmark. When it’s not reporting 100%, you can click through and see exactly which lines of code were not executed by your tests. So, in short, it’s great at identifying deficiencies in your test suite, but it should not be treated as a safety net. Don’t assume that with 90-100% coverage you’re all good, because there can be big holes in your coverage while rcov still reports 100%.

What All This Means

Hopefully you’ve gotten a good idea of what to look for and how to use these two simple tools to investigate the quality of your tests. The benchmarks and guidelines I’ve presented here are based on my experience developing over 30 Rails applications and reviewing the different stats and coverage reports I’ve seen from them, but that doesn’t mean they are inflexible or infallible.

Also, these metrics and tools, and the others that exist out there, are meant to assist, not replace, your role as a developer: to correctly understand the problem domain, to have confidence in the code itself and in the test suite, and to realize the obvious fact that these tools do not analyze the logical correctness of anything you’ve done.

Here are the guidelines again, in summary.


Comments on this post

Morgan Roderick

Oct 22

Morgan Roderick said,

Very nice overview, thanks for the writeup!

One thing had me wondering, why did you choose the mergulhao-rcov, and not the spicycode-rcov that it was forked from? Did you find that one first? Are there more bugs fixed?

It’s days like this I really wish I’d spent more time learning git, to be able to do a quick diff between the two projects and assess the differences.

Chad Pytel

Oct 22

Chad Pytel said,

Morgan, I linked to that rcov by mistake. I’ve updated the link to be the spicycode one. Thanks.

Morgan Roderick

Oct 22

Morgan Roderick said,

Chad, I see … I was just wondering if I was missing out on something, being the test-whore that I am ;-)

François Beausoleil

Oct 22

François Beausoleil said,

Hey Chad! Thank you very much for the writeup. At a test to code ratio of 1:1.9, I thought we were pretty high. Seems like I was mistaken :) Didn’t run through rcov. I should do it and report. Thanks!

Seban

Oct 23

Seban said,

Very good writeup. Every Ruby/Rails programmer should be familiar with TDD rules and goals; if he isn’t, he has to read this writeup. I have a question. Why so many things in setup methods? Aren’t test fixtures good enough?

Jon Yurek

Oct 23

Jon Yurek said,

No, fixtures aren’t good enough. In fact, I would go as far as to say that fixtures are actively hurting your code. Factories are the way to go, but more than that, you need to place stubs and (some) expectations in your setup blocks.



© 2000 - 2009 by thoughtbot, inc.
written by a bushel of tiny robots