Justin Leitgeb

May 10, 2012

Mixins (modules in Ruby) are a way of re-using code across classes. I think that rather than just being a light-weight way to share code, they add complexity to software projects by undercutting some of the benefits that would ordinarily be provided by the tools in object-oriented programming. In this post I’ll discuss some of the disadvantages of modules, and suggest that Ruby programmers treat them as a last resort for code sharing, to be used only after carefully considering alternatives such as creating classes.

About Mixins

Before going into the disadvantages of mixins as a method of code organization, it’s important to understand their intended use in Ruby and other languages. Mixins are intended to provide a mechanism for multiple inheritance without some of the complexity that multiple inheritance brings into a programming language. The notion of mixins isn’t new in Ruby; the idea of injecting groups of related methods into a class was widely discussed before Matz started making commits to the Ruby interpreter around 1993 [2].

Mixins, in Ruby and other languages, allow for code reuse across multiple classes without the complex semantics of multiple inheritance found in languages like C++. It’s important to note that mixins are a type of inheritance, which can be easily seen in irb:

module Programmer ; end

class Person
  include Programmer
end

Person.new.is_a?(Programmer) # true
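Another way to see this in irb is to look at the ancestor chain. Including a module inserts it directly above the class, so a method in the class can reach the module’s version with super, exactly as it would reach a superclass method. The title method below is a hypothetical addition for illustration.

```ruby
module Programmer
  # Hypothetical method, used only to show method lookup
  def title
    "programmer"
  end
end

class Person
  include Programmer

  # Overrides the module's method and calls up the chain with super,
  # just as it would with a superclass
  def title
    "senior #{super}"
  end
end

Person.ancestors.first(2) # => [Person, Programmer]
Person < Programmer       # => true, Ruby treats the module as an ancestor
Person.new.title          # => "senior programmer"
```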

Disadvantages

With the flexibility of an ostensibly lightweight code-sharing scheme in Ruby come strong disadvantages. Inheritance is fundamentally at odds with encapsulation, adding capabilities to a class with a single line quickly muddles its responsibilities, and mixins aren’t even the best option for code reuse across classes. I’ll cover each of these separately.

Encapsulation

Object-oriented programming helps us control complexity in large systems. Instead of having confusing globs of code in unorganized piles in our application, we create objects that communicate with one another using simple, well-defined interfaces. Although each one of those objects may do something complex, as a whole we understand the system readily by treating each object and its functionality as a black box. In theory we should only have to understand these objects and use their functionality by sending simple messages to their public interfaces. In short, encapsulation is our friend in complex systems.

We say that inheritance is fundamentally at odds with encapsulation because once you inherit from a class, there are certain things you have to know about the parent class in order to inherit from it successfully. As a good programmer, you always call ‘super’ when you’re overriding one of the parent class’s methods, and you often have to read those methods to understand the functionality you’re extending. The parent class may maintain state in instance variables, and you have to see what those are doing, too, in order to know how to extend the parent class properly.

In a sense, extending functionality through inheritance is opening a Pandora’s box of sorts by breaking the principles of encapsulation. Generally, you’d be better off by extending functionality through delegation to another class via a public API rather than going down this road.

Remember that mixins are a form of inheritance, and they’re breaking encapsulation in the same way as single inheritance. In fact, you could even be introducing more complexity in your application with mixins, since you’re technically implementing code that acts like a slightly simpler form of multiple inheritance.

As a simple example, consider the following clear violation of encapsulation:

module ProgrammingSkills
  def practice_programming!
    @skills = ["Java"] # overwrites the including object's @skills
  end
end

class Person
  include ProgrammingSkills
end

You wouldn’t want to include something that replaced all of your hard-earned, polyglot programming skills with only a knowledge of Java, would you?
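To make the problem concrete, suppose Person keeps its own list of languages in @skills (a hypothetical attribute added for illustration). Module methods run with self set to the including object, so nothing stops the module from writing to the same instance variable:

```ruby
module ProgrammingSkills
  def practice_programming!
    @skills = ["Java"] # shares Person's instance variables, no boundary
  end
end

class Person
  include ProgrammingSkills

  attr_reader :skills

  def initialize
    @skills = ["Ruby", "Haskell", "Erlang"]
  end
end

person = Person.new
person.practice_programming!
person.skills # => ["Java"], the polyglot list is gone
```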

Finally, it’s often said that you should favor composition over inheritance. Even though include, the keyword for bringing a module into your class, sounds synonymous with ‘composition,’ rest assured that mixins aren’t what people in the software engineering community are suggesting.

Single Responsibility Principle

One of the most common refactorings that I apply to programs I work on is Extract Class. As Martin Fowler describes this technique [7], you figure out where a class is doing too much work, and create a new class with a set of cohesive methods. You then delegate to it from your old class, which now has a simplified set of responsibilities.

To apply this refactoring successfully, one rule of thumb is what Bob Martin calls the “Single Responsibility Principle” [8]: a class should have one, and only one, reason to change. If you think about the responsibilities of your class and can identify another reason that it should change, you should probably extract that behavior into another class.

Another sign that you should extract a class is finding yourself using conjunctions when describing its functionality (e.g., it makes coffee and it washes dishes) [6]. Now consider what happens when you include a module: by importing its methods into your class, aren’t you adding another conjunction to the description of what the class does?

class Dog
  include WalkingAbility # and…
  include FoodConsumer   # and…
  include CarChaser      # and
end
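The alternative is Extract Class: give each responsibility its own class and let Dog hold a collaborator. Below is a minimal sketch for just the eating behavior, using a hypothetical Stomach class. Forwardable from Ruby’s standard library keeps Dog’s public interface unchanged while the work is delegated.

```ruby
require "forwardable"

# Hypothetical extracted class: owns eating-related state and nothing else
class Stomach
  def initialize
    @meals = 0
  end

  def eat(food)
    @meals += 1
    "ate #{food}"
  end

  def hungry?
    @meals.zero?
  end
end

class Dog
  extend Forwardable

  # Dog's public interface stays the same; the work is delegated
  def_delegators :@stomach, :eat, :hungry?

  def initialize(stomach = Stomach.new)
    @stomach = stomach
  end
end

dog = Dog.new
dog.hungry?       # => true
dog.eat("kibble") # => "ate kibble"
dog.hungry?       # => false
```

Unlike a mixin, Stomach’s @meals stays private to that object. Dog can reach it only through eat and hungry?, and a test can pass a different collaborator to Dog.new.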

Traits

I won’t go into a long description of Traits, as they’re clearly articulated in a number of papers [3] and also implemented in some form in languages like Scala and Smalltalk. They’re also planned for Ruby 2.0, at least according to the Ruby creator, Matz, in his 2010 RubyConf keynote [5]. Basically, they’re intended to be a more sophisticated mechanism for code inclusion than modules, detecting method conflicts by default and giving you more precise control over how included methods are combined. It may be a somewhat scary thought that once you include a module, its methods will, without warning, shadow methods of the same name from previously included modules or from superclasses.
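A small illustration with two hypothetical modules: the module included last is placed nearest the class in the ancestor chain, so its method wins and Ruby reports nothing about the collision.

```ruby
module Swimmer
  def move
    "swims"
  end
end

module Runner
  def move
    "runs"
  end
end

class Duck
  include Swimmer
  include Runner # silently shadows Swimmer#move; no warning is issued
end

Duck.new.move                     # => "runs"
Duck.ancestors.take(3)            # => [Duck, Runner, Swimmer]
Duck.instance_method(:move).owner # => Runner
```

A move method defined directly in Duck would still take precedence over both modules, since the class itself comes first in the ancestor chain.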

Without going into a long discussion about traits and how modules could be improved in Ruby, it’s worth noting in passing that modules aren’t an elegant way to re-use code across your codebase; they’re a fairly crude mechanism that can, and probably should, be improved in the Ruby language itself.

Cleaning it up

It’s not my intention to say bad things about mixins without providing an alternative. First, there are some cases where mixins may be useful. Specifically, if you really have a case where something like multiple inheritance is the only thing to cure your ills, then modules would be the logical choice. Needless to say, there are few times in my career when I’ve said, “I know, I need multiple inheritance to solve this problem!” Likewise, I seldom find that I need mixins.

Instead of mixins, I’d encourage you to think about the lightweight system that we already have in place for code re-use and managing complexity in a system. Instead of a module, take the next step and see how you can conceive of your problem in terms of a class which has a well-defined interface and proper encapsulation of a single responsibility. Chances are, you’ll see the benefits of your decision as you debug and extend your system.

More Reading

The resources below are great references on the topics I mentioned above – they cover topics like mixins and how they’re a form of inheritance, as well as things like Traits in programming languages.

  1. S. Klabnik, “Mixins – a refactoring anti-pattern” http://blog.steveklabnik.com/posts/2012-05-07-mixins–a-refactoring-anti-pattern
  2. G. Bracha and W. Cook, “Mixin-based inheritance” pp. 303-311
  3. N. Schärli, S. Ducasse, O. Nierstrasz, “Traits: Composable Units of Behavior” http://scg.unibe.ch/archive/papers/Scha02bTraits.pdf
  4. A. Snyder, “Encapsulation and Inheritance in Object-Oriented Programming Languages”, pp. 38-45
  5. Y. Matsumoto, “RubyConf 2010 Keynote” http://www.slideshare.net/yukihiro_matz/rubyconf-2010-keynote-by-matz
  6. S. Freeman and N. Pryce, “Growing Object-Oriented Software, Guided by Tests” location 1258
  7. M. Fowler, “Refactoring: Improving the Design of Existing Code” location 3320
  8. B. Martin, “SRP: The Single Responsibility Principle”, http://www.objectmentor.com/resources/articles/srp.pdf

May 6, 2012

As developers, we agree that we want to spend our time developing features that deliver value to our projects. Tests should be a tool that helps us focus on the features that matter, while giving us the confidence to do important things like refactoring existing code. They should help us focus on the essential complexity in the problems we’re trying to solve rather than being another obstacle we have to overcome to complete our task.

I’ve seen a number of cases, however, where novices in particular get caught up in sporadically failing tests or teams get dragged down by slow test suites. Below I’ve tried to identify some of the most common mistakes I’ve seen in Rails test suites, along with best practices and solutions to common errors.

Creating ActiveRecord objects when building would suffice

In most Rails applications, much of our time testing is spent writing and running model tests. Following the traditional model of “skinny controller, fat model,” this makes sense. What many people don’t realize until their test suites start taking more than 10 minutes to run is the cost of persisting these records.

Writing to ACID-compliant databases is expensive. This isn’t the fault of the database implementations; it’s because they care about not losing your data. The downside is that your tests will start to drag if you always create persisted instances of your objects whenever you have to test a method. Instead, just instantiate a model with the attributes you need, and save the record only when absolutely necessary. If you’re using FactoryGirl, this means using Factory.build instead of Factory.create whenever possible.

If you’re reading a code base or a test, and see a place where a record is being persisted, do yourself and your team a favor and replace it with a test on a non-persisted object.

Creating Persisted ActiveRecord objects in a loop

Following on the above point about minimizing writes to the database in your tests, if you find yourself writing a loop to create 30 records in the database, or even a few, stop. Think about how you can test your function with a couple of mock or stub objects instead.

Failing to stub time, or not recognizing time-related code dependencies

In the past two years, I’ve spent time right after January 1st fixing tests that depended on the year. In many large applications where a mix of junior and senior developers work on the code, chances are someone will forget that time affects the tests they write. When writing tests, it’s your responsibility to make sure they won’t break because something external changes. If you’re not thinking about how time may affect your tests, start now. Timecop is a useful gem for stubbing out time in your application where necessary.
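Besides stubbing time with a gem like Timecop, another option is to pass the current date in as an argument, with today as the default, so a test can pin it. A minimal sketch with a hypothetical Subscription class:

```ruby
require "date"

# Hypothetical model: the clock is a parameter, not a hidden dependency
class Subscription
  def initialize(expires_on)
    @expires_on = expires_on
  end

  # Production code calls expired? with no argument; tests pass a fixed date
  def expired?(today = Date.today)
    today > @expires_on
  end
end

sub = Subscription.new(Date.new(2012, 12, 31))
sub.expired?(Date.new(2012, 12, 31)) # => false
sub.expired?(Date.new(2013, 1, 1))   # => true, no matter what year the suite runs in
```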

A more insidious manifestation of this problem is the following:

user_a = Factory.create(:user)
user_b = Factory.create(:user)
User.all(order: 'created_at').should == [user_a, user_b]

Although this seems like it should work, the created_at timestamp is recorded with precision only to the second. Since you can probably persist many records per second on your machine, the two users likely have the same created_at timestamp, and you’ve just written code that contains a strange sort of time dependency which will cause sporadic failure.
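You can see the problem without a database by truncating two timestamps to whole seconds, as a column with one-second precision would store them. The timestamp values below are hypothetical.

```ruby
# Hypothetical creation times for two users, a quarter-second apart
first_created  = Time.at(1_336_000_000, 100_000) # 0.10s past the second
second_created = first_created + 0.25            # 0.35s past the same second

# A column with one-second precision keeps only the whole seconds
stored_first  = Time.at(first_created.to_i)
stored_second = Time.at(second_created.to_i)

first_created == second_created # => false
stored_first == stored_second   # => true, so ORDER BY created_at can't separate them
```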

See the section below, “Depending on an order that isn’t guaranteed,” for more common order-related issues in tests.

Forgetting to prepare or migrate the database

This is a common beginner mistake in testing, and it occasionally throws off people who have been running tests for a while. You’ll waste less time if you understand how tests interact with the database schema. First, when you run ‘rake’, rspec runs the db:test:prepare task, which loads the test database schema from the schema.rb file. If you run a single spec without running the full test suite through ‘rake’, your test schema won’t necessarily be in sync with what you expect, and you’ll get weird failures. Make sure you run rake db:test:prepare before you run rspec on a single file. Also, if you just created a migration but haven’t run it yet, don’t expect those changes to be visible to your tests. You have to migrate first, then run rake db:test:prepare, before you’ll see those schema changes in your tests.

Not understanding that javascript-enabled tests are running in another thread

Javascript-enabled tests in Capybara run in another thread. This is stated clearly in the README, but I’ve seen a lot of tests showing that developers haven’t yet internalized this concept. This has a few implications:

  1. Your stubs and mocks won’t work, since the objects you’re modifying in the test suite will have been re-loaded in the server thread when you’re running tests
  2. Database elements (from factories, etc) you create in the test example won’t be visible in the server thread (this is in the README, and related to ‘Not understanding transactions’ below)

Not understanding transactions and how they affect tests

If you’re using transactional fixtures (this is an rspec default, and usually the right thing to do if you’re not using js tests in your suite), you need to understand what this means. A transaction is a set of database statements that are executed as an atomic action. The classic example is moving money from one bank account to another – if anything happens in between, you should completely roll back the transaction to its initial state, rather than only completing part of the transaction.

Rspec leverages transactions to clean out your database tables between examples. A transaction block is started at the beginning of the test, and rolled back at the end, thus clearing out the records that you persisted during the scope of the test. There are a couple of ways that I’ve seen developers waste time troubleshooting tests when they don’t understand transactions well:

  1. They try to access data from another thread, e.g. from the server thread in a Selenium test, while the test code creates records with factories inside a transaction block. The server thread won’t see that data, because uncommitted changes in a transaction are only visible to the database connection that has the transaction open.
  2. If your database state is not getting reset and you’re using transactional fixtures, make sure your database engine actually supports transactions. I’ve seen this catch developers off-guard when they added a connection to a MySQL database that used MyISAM (which doesn’t support transactions) instead of InnoDB (which does).

Persisting an ActiveRecord object in a place where it won’t get rolled back or otherwise removed

Rspec and DatabaseCleaner are generally configured to roll things back that are created in the context of a test example. This also applies to code that is run in a before(:each) block. Developers expect the database state to be ‘clean’ between examples. I have seen a number of cases where stray data at the start of the test breaks things, and causes sporadic test failures because of inter-spec dependency issues. Here are some places to look if you think you have this problem:

  1. before(:all) blocks – if you persist things to the database here, it’s your responsibility to clear it out after your test.
  2. Inside of a factory, but in code that isn’t in a proc or lambda. Maybe you set up an association, and forgot to wrap it in a lambda {}. This will be created in the database when the test file is first loaded, and will never be cleaned out until the next time you drop and re-create the database.
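The difference comes down to when Ruby evaluates the code. The plain-Ruby analogy below is not FactoryGirl’s API: a hypothetical Records module stands in for code that persists a record, one value is computed once when the file loads, and the other is wrapped in a lambda so it runs each time an object is built.

```ruby
# Hypothetical stand-in for code that writes a record to the database
module Records
  @count = 0

  def self.create
    @count += 1
    "record #{@count}"
  end

  def self.count
    @count
  end
end

# Not wrapped: evaluated once, as soon as this file is loaded
EAGER_OWNER = Records.create

# Wrapped in a lambda: evaluated each time an object is built
LAZY_OWNER = -> { Records.create }

def build_attributes
  { eager_owner: EAGER_OWNER, lazy_owner: LAZY_OWNER.call }
end

Records.count # => 1, before a single object has been built

first  = build_attributes
second = build_attributes
first[:eager_owner] == second[:eager_owner] # => true, one record shared by every build
first[:lazy_owner] == second[:lazy_owner]   # => false, a fresh record per build
Records.count # => 3
```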

Changing global state in a test without rolling it back

While rspec and mocha do a good job of limiting the scope of stubs and mocks to a particular example, there are certain things that won’t be reset between test examples. For example, I found one really awful spec that did this:

ActionController::Base.asset_host = nil

The result is that after this particular line was executed, every subsequent test that was run would have a slightly different state than those that ran before the asset_host was set. In this case, links to assets like the application javascript were broken for every example after the above line, and depending on the order that rspec evaluated things, sometimes the integration tests would be broken, and sometimes they wouldn’t.

Be particularly careful about setting class-level variables or other configuration options during a test, as these will affect every subsequent test that you run. Often, this is a sign that your code should be refactored so that you don’t have to change this kind of global state. In the few cases where this is the most pragmatic decision, make sure that you use an ‘ensure’ block to roll back the state after your test.
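A minimal sketch of the ensure pattern, using a hypothetical AppConfig class whose class-level setting stands in for something like asset_host:

```ruby
# Hypothetical global configuration, standing in for e.g. asset_host
class AppConfig
  class << self
    attr_accessor :asset_host
  end
end

AppConfig.asset_host = "assets.example.com"

# Temporarily changes the setting and always restores it,
# even if the block raises or an assertion inside it fails
def with_asset_host(value)
  original = AppConfig.asset_host
  AppConfig.asset_host = value
  yield
ensure
  AppConfig.asset_host = original
end

with_asset_host(nil) do
  AppConfig.asset_host # => nil inside the block
end
AppConfig.asset_host   # => "assets.example.com" afterwards
```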

Depending on an order that isn’t guaranteed

This is one of the most frequent causes of failure that I see when developers break the CI build and then say, “it passes on my workstation!” Most database engines don’t guarantee the order in which results are returned when an explicit ORDER BY clause is absent. Just because you asked for Person.where(hair_color: 'blue') and got back [person1, person2, person3] doesn’t mean the CI server will return results in the same order. Your local database may happen to return them consistently, so the order seems reliable, but the CI server’s database could just as well return [person2, person1, person3]. Your test fails, your team gets mad, and everyone becomes less productive for twenty minutes while you push a change to fix the test.

There are two fixes for this scenario:

  1. Provide an explicit ORDER BY. Remember that if you order by a column where two records may have equal values, the database won’t guarantee the relative order of those records unless you give them distinct values or also order by something unambiguous, such as a unique id.
  2. Compare Sets instead of Arrays. Two Ruby Sets are equal regardless of the order of their contents, so if your test example doesn’t care about order, just compare your expected and actual results as two sets. RSpec provides syntactic sugar for this case with the =~ matcher. For example, Person.where(hair_color: 'blue').should =~ [person1, person2, person3] will pass even if the returned results are [person3, person2, person1].
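Both fixes can be sketched in plain Ruby, with hypothetical Struct records standing in for ActiveRecord objects: break ties on a unique column after the timestamp so the order is fully defined, or compare as Sets when order doesn’t matter.

```ruby
require "set"

# Hypothetical stand-in for ActiveRecord rows
Person = Struct.new(:id, :created_at)

t = Time.at(1_336_000_000)
a = Person.new(1, t)
b = Person.new(2, t) # created in the same second as a
c = Person.new(3, t + 60)

# Rows as a database might return them without an ORDER BY
results = [c, b, a]

# Fix 1: tie-break on a unique column so the order is fully defined
results.sort_by { |p| [p.created_at, p.id] } # => [a, b, c]

# Fix 2: ignore order entirely when the test doesn't care about it
results.to_set == Set[a, b, c]               # => true
```

Note that Sets collapse duplicates, so if the same record can legitimately appear twice in the results, compare sorted arrays instead.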

General Fixes and Best Practices

Prevention is the best cure for most of the cases above. Learning Ruby, and frequently reading the source code for rspec and Capybara when you don’t understand how something works will do wonders for your testing and application development skills in general. There are some things that you can do to detect where bad things are happening in code, though:

  1. If you’re not running your rspec test suite with --order random, add that option now and fix any inter-spec dependencies. You’ll be glad that bad tests become visible sooner rather than later.
  2. Most projects should use continuous integration. Jenkins is a good option for this. If you have a test suite with sporadic failures that are tough to reproduce, schedule a build that runs every hour. Figure out which seeds (printed at the end of the test run when --order random is specified) break the build, and fix those tests.
  3. Turn on profiling output with --profile. This will show you your slowest-running specs. Chances are, someone committed code that writes records in a loop, and this will help you find where it happened.
  4. Make sure that the latest build and the build history are visible in Jenkins, and don’t deploy bad builds to staging. It’s much harder to figure out which recent commit introduced bad code if you don’t have the build history available. Continuous integration is nothing new, and if more than one developer is on a project there is no reason to ignore this best practice.
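The seed is what makes random ordering useful for debugging: it makes a failing order reproducible. The following plain-Ruby illustration is not RSpec’s internals, and the example names are hypothetical, but it shows the principle. Shuffling with a Random seeded by the same number always yields the same order, which is what lets you rerun a failing build locally with that seed (rspec’s --seed option) and see the same failure.

```ruby
# Hypothetical spec example names
examples = %w[creates_user sends_email charges_card expires_trial]

def run_order(examples, seed)
  # Same seed => same pseudo-random sequence => same shuffled order
  examples.shuffle(random: Random.new(seed))
end

run_order(examples, 1234) == run_order(examples, 1234) # => true
run_order(examples, 1234).sort == examples.sort        # => true, nothing dropped
```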

Tests are supposed to provide a quick and reliable way to see if you’ve changed unintended application behavior by adding features or refactoring. It’s annoying, and it makes you feel like you’re wasting time if tests don’t fulfill this basic purpose. I hope that these tips help you to waste less time in testing so that you can create tests that add value to the product that you’re creating, rather than senselessly reduce your development velocity.

Apr 8, 2012

Stack Builders, the Ruby on Rails consultancy that I started, runs all of its projects under Continuous Integration (CI). While we’ve tried a bunch of different CI servers and services, we’ve settled on Jenkins. Just a few moments ago, I released the source code that we’re using to run our builds under Jenkins, so I wanted to share some of my thoughts behind writing Railblazer.

For a long time, I set up Rails builds under Jenkins by:

  1. Creating a database.yml for each project on the build server and
  2. setting up a script on the server for each project that copies the database.yml into Jenkins’ workspace

After doing this for several months, I realized that each database.yml was essentially the same on any given machine, but that the database.sample.yml so often included with the source of Rails projects is usually not the correct match for local configuration. Generally speaking, database configuration is machine-specific rather than project-specific, which makes providing a database.yml with each project not especially useful.

In addition, there have been a bunch of cloud-based services recently that have pioneered the practice of automatically configuring a Rails app to connect to a database (e.g. Heroku and tddium).

I decided to see how far I could get towards writing something that auto-detects the database adapter necessary for a Rails application and generates a correct database.yml, based on adapter auto-detection and a local template. This would be the only missing piece to basically have a Rails app configure itself to run tests under Jenkins – as long as you follow best-practices and use RVM and bundler (by creating a Gemfile), it should be straight-forward to write a program to automatically generate a database.yml based on a local template file.
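A minimal sketch of the idea, not railblazer’s actual code: scan the Gemfile.lock for a known adapter gem and render a machine-local ERB template. The adapter map, the template, and the database naming scheme are all assumptions for illustration; in practice the lockfile would come from File.read("Gemfile.lock").

```ruby
require "erb"
require "yaml"

# Hypothetical mapping from adapter gem to database.yml adapter name
ADAPTERS = {
  "pg" => "postgresql",
  "mysql2" => "mysql2"
}.freeze

# Hypothetical machine-local template: host and credentials live here,
# while the adapter and database name vary per project
TEMPLATE = <<~YAML
  test:
    adapter: <%= adapter %>
    database: <%= app_name %>_test
    host: localhost
YAML

# Top-level gems in Gemfile.lock's specs section are indented four spaces
def detect_adapter(lockfile)
  locked = lockfile.scan(/^ {4}(\S+) \(/).flatten
  gem_name = ADAPTERS.keys.find { |name| locked.include?(name) }
  raise ArgumentError, "no supported database adapter in Gemfile.lock" unless gem_name

  ADAPTERS[gem_name]
end

def render_database_yml(lockfile, app_name)
  adapter = detect_adapter(lockfile) # read by the ERB template via binding
  ERB.new(TEMPLATE).result(binding)
end

lockfile = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      pg (0.13.2)
      rails (3.2.3)
LOCK

config = YAML.safe_load(render_database_yml(lockfile, "blog"))
config["test"]["adapter"]  # => "postgresql"
config["test"]["database"] # => "blog_test"
```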

After setting up a new Jenkins server and adding the first half-dozen builds for current client projects, it seems that the tools I’ve written and released as ‘railblazer’ really do help configure applications and run builds with minimal intervention. They’re also potentially useful on a local workstation, to generate database.yml files for apps by auto-detection and template rendering rather than by copying and modifying the database.sample.yml provided by others.

I hope it’s as useful for you as it has been for us at Stack Builders! If you have a chance, check it out and let me know what you think. The source code is available at github, and the gem can be installed with ‘gem install railblazer’.

Sep 27, 2011

Rails 3.1 offers a handy way to secure your entire application behind https: just add config.force_ssl = true to your environment configuration file, and all requests will be redirected to https. Under the covers, this setting loads the Rack::SSL middleware. What happens if you want to exclude certain URL patterns from this restriction? Rack::SSL accepts an :exclude option for exactly this: a proc that receives the Rack env for each request and returns true for requests that should not be forced onto https.

For example, if you used the following snippet instead of config.force_ssl = true, your application would not force SSL on pages under the path /public:

require "rack/ssl"
config.middleware.insert_before ActionDispatch::Static, Rack::SSL, 
  :exclude => proc { |env| env['PATH_INFO'].start_with?('/public') }

Instead of jumping through these hoops in your configuration file, I thought that Rails should allow you to pass options to Rack::SSL.  I submitted a pull request with my changes which was promptly accepted by José Valim, so if you’re either using edge Rails or 3.2 when it comes out you’ll be able to do the following to configure SSL in your application:

config.force_ssl = true
config.ssl_options = { :exclude => 
  proc { |env| env['PATH_INFO'].start_with?('/public') } }
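Because the :exclude option is an ordinary proc that receives the Rack env hash, you can check it in plain Ruby before wiring it into your configuration. Note that start_with?('/public') also matches paths such as /publications. The stricter variant below is a hypothetical alternative that anchors the match to a whole path segment with a regular expression.

```ruby
# The proc from the configuration above
exclude = proc { |env| env['PATH_INFO'].start_with?('/public') }

# Hypothetical stricter variant: /public itself or anything under /public/
exclude_segment = proc { |env| env['PATH_INFO'].match?(%r{\A/public(/|\z)}) }

exclude.call('PATH_INFO' => '/public/about')          # => true
exclude.call('PATH_INFO' => '/publications')          # => true, probably not intended
exclude_segment.call('PATH_INFO' => '/public/about')  # => true
exclude_segment.call('PATH_INFO' => '/public')        # => true
exclude_segment.call('PATH_INFO' => '/publications')  # => false
```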

I hope that this change makes configuring your application to be safe a bit easier. Happy SSL’ing!