Marcos Castilho

Refactoring by the book

January 08, 2013

Refactoring is one of the techniques that allows us to be agile and apply an evolutionary approach to our design. A core XP practice and one of TDD's pillars, constant refactoring keeps the evil design upfront at bay and maintain our codebases in a healthy state. One of the few things I think most developers agree on is that refactorings are helpful and should be done at some point in any project, preferably in a continuous manner.

The problem is that, like every other developer term, the definition of refactoring has become muddled over time. Refactoring is now commonly conflated with it's more dangerous cousin, the Rewriting. The term is used whenever we want to improve the design of some part of a codebase, either a single class or entire subsystems.

Day or week long refactoring are becoming common place in agile projects, they even have their own stories and board lanes. While its all done with good intentions, these long sessions of madly rewriting every bad part of a codebase carry a lot of risks and are a far cry from what refactorings were supposed to be.

And since everybody knows refactoring is core agile thing, the way we do it is seldom questioned. After all, getting to a better design is supposed to be a good thing right? Let's go back to how refactorings were introduced to the developer community and see what we missed along the way.

The original refactorings

Although they existed a long time before that, Refactorings were introduced to the large world by Martin Fowler's book Refactoring: Improving the Design of Existing Code, where Fowler writes about coding practices from legendary SmallTalkers like Kent Beck and Ward Cunningham A definition from the same book:

Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior.

Its heart is a series of small behavior preserving transformations. Each transformation (called a “refactoring”) does little, but a sequence of transformations can produce a significant restructuring. Since each refactoring is small, it's less likely to go wrong. The system is kept fully working after each small refactoring, reducing the chances that a system can get seriously broken during the restructuring.

The original refactorings were presented using a pattern format. They have names, like Extract Method or Replace Temp with Query, a explanation of when to use the refactor and a list of steps you have to follow in order to apply them. You can browse the refactorings catalog on refactoring.com

They are meant to be applied in a conscious manner, one after the other, until your design is good enough for you to move forward. Words like disciplined and small are present on the very definition of the term.

When you wildly rewrite large parts of your system, you are not refactoring, even if you are just trying to achieve a better design. The system needs to be kept working the whole time. That means the tests need to stay green! The TDD flow is supposed to be Red -> Green -> Refactor, and not Red -> Green -> Red -> Red -> Shit -> Rollback.

Some ways to refactor on the green

Let's see how we can refactor some code in a more disciplined manner, keeping the tests always on the green.

Instead of working on some contrived User class we are going to use a real example, the class Request from the Pacto project. You don't need to understand what the class does, just pay attention to the code structure and how we apply the refactorings iteratively.

module Pacto
  class Response
    attr_reader :status, :headers, :schema

    def initialize(definition)
      @definition = definition
      @status = @definition['status']
      @headers = @definition['headers']
      @schema = @definition['body']
    end

    def instantiate
      Faraday::Response.new(default_env)
    end

    private

    def default_env
      {}.tap do | env |
        env[:status] = @definition['status']
        env[:response_headers] = @definition['headers']
        env[:body] = JSON::Generator.generate(@schema) if @schema && !@schema.empty?
      end
    end
  end
end

This class has many small issues, but the main thing I want to do is to remove the instatiate method. This method is forcing the Pacto codebase to deal with instances of a foreign class, Faraday::Response. The thing is that Faraday::Response is a very simple class, and we can easily implement the interface we need on the Pacto::Response class itself, eliminating the dependency on external code.

Step 0 - Check the current tests

Before we start to refactor, a very important step is to check if the code in question is actually covered by tests. A quick way to do it is to just do the refactoring, in our case simply remove the instantiate method and see if anything breaks.

If you get zero red tests, that's not a good thing! That means you have no way to be sure that your refactorings didn't impacted the codebase, no safety net to catch your mistakes. There's a famous phrase that says "refactorings without tests is just changing shit". If that happens, just rollback your change and start implementing the tests for the behavior you want to change.

Luckily for us Pacto has test suite, so we get a precise red spec if we remove the instantiate method.

our broken spec

Step 1 - Move Method

The failing spec tell us that we need a method body on whatever instantiate returns. We can see the body definition passed on the Faraday::Response initialization. Let's apply a slightly altered Move Method refactoring to move it from the default_env definition to the Response class.

module Pacto
  class Response
    attr_reader :status, :headers, :schema

    def initialize(definition)
      # snip
    end

    def instantiate
      Faraday::Response.new(default_env)
    end

    def body
      JSON::Generator.generate(@schema) if @schema && !@schema.empty?
    end

    private

    def default_env
      {}.tap do |env|
        env[:status] = @definition['status']
        env[:response_headers] = @definition['headers']
        env[:body] = body
      end
    end
  end
end

We run the tests to make sure we didn't break anything. Since we only changed private methods and we kept the original structure, everything runs fine.

Step 2 - Change Faraday::Response to self

It turns out that step one is the only thing we need to do in order to be able to replace Faraday::Response with self on the *instantiate method, so we change the method.

# Inside Pacto::Response class

def instantiate
  self
end

That's probably the most useless method ever, but remember, we want to keep the tests green. By allowing this travesty to live a little longer on the codebase we can be sure that nothing will break and move forward with a few other refactorings before paying attention to other parts of the Pacto codebase.

Step 3 - Remove the unused default_env method

So we have a private method that's not used anymore. Safest refactoring you can have, we don't even need to run the tests for this one, right? Well, remember that in Ruby private methods can be called with send (a very bad practice btw), so you are never totally sure that's something is not important just by eyeballing a class. Always run your tests.

Thankfully Pacto is not a metaprogramming happy codebase, so our tests still run green after removing the default_env.

module Pacto
  class Response
    attr_reader :status, :headers, :schema

    def initialize(definition)
      @definition = definition
      @status = @definition['status']
      @headers = @definition['headers']
      @schema = @definition['body']
    end

    def instantiate
      self
    end

    def body
      JSON::Generator.generate(@schema) if @schema && !@schema.empty?
    end
  end
end

Step 4 - Remove unused @definition field

Another piece of private unused code, same rules as the previous step.

module Pacto
  class Response
    attr_reader :status, :headers, :schema

    def initialize(definition)
      @status = definition['status']
      @headers = definition['headers']
      @schema = definition['body']
    end

    # snip
  end
end

Step 5 - Remove excessive conditionals

On the Response#body method we have some conditionals that could be easily replaced by a default value for the schema attribute. Doing a "let's see if something break" trick we discover that this conditionals are not under test, so first we add the relevant specs:

module Pacto
  describe Response do
    # some other specs

    describe 'the response body' do
      context 'when the definition has an nil body' do
        let(:response) { Response.new(definition.merge('body' => nil)) }

        it 'is nil' do
          expect(response.schema).to eq(Hash.new)
        end
      end
    end

    # more specs
  end
end

Then we replace the conditionals with a default value for the schema field:

module Pacto
  class Response
    attr_reader :status, :headers, :schema

    def initialize(definition)
      # other attrs
      @schema = definition['body'] || {}
    end

    def instantiate
      self
    end

    def body
      JSON::Generator.generate(schema)
    end
  end
end

Step 6 - Remove instantiate method

Finally the class is looking good enough and we can go forward and remove the instantiate method. Whoever is using that method can just use a Response instance instead.

For this step we simply delete the method and fix all the broken specs. Thankfully in this case, the Response class was used only by another class, so it was a very straightforward process. For cases were you have many dependencies, you might let the offending method live for a little longer and replace each use separately on work on each case separately.

module Pacto
  class Response
    attr_reader :status, :headers, :schema


    def initialize(definition)
      @status = definition['status']
      @headers = definition['headers']
      @schema = definition['body'] || {}
    end

    def body
      @body ||= JSON::Generator.generate(schema)
    end
  end
end

There's still some things we could do on this class, like turn it into a Struct, but it's good enough. Knowing when to stop refactoring is as important as knowing how to do it, the world is full of yaks and rabbit holes.

Conclusion

You might be thinking this whole thing was a bunch of dev melodrama, that a good developer could have done all the changes in a fell swoop. Well, if you look at the simple example, that's true. But we face much nastier situations in our daily projects.

By using a more methodical approach to refactoring I've had a fun coding session, that went quite fast. All the commits were short and precise and the coolest thing is that I actually pushed the individual steps to the master branch in the Pacto code base, since I kept the whole project working all the time.

The next time you see an ugly class that's begging for some love, fight the urge to just hack it away and try to be more disciplined about your refactorings. Argue with your teammates whenever you see some someone creating a refactor branch, or diving in a month long refactor epic.

A good advice is to read Martin Fowler's book, there is even a more recent ruby version, although the java code in the original one is very easy to read. Another cool place to learn more about the refactoring practice is the c2 wiki, where you can see people arguing about the practice, including some famous names, at the time it was introduced.