Close
Glad You're Ready. Let's Get Started!

Let us know how we can contact you.

Thank you!

We'll respond shortly.

LABS
Refactoring the Deeply-Nested Hash Antipattern

On a few different Ruby projects, I have seen an antipattern emerge involving nested hashes.

For example, consider this simple question-and-answer command-line script:

QUIZ = {
  question_1: {
    prompt: "What is your name?"
  },

  question_2: {
    prompt: "What is your favorite color?"
  }
}

answers = {}

QUIZ.each do |name, question_hash|
  print question_hash[:prompt], " "
  answers[name] = gets.chomp
end

p answers

The QUIZ constant isn’t too hard to understand. The keys (question_1 and question_2) each point to a value that is itself another hash. Let’s call that a question_hash. For now a question_hash has only one key-value pair, the prompt.

The answers hash simply accumulates answers. The key is the key from the QUIZ constant, and the value is the answer the user typed in.

At the end of the script, the program calls Kernel#p to print out the inspect string for the answers hash.

Sure enough, the program works.

$ ruby quiz.rb
What is your name? Grant
What is your favorite color? orange
{:question_1=>"Grant", :question_2=>"orange"}

Take a look at the QUIZ.each block up there. Notice that the inside of the block assumes that the question_hash will have a :prompt key.

That’s not too complicated. But let’s add some more features to our script. Now we will add a multiple-choice question to the QUIZ hash:

QUIZ = {
  question_1: {
    prompt: "What is your name?"
  },

  question_2: {
    prompt: "What is your favorite color?"
  },

  question_3: {
    prompt: "What is your favorite integer between 1 and 5?",
    choices: %w[1 2 3 4 5]
  }
}

Now we want to treat a question_hash differently if it has a choices key. One way to do this is to write two different methods that take the question_hash as an argument and ask the question. The methods will return the answer that the user inputs.

def simple_answer(question_hash)
  print question_hash[:prompt], " "
  gets.chomp
end

def multiple_choice_answer(question_hash)
  choices = question_hash[:choices]
  answer = nil

  until choices.include?(answer)
    print question_hash[:prompt], " "
    answer = gets.chomp
    puts "You must pick an answer from #{choices}" unless choices.include?(answer)
  end

  answer
end

Now we simply update our block to call the appropriate method for each question_hash.

QUIZ.each do |name, question_hash|
  answer = if question_hash[:choices]
    multiple_choice_answer(question_hash)
  else
    simple_answer(question_hash)
  end

  answers[name] = answer
end

p answers

This all still works fine.

$ ruby quiz.rb
What is your name? Grant
What is your favorite color? orange
What is your favorite integer between 1 and 5? 7
You must pick an answer from ["1", "2", "3", "4", "5"]
What is your favorite integer between 1 and 5? 3
{:question_1=>"Grant", :question_2=>"orange", :question_3=>"3"}

But the code is very quickly getting more complex. Before long, we will have questions with sub-questions, conditional follow-up questions, and even more complicated features. Here’s an example that supports a Proc for formatting the user’s answer.

QUIZ = {
  question_1: {
    prompt: "What is your name?"
  },

  question_2: {
    prompt: "What is your favorite color?",
    display_proc: ->(color) { "The color is #{color}." }
  },

  question_3: {
    prompt: "What is your favorite integer between 1 and 5?",
    choices: %w[1 2 3 4 5]
  }
}

answers = {}

def simple_answer(question_hash)
  print question_hash[:prompt], " "
  gets.chomp
end

def multiple_choice_answer(question_hash)
  choices = question_hash[:choices]
  answer = nil

  until choices.include?(answer)
    print question_hash[:prompt], " "
    answer = gets.chomp
    puts "You must pick an answer from #{choices}" unless choices.include?(answer)
  end

  answer
end

QUIZ.each do |name, question_hash|
  answer = if question_hash[:choices]
    multiple_choice_answer(question_hash)
  else
    simple_answer(question_hash)
  end

  answers[name] = answer
end

display_answers = Hash[
  answers.map do |name, answer|
    if display_proc = QUIZ[name][:display_proc]
      [name, display_proc[answer]]
    else
      [name, answer]
    end
  end
]

p display_answers

The output now looks like this

$ ruby quiz.rb
What is your name? Grant
What is your favorite color? orange
What is your favorite integer between 1 and 5? 6
You must pick an answer from ["1", "2", "3", "4", "5"]
What is your favorite integer between 1 and 5? 3
{:question_1=>"Grant", :question_2=>"The color is orange.", :question_3=>"3"}

And the code will keep growing and growing, making it harder for the next developer to understand exactly how the big Hash will translate into the user’s experience.

Take a look at those answer methods. Whenever I see a bunch of methods that accept the same argument, I suspect that there is an object missing in my system. Let’s create a new Question object. Imagine that we have it in that QUIZ.each block, and that we can simply ask the Question for its answer directly.

class Question
  def initialize(question_hash)
    @question_hash = question_hash
  end

  def answer
    if @question_hash[:choices]
      multiple_choice_answer(@question_hash)
    else
      simple_answer(@question_hash)
    end
  end

  private

  def simple_answer(question_hash)
    print question_hash[:prompt], " "
    gets.chomp
  end

  def multiple_choice_answer(question_hash)
    choices = question_hash[:choices]
    answer = nil

    until choices.include?(answer)
      print question_hash[:prompt], " "
      answer = gets.chomp
      puts "You must pick an answer from #{choices}" unless choices.include?(answer)
    end

    answer
  end
end

QUIZ.each do |name, question_hash|
  question = Question.new(question_hash)
  answers[name] = question.answer
end

Notice that simple_answer and multiple_choice_answer are now completely private. That’s good, because it hides away the internal details of how a Question works. We are now free to start doing some refactoring, without worrying about the rest of the code in the system breaking.

Our first refactor will be to remove the arguments from the private methods, since it’s now possible to get at the @question_hash directly from within the Question object.

class Question
  def initialize(question_hash)
    @question_hash = question_hash
  end

  def answer
    if @question_hash[:choices]
      multiple_choice_answer
    else
      simple_answer
    end
  end

  private

  def simple_answer
    print @question_hash[:prompt], " "
    gets.chomp
  end

  def multiple_choice_answer
    choices = @question_hash[:choices]
    answer = nil

    until choices.include?(answer)
      print @question_hash[:prompt], " "
      answer = gets.chomp
      puts "You must pick an answer from #{choices}" unless choices.include?(answer)
    end

    answer
  end
end

Next, notice that the only reason we are passing the question_hash into the Question is to get out the prompt and the choices. When a constructor takes a hash with expected keys, you can swap the hash out for multiple arguments instead.

class Question
  def initialize(prompt, choices)
    @prompt = prompt
    @choices = choices
  end

  def answer
    if @choices
      multiple_choice_answer
    else
      simple_answer
    end
  end

  private

  def simple_answer
    print @prompt, " "
    gets.chomp
  end

  def multiple_choice_answer
    answer = nil

    until @choices.include?(answer)
      print @prompt, " "
      answer = gets.chomp
      puts "You must pick an answer from #{@choices}" unless @choices.include?(answer)
    end

    answer
  end
end

Now the object is easier to understand. But there’s still a problem. To use this new object, the each block needs to be updated:

QUIZ.each do |name, question_hash|
  prompt = question_hash[:prompt]
  choices = question_hash[:choices]
  question = Question.new(prompt, choices)
  answers[name] = question.answer
end

This is uglier than what we had before. But there’s a quick fix that makes our code much nicer. Let’s change the QUIZ hash so that its values are already Question objects:

QUIZ = {
  question_1: Question.new("What is your name?", nil),
  question_2: Question.new("What is your favorite color?", nil),
  question_3: Question.new("What is your favorite integer between 1 and 5?",
                           %w[1 2 3 4 5])
}

This makes our each block much nicer!

QUIZ.each do |name, question|
  answers[name] = question.answer
end

But there’s still one problem. Our QUIZ no longer holds the display_proc, so this code from before fails:

display_answers = Hash[
  answers.map do |name, answer|
    if display_proc = QUIZ[name][:display_proc]
      [name, display_proc[answer]]
    else
      [name, answer]
    end
  end
]

So let’s put that proc into the Question object and make it available via an attr_reader. While we’re at it, we can give default values of nil for both choices and nil.

class Question
  attr_reader :display_proc

  def initialize(prompt, choices = nil, display_proc = nil)
    @prompt = prompt
    @choices = choices
    @display_proc = display_proc
  end

  # ...

Now our QUIZ looks like this:

QUIZ = {
  question_1: Question.new("What is your name?"),
  question_2: Question.new("What is your favorite color?",
                           nil,
                           ->(color) { "The color is #{color}." }),
  question_3: Question.new("What is your favorite integer between 1 and 5?",
                           %w[1 2 3 4 5])
}

And our display_answers code looks like this:

display_answers = Hash[
  answers.map do |name, answer|
    if display_proc = QUIZ[name].display_proc
      [name, display_proc[answer]]
    else
      [name, answer]
    end
  end
]

Now let’s move the logic for deciding how an answer should be formatted into the Question object. (We can remove the attr_reader for display_proc at this time as well, to keep things nicely hidden away.)

class Question
  # ...

  def display_answer(answer)
    if @display_proc
      @display_proc[answer]
    else
      answer
    end
  end

  # ...

Which gives us:

display_answers = Hash[
  answers.map do |name, answer|
    question = QUIZ[name]
    [name, question.display_answer(answer)]
  end
]

At this point, it seems like the question name could also easily be put into the Question object as a new first argument. Let’s replace the QUIZ constant with a QUESTIONS constant that holds an array of Question objects.

QUESTIONS = [
  Question.new(:question_1,
               "What is your name?"),
  Question.new(:question_2,
               "What is your favorite color?",
               nil,
               ->(color) { "The color is #{color}." }),
  Question.new(:question_3,
               "What is your favorite integer between 1 and 5?",
               %w[1 2 3 4 5])
]

answers = {}

QUESTIONS.each do |question|
  answers[question.name] = question.answer
end

display_answers = Hash[
  QUESTIONS.map do |question|
    answer = answers[question.name]
    [question.name, question.display_answer(answer)]
  end
]

p display_answers

Now the code is much cleaner. There are definitely more places the code could be refactored. choices and display_proc could be moved into an options hash argument since they are truly optional. Maybe there could be a SimpleQuestion superclass and a subclass called MultipleChoiceQuestion with all the choices-related code.

The most important thing to realize is that now the conversation that future developers might have about this code has shifted.

Before, the conversation would probably have centered around how complicated the code looked and confusion about how the structure of the QUIZ hash affects the control flow of the rest of the program. What is the easiest way to make a small change to add a new feature without breaking the whole big complex structure?

Now, the conversation can be about whether the current object model makes sense or should be tweaked in some small way. And with the full power of a Ruby class instead of the limited behavior of a Hash, the developers will have more options.

This code example is available on GitHub with a full Git history of the working code at each stage. Here’s the final code:

class Question
  attr_reader :name

  def initialize(name, prompt, choices = nil, display_proc = nil)
    @name = name
    @prompt = prompt
    @choices = choices
    @display_proc = display_proc
  end

  def answer
    if @choices
      multiple_choice_answer
    else
      simple_answer
    end
  end

  def display_answer(answer)
    if @display_proc
      @display_proc[answer]
    else
      answer
    end
  end

  private

  def simple_answer
    print @prompt, " "
    gets.chomp
  end

  def multiple_choice_answer
    answer = nil

    until @choices.include?(answer)
      print @prompt, " "
      answer = gets.chomp
      puts "You must pick an answer from #{@choices}" unless @choices.include?(answer)
    end

    answer
  end
end

QUESTIONS = [
  Question.new(:question_1,
               "What is your name?"),
  Question.new(:question_2,
               "What is your favorite color?",
               nil,
               ->(color) { "The color is #{color}." }),
  Question.new(:question_3,
               "What is your favorite integer between 1 and 5?",
               %w[1 2 3 4 5])
]

answers = {}

QUESTIONS.each do |question|
  answers[question.name] = question.answer
end

display_answers = Hash[
  QUESTIONS.map do |question|
    answer = answers[question.name]
    [question.name, question.display_answer(answer)]
  end
]

p display_answers

Comments
  1. Renato Zannon says:

    Nice post, this is a problem I’ve stumbled into myself. It gets particularly awkward when you start loading these things from YAML files. Suddenly you have to fit your model, however complex, on a hash-like structure, and then code your way out of the deserialization problem.

    If creating questions is a task that is done multiple times, I would also consider an `instance_eval`-ing DSL for that task. Ruby lets us create DSLs that look almost natural language, to the point that it should be feasible to a non-technical team member to use them directly.

  2. David says:

    It seems you are solving a problem that doesn’t exist, and I’m surprised PL have published this. What kind of mindless developer throws such complexity into a hash, when working with a fully OBJECT-oriented language such as Ruby?

    Creating an object is the first thing you’d do.

    • David, oftentimes you will work on teams with developers who are not as experienced with good object design principles. When a team or codebase is large, it may be tempting for even experienced developers to cut corners and bring on technical debt so that a feature makes it out the door more quickly. So believe me, in my years of working on legacy Rails applications, I have run into this antipattern many times, always from well-meaning teams. I’ll even admit to having written a few of them. The most important thing is to recognize when it’s happening, and to refactor the problem away as soon as it starts developing, because less-experienced developers will tend to follow what they see and cause the Hash to grow.

  3. Jeff Gandt says:

    This is an excellent deconstruction of an insidious anti-pattern.

    Interestingly enough, I too have come to a similar conclusion, but I found myself wanting a more flexible way of creating the attribute variables on the object.

    Enter Ostend: https://github.com/jgandt/ostend

    Everything is in the documentation, but basically I wanted a hash deconstrution/Object translation tool.

    Could this gem helpful? Or would it be harmful to your deconstruction above?

    I’m looking for feedback in general too. Let me know if you have any thoughts.

    Feel free to call the idea not-smart/bad ruby…

    .jpg

Post a Comment

Your Information (Name required. Email address will not be displayed with comment.)

* Copy This Password *

* Type Or Paste Password Here *