charles c. lee

charles c. lee

staff engineer at Shopify

01 Dec 2016

The Magic of Ordered Hashes in Ruby

Ruby is a dynamic programming language focused on simplicity and productivity. With these goals in mind, the language was developed with a lot of built-in functionality. This in turn, makes it much easier to quickly get something working. However, in the process of making developing easier, commonly used concepts and implementations were abstracted leading to the rising trend of “programming magic”.

I came across an example of magic when working with hashes in Ruby. A Hash (aka. Hash table, hash map, dictionaries, associative array) is a built-in class that represents a “dictionary-like collection of unique keys and values”. One feature that Ruby, unlike other similar languages, hashes have is that they keep the order of the keys inserted.

(Edit: As of Python 3.7, Python dicts are also ordered.)

# Python
python_dict = {}
python_dict[a] = 1
python_dict[b] = 2
python_dict[c] = 3
python_dict # returns {‘b’: 2, ‘c’: 3, ‘a’: 1}

# Ruby
ruby_hash = {}
ruby_hash[a] = 1
ruby_hash[b] = 2
ruby_hash[c] = 3
ruby_hash # returns {“a”=>1, “b”=>2, “c”=>3}

To demystify this magic, I set out to expand on what I already know about hash tables to create these ordered hashes.

There are two goals to keep in mind: The OrderedHash should maintain the insertion order of keys. The performance of looking up, adding and removing key/value(s) should relatively stay the same.

Assuming I have a HashTable implementation, my initial thoughts were to add some form of way to keep track of order of the keys.

class MyOrderedHash
  def initialize
    @key_store = Array.new
    @hash_table = HashTable.new
  end
  def each
    @key_store.each do |key|
      yield([key, @hash_table[key]])
    end
  end
  def find(key)
    @hash_table[key]
  end
  def add(key, value)
    @key_store.push(key)
    @hash_table[key] = value
  end
  def delete(key)
    @key_store.delete(key)
    @hash_table.delete(key)
  end
end

This initial implementation keeps the keys in order using an array internally. Let me run through the list of my goals and see if I’ve met them all.

  • Maintains Insertion Order: Check
  • Fast Lookup: Check
  • Fast Insertion: Check
  • Fast Deletes: …X

The problem with removing a key is that I’ve added the overhead of searching through the array just to remove the key. I need to find a way to keep track of order but also be able to delete keys just as fast. Researching other data structures, linked lists are awesome for deletes as deleting a node within a linked list is done in constant time. However, the problem remains as I would still need to search for the node to delete from the start of the linked list which is no better than using an array.

How do I quickly find the node to delete?

What data structure is great for lookups? A hash table. We’ve come full circle.

The idea is that I would use a hash table to keep track of keys and nodes, whereas the linked list is used to keep track of order and values. By splitting the responsibility of the different functions across these two data structures, I ended up with an optimized implementation.

class MyBetterOrderedHash
  def initialize
    @key_linked_list = LinkedList.new
    @hash_table = HashTable.new
  end
  def each
    @key_linked_list.each do |node|
      yield([node.key, node.value])
    end
  end
  def find(key)
    @hash_table[key]&.value
  end
  def add(key, value)
    if @hash_table[key]
      update_node(@hash_table[key], value)
    else
      new_node = Node.new(key, value)
      @key_linked_list.add_node(new_node)
      @hash_table[key] = new_node
    end
  end
  def delete(key)
    node = @hash_table[key]
    @hash_table.delete(key)
    node.delete
  end
end

Now if I wanted to delete a key from my hash, I could lookup the key in my internal @hash_table and get back the node . With the node, I could perform a delete on the node which would connect the previous node and following node, effectively removing the current node from the key_linked_list. Then I could move on to simply delete the key from the internal @hash_table.

Running through my list of goals again.

  • Maintains Insertion Order: Check
  • Fast Lookup: Check
  • Fast Insertion: Check
  • Fast Deletes: Check

🎉…but there are some trade offs that I’ve made.

Nothing really comes for free and in this case, I didn’t gain the power to have ordered hashes without sacrificing something else. This is a common and well known tradeoff called the Space-time Tradeoff.

Compared a singular hash table, I have to keep track of two separate data structures. By taking up more space , I’ve reduced the time it takes to delete a key and its value from the ordered hash.

All in all, one less thing magical about Ruby.

Notes

  • All Ruby examples were written using Ruby 2.3.0
  • All Python examples were written using Python 3.5.0
    • Python also has an ordered version of Python dictionaries within the collections module called OrderedDict.

Resources