Keypaths

I recently updated an answer I gave a long time ago to a question on Stack Overflow about how to get the key paths of nested maps in Clojure. It’s a fun puzzle if you like that sort of thing. I happened to be working on some unrelated code when I remembered this old question and came up with a potential improvement. It’s only a few lines of code so I thought I would share it in this blog post.

When I revisited the question, I was happy to see that Marshall Abrams (mars0i) had taken the time to benchmark the suggested answers. He published his results with his test project.

The original question was how to find the all the possible key paths into a nested map structure. The example was:

(keys-in {:a :A, :b :B, :c {:d :D}, :e {:f {:g :G, :h :H}}})
;; => [[:a] [:b] [:c :d] [:e :f :g] [:e :f :h]]

We want the result to be a sequence of keypaths, but we don’t care about the order. A keypath is just a sequence of keys, as you would use with get-in, assoc-in or update-in. Some of the answers allowed a single keyword as a “path”, but I prefer to wrap even a single keyword in a vector which is more consistent with assoc-in.

My original answer was what I’ll call here keypaths1.

(defn keypaths1
  ([m] (keypaths1 [] m))
  ([prev m]
   (reduce-kv (fn [res k v] 
                 (if (map? v)
                    (into res (keypaths1 (conj prev k) v))
                    (conj res (conj prev k))))
              []
              m)))

I have to admit, I’m a big fan of reduce-kv. It’s eager, but when you’re dealing with maps, you usually can’t take advantage of laziness so you might as well be fast. I think this code is pretty easy to understand. The recursive part executes when there’s a nested map value. In practice, we don’t expect data to be too deeply nested so we’re not worried about blowing the Java stack. Anyway, I thought is was a decent answer. It’s also pretty fast.

However, I was a little disappointed that a Specter solution attributed to Nathan Marz slightly out performed my simple code.

I came back to this question with an idea for how to improve my answer. You’ll notice above that every recursive call starts a new sequence. The init arg to reduce-kv is the empty vector. Thus, the result of the recursive call has to be copied by into into our final result sequence. That’s pretty common in Clojure code, but it’s not ideal.

The simple improvement involves adding an extra argument, which is an accumulator (or “the result so far”). Here’s the revised code:

(defn keypaths
  ([m] (keypaths [] m ()))
  ([prev m result]
   (reduce-kv (fn [res k v] 
                  (if (map? v)
                     (keypaths (conj prev k) v res)
                     (conj res (conj prev k))))
              result
              m)))

The top-level accumulator result becomes the init argument to reduce-kv. The reducing function itself has a res accumulator as its first argument. Essentially, the same accumulator ends up being passed to the recursive calls and we naturally produce one sequence without the need to copy at all.

Not much changed but it is significantly faster. I have to say that I find the code quite pleasing. (Famous last words! Tweet @miner if you disagree.)

I forked Marshall Abrams’ benchmarking code and added my improvements. Here’s my version of key-path-tests

You can lein run to get the Criterium results. On my machine, the new miner49r-keypaths-acc (same as keypaths above), is now the fastest answer. Of course, performance isn’t everything, but it is gratifying when fairly simple code performs well.

Addendum: The accumulator technique is part of ancient Lisp lore. The principal reason to use an accumlator is to organize your code so that it is tail recursive. (In Clojure, we would use recur to optimize explicitly the tail recursive calls.) In this case, my keypaths function is not properly tail recursive. It uses the accumulator to share partial results, which yields a significant performance improvement.