Keypaths
I recently updated an answer I gave a long time ago to a question on Stack Overflow about how to get the key paths of nested maps in Clojure. It’s a fun puzzle if you like that sort of thing. I happened to be working on some unrelated code when I remembered this old question and came up with a potential improvement. It’s only a few lines of code so I thought I would share it in this blog post.
When I revisited the question, I was happy to see that Marshall Abrams (mars0i) had taken the time to benchmark the suggested answers. He published his results with his test project.
The original question was how to find the all the possible key paths into a nested map structure. The example was:
(keys-in {:a :A, :b :B, :c {:d :D}, :e {:f {:g :G, :h :H}}})
;; => [[:a] [:b] [:c :d] [:e :f :g] [:e :f :h]]
We want the result to be a sequence of keypaths, but we don’t care about the order. A
keypath is just a sequence of keys, as you would use with get-in
, assoc-in
or update-in
. Some
of the answers allowed a single keyword as a “path”, but I prefer to wrap even a single
keyword in a vector which is more consistent with assoc-in
.
My original answer was what I’ll call here keypaths1
.
(defn keypaths1
([m] (keypaths1 [] m))
([prev m]
(reduce-kv (fn [res k v]
(if (map? v)
(into res (keypaths1 (conj prev k) v))
(conj res (conj prev k))))
[]
m)))
I have to admit, I’m a big fan of reduce-kv
. It’s eager, but when you’re dealing with
maps, you usually can’t take advantage of laziness so you might as well be fast. I think this
code is pretty easy to understand. The recursive part executes when there’s a nested map
value. In practice, we don’t expect data to be too deeply nested so we’re not worried about
blowing the Java stack. Anyway, I thought is was a decent answer. It’s also pretty fast.
However, I was a little disappointed that a Specter solution attributed to Nathan Marz slightly out performed my simple code.
I came back to this question with an idea for how to improve my answer. You’ll notice above
that every recursive call starts a new sequence. The init arg to reduce-kv
is the empty
vector. Thus, the result of the recursive call has to be copied by into
into our final
result sequence. That’s pretty common in Clojure code, but it’s not ideal.
The simple improvement involves adding an extra argument, which is an accumulator (or “the result so far”). Here’s the revised code:
(defn keypaths
([m] (keypaths [] m ()))
([prev m result]
(reduce-kv (fn [res k v]
(if (map? v)
(keypaths (conj prev k) v res)
(conj res (conj prev k))))
result
m)))
The top-level accumulator result
becomes the init argument to reduce-kv
. The reducing
function itself has a res
accumulator as its first argument. Essentially, the same
accumulator ends up being passed to the recursive calls and we naturally produce one
sequence without the need to copy at all.
Not much changed but it is significantly faster. I have to say that I find the code quite pleasing. (Famous last words! Tweet @miner if you disagree.)
I forked Marshall Abrams’ benchmarking code and added my improvements. Here’s my version of key-path-tests
You can lein run
to get the Criterium results. On my machine, the new
miner49r-keypaths-acc
(same as keypaths
above), is now the fastest answer. Of course,
performance isn’t everything, but it is gratifying when fairly simple code performs well.
Addendum: The accumulator technique is part of ancient Lisp lore. The principal
reason to use an accumlator is to organize your code so that it is tail recursive. (In
Clojure, we would use recur
to optimize explicitly the tail recursive calls.) In this
case, my keypaths
function is not properly tail recursive. It uses the accumulator to
share partial results, which yields a significant performance improvement.