I recently came across a blog post by John Jacobsen with the title: From Elegance to Speed. The author had a short function written in Clojure that he ported to Common Lisp for an impressive performance improvement. Along the way, he made some changes in the approach for his Common Lisp implementation but didn’t apply those ideas to his Clojure code. I thought it was worth a try to see what we could do with Clojure.
First, any microbenchmarking experiment should be taken with a grain of salt. The goal of Clojure is to be a powerful tool for writing large programs to solve complex problems. It’s not necessarily optimized for snippets of simple arithmetic. For more information about Clojure, please see the Rationale.
With that disclaimer out of the way, let’s look at the original Clojure function, called
(defn smt-8 [times] (->> times (partition 8 1) (map (juxt identity (comp (partial apply -) (juxt last first)))) (filter (comp (partial > 1000) second))))
The input is a monotonically increasing sequence of long integers representing time data. The function walks through the sequence looking for clusters (in a sliding window of size 8) that are close in time (within a 1000 milliseconds). The result is a sequence of the desired clusters and their widths. From domain knowledge, we expect that such clusters are relatively rare in the input data set.
I think the original
smt-8 is reasonable. It uses a few higher-order functions that might
not be recognized by non-Clojure programmers. Here’s a slightly modified Clojure
implementation that runs a bit faster for me.
;; modest improvement using `keep`, about 20% faster (defn smt-8a [times] (->> times (partition 8 1) (keep (fn [part] (let [diff (- (last part) (first part))] (when (> 1000 diff) [part diff]))))))
keep function is a something like a combination of
filter and I think it
makes the code a bit more readable, but that’s naturally a matter of taste.
In another post, the author explains how he was experimenting with laziness and
infinite sequences when he wrote the original
smt-8. Many of the core functions in
Clojure support lazy sequences which makes Clojure an excellent tool for this kind of
exploration. However, for his comparison with Common Lisp, he decided to punt on the
question of laziness. That’s fine if you don’t need laziness, but I want to point out to
casual readers that laziness can be a useful feature especially when working with large data
sets. However, laziness does require a bit of overhead that you might prefer to avoid when
doing competitive benchmarking. Common Lisp is not lazy by default so you would need to do
some extra work or find a good library if you wanted a lazy solution in Common Lisp.
If we don’t need a lazy solution, and we have a finite input sequence, we can make things significantly faster using a vector as input. In particular, we don’t have to create intermediate partitions anymore. We can just check the values at the appropriate indices. (Remember the input is monotonic so only the end points of a potential cluster matter.) Assuming it’s safe to convert the input to a vector, we can try something like the following.
;; for better performance, the original data should already by a vector (defn smt-8for [times] (let [v (vec times) width 8] (for [i (range (- (count v) (dec width))) :let [diff (- (v (+ i (dec width))) (v i))] :when (> 1000 diff)] [(subvec v i (+ i width)) diff])))
smt-8for, we’re using a
for comprehension to drive the calculations. We’re also
taking subvectors out of the original vector to make our clusters, but only if we have
already determined that the cluster belongs in the result. We are eliminating the creation
of many partitions that would end up being filtered out. The
subvec function does not
copy, it just keeps an offset into the original vector. It’s fast but it prevents garbage
collecting the original vector as long as the subvectors are alive. If that’s not
acceptable, you can make a copy of the cluster instead. In my tests,
smt-8for is more
than 30 times as fast as the original
You can get a bit more performance if you’re careful about boxed math. Normally, you shouldn’t care so please don’t get carried away with this next tip. You can read the documentation on type hints, *warn-on-reflection* and *unchecked-math* for more information.
(set! *unchecked-math* :warn-on-boxed) (defn smt-8forh [v] (let [width 8] (for [^long i (range (- (count v) (dec width))) :let [diff (- ^long (v (+ i (dec width))) ^long (v i))] :when (> 1000 diff)] (list (subvec v i (+ i width)) diff))))
Adding the type hints squeezes a bit more performance out of the same code. I measured better than 20% increase in performance, but remember this code is doing simple arithmetic and almost no “business logic” so we’re mostly measuring what would normally be considered overhead costs.
Clojure has another feature called Transducers that can improve performance for chains
of operations over input sequences. Here’s a slightly complicated implementation using
transducers. To get extra speed, this version builds a list of results, which is slightly
faster than appending to a vector. It runs the
range calculation in reverse to compensate
for the natural addition at the head of the output list. I have to admit that the slight
performance improvement isn’t worth increasing the obscurity of the code as I’ve done in
this case. However, the general point is that transducers are often a win for performance
and clarity. In my tests, this transducer version is marginally faster than the
(defn smt-8x [v] (let [width 8] (into () (keep (fn [i] (let [d (- ^long (v (+ (dec width) ^long i)) ^long (v i))] (when (> 1000 d) (list (subvec v i (+ ^long i width)) d))))) (range (- (count v) (inc width)) -1 -1))))
Now, if you’re desperate for better performance and you have control over your input data,
you can use Java arrays, which have built-in support in Clojure. For appropriate
situations, they are faster than the more refined Clojure collections, especially for direct
access to data that is not shared. Of course, you can cause bugs with wanton mutations so
you have to be careful. Frankly, I would rarely make this trade-off in real code. On the
other hand, array access is a useful tool when playing the microbenchmark game. If you have
a Clojure collection, you can populate a Java array of longs with the Clojure function
long-array. Elements are accessed with
aget. Our new
smt-8arr is similar to
smt-8forh but adapted for the Java array input.
;; use Java array for speed (defn smt-8arr [^longs larr] (let [width 8] (for [^long i (range (- (alength larr) (dec width))) :let [diff (- (aget larr (+ i (dec width))) (aget larr i))] :when (> 1000 diff)] (list (for [j (range i (+ i width))] (aget larr j)) diff))))
On my machine, I get about a 200x improvement over the original code. Of course, I’m
assuming the data can be stored in the appropriate
long-array in advance. In real world
applications that do more than arithmetic, the business logic usually dominates so the
overhead of laziness and persistent data structures is minimal by comparison. As I remarked
earlier, I would not normally resort to Java arrays unless I had to inter-operate with Java
The big win in this case was avoiding object creation for partitions that were immediately
discarded. I think most peole would prefer the
for comprehension as shown in my
smt-8for example. That version gave respectable peformance from clean code (IMHO).
Execution times from Criterium on my oldish iMac.
The source code for this post is available as a gist.