Transducer Chain
Transducers are a new feature included in recent Clojure 1.7 alpha releases. I’ve been playing with them a bit. My code is on github: Transmuters.
As the documentation says, “Transducers are composable algorithmic transformations.” They are functions of a particular form that end up being useful in a variety of contexts. The implementer needs to fullfill certain requirements in order to play nicely with the system. The payoffs are conceptual simplicity and good performance. The user doesn’t need to understand all the magic of the implementation. He can simply compose the built-in transducer functions to get the desired result with minimal conceptual overhead.
Composability is an important part of the design. Clojure transducers compose using the
same comp
function that we’ve had in Clojure since the beginning.
Let’s take a simple example:
(def xs (interpose 0 (range 10)))
;=> (0 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9)
;; traditional nested collection function calls
(map inc (remove zero? (take 10 xs)))
;=> (2 3 4 5)
;; when nesting gets deep, the ->> "thread-last" macro is a convenient alternative
(->> xs (take 10) (remove zero?) (map inc))
;=> (2 3 4 5)
;; using a transducer
(sequence (comp (take 10) (remove zero?) (map inc)) xs)
;=> (2 3 4 5)
(transduce (comp (take 10) (remove zero?) (map inc)) * 1 xs)
;=> 120
It occurred to me that instead of running just one transducer across the whole collection
(or stream of inputs), it might be useful to run different transducers on portions of the
input. The idea was to make a chain of transducers. When one transducer terminates (as
with take
), the next one can pick up where it left off. Naturally, if a transducer
doesn’t terminate, the other transducers in the chain will never see any input.
(def process (comp (take 10) (remove zero?) (map inc)))
(sequence process xs)
;=> (2 3 4 5)
(sequence (chain process (map -)) xs)
;=> (2 3 4 5 -5 0 -6 0 -7 0 -8 0 -9)
(transduce (chain process (map -)) + 0 xs)
;=> -21
;; filter handles all the input
(sequence (chain (filter odd?) (map -)) xs)
;=> (1 3 5 7 9)
;; chaining composed transducers
(sequence (chain (comp (take 7) (map (partial * 10)))
(comp (drop 5) (take 3) (map inc)))
xs)
;=> (0 0 10 0 20 0 30 7 1 8)
;; traditional form using concat for the parts
(concat (->> xs (take 7) (map (partial * 10)))
(->> xs (drop (+ 5 7)) (take 3) (map inc)))
;=> (0 0 10 0 20 0 30 7 1 8)
Notice that the traditional collection functions have to concat multiple forms over the
same collection, making sure that the drop
accounts for all input used by the first
form. In contrast, the chained form is nicely localized.
The naturally terminating transducers are take
and take-while
. In other cases, the
reducing function in a transduce
call might use reduced
to terminate. This can get a
little tricky regarding the handling of the input that caused termination. With take
, the
process knows exactly when it has taken the proper amount, but take-while
needs to see one
extra input to decide when to terminate the process. Notice in the following example what
happens with the 5.
(sequence (chain (take 10) (filter odd?)) xs)
;=> (0 0 1 0 2 0 3 0 4 0 5 7 9)
(sequence (chain (take-while #(< % 5)) (filter odd?)) xs)
;=> (0 0 1 0 2 0 3 0 4 0 7 9)
In the above example, the take
transducer takes the first 10 inputs and leaves the rest
for the filter
to consider. The take-while
version is subtly different. The
take-while
terminated when it saw the 5, but that 5 was not seen by the (filter odd?)
transducer. Once it was consumed by the take-while
, it was lost. Most people probably
expected both examples to produced the same result.
So that’s a problem for the chain
combinator. Some terminating transducers consume one
extra input in order to know when to stop. Makes sense, and normally it doesn’t matter
because the process is going to ignore the rest of the input anyway. However, this
reasonable behavior can make it appear that the chain
transducer is skipping an input.
My solution was to hack what I’m calling a pseudo-transducer. I named it pushback
. Add
that to your chain of transducers after one that consumes an extra input, and that input
will be given to the next transducer as its first input. Of course, pushback
is not
really a transducer, but it superficially looks like one when it’s in a chain
. In
reality, it serves as a marker to tell the chain
to give that burned input to the next
transducer, which ends up giving us the desired result.
(sequence (chain (take-while #(< % 5)) pushback (filter odd?)) xs)
;=> (0 0 1 0 2 0 3 0 4 0 5 7 9)
It’s not exactly pretty but it was the best idea I could come up with to handle the situation.
The source for chain
and related bits are on github:
Transmuters. I’ll write up some more details when I
get the chance.