As the documentation says, “Transducers are composable algorithmic transformations.” They are functions of a particular form that end up being useful in a variety of contexts. The implementer needs to fullfill certain requirements in order to play nicely with the system. The payoffs are conceptual simplicity and good performance. The user doesn’t need to understand all the magic of the implementation. He can simply compose the built-in transducer functions to get the desired result with minimal conceptual overhead.
Composability is an important part of the design. Clojure transducers compose using the
comp function that we’ve had in Clojure since the beginning.
Let’s take a simple example:
(def xs (interpose 0 (range 10))) ;=> (0 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9) ;; traditional nested collection function calls (map inc (remove zero? (take 10 xs))) ;=> (2 3 4 5) ;; when nesting gets deep, the ->> "thread-last" macro is a convenient alternative (->> xs (take 10) (remove zero?) (map inc)) ;=> (2 3 4 5) ;; using a transducer (sequence (comp (take 10) (remove zero?) (map inc)) xs) ;=> (2 3 4 5) (transduce (comp (take 10) (remove zero?) (map inc)) * 1 xs) ;=> 120
It occurred to me that instead of running just one transducer across the whole collection
(or stream of inputs), it might be useful to run different transducers on portions of the
input. The idea was to make a chain of transducers. When one transducer terminates (as
take), the next one can pick up where it left off. Naturally, if a transducer
doesn’t terminate, the other transducers in the chain will never see any input.
(def process (comp (take 10) (remove zero?) (map inc))) (sequence process xs) ;=> (2 3 4 5) (sequence (chain process (map -)) xs) ;=> (2 3 4 5 -5 0 -6 0 -7 0 -8 0 -9) (transduce (chain process (map -)) + 0 xs) ;=> -21 ;; filter handles all the input (sequence (chain (filter odd?) (map -)) xs) ;=> (1 3 5 7 9) ;; chaining composed transducers (sequence (chain (comp (take 7) (map (partial * 10))) (comp (drop 5) (take 3) (map inc))) xs) ;=> (0 0 10 0 20 0 30 7 1 8) ;; traditional form using concat for the parts (concat (->> xs (take 7) (map (partial * 10))) (->> xs (drop (+ 5 7)) (take 3) (map inc))) ;=> (0 0 10 0 20 0 30 7 1 8)
Notice that the traditional collection functions have to concat multiple forms over the
same collection, making sure that the
drop accounts for all input used by the first
form. In contrast, the chained form is nicely localized.
The naturally terminating transducers are
take-while. In other cases, the
reducing function in a
transduce call might use
reduced to terminate. This can get a
little tricky regarding the handling of the input that caused termination. With
process knows exactly when it has taken the proper amount, but
take-while needs to see one
extra input to decide when to terminate the process. Notice in the following example what
happens with the 5.
(sequence (chain (take 10) (filter odd?)) xs) ;=> (0 0 1 0 2 0 3 0 4 0 5 7 9) (sequence (chain (take-while #(< % 5)) (filter odd?)) xs) ;=> (0 0 1 0 2 0 3 0 4 0 7 9)
In the above example, the
take transducer takes the first 10 inputs and leaves the rest
filter to consider. The
take-while version is subtly different. The
take-while terminated when it saw the 5, but that 5 was not seen by the
transducer. Once it was consumed by the
take-while, it was lost. Most people probably
expected both examples to produced the same result.
So that’s a problem for the
chain combinator. Some terminating transducers consume one
extra input in order to know when to stop. Makes sense, and normally it doesn’t matter
because the process is going to ignore the rest of the input anyway. However, this
reasonable behavior can make it appear that the
chain transducer is skipping an input.
My solution was to hack what I’m calling a pseudo-transducer. I named it
that to your chain of transducers after one that consumes an extra input, and that input
will be given to the next transducer as its first input. Of course,
pushback is not
really a transducer, but it superficially looks like one when it’s in a
reality, it serves as a marker to tell the
chain to give that burned input to the next
transducer, which ends up giving us the desired result.
(sequence (chain (take-while #(< % 5)) pushback (filter odd?)) xs) ;=> (0 0 1 0 2 0 3 0 4 0 5 7 9)
It’s not exactly pretty but it was the best idea I could come up with to handle the situation.
The source for
chain and related bits are on github:
Transmuters. I’ll write up some more details when I
get the chance.