I wrote some code that used 'futures' to execute parallel IO, potentially in a large number of background threads (about 200).
I noticed that my code wasn't behaving like I expected and I was wondering if clojure executed the futures in some unbounded thread pool, or if it had some fixed maximum. So I wanted to try it out:
(map deref (map #(future-call (fn [] (Thread/sleep 1000) %)) (range 20)))
this actually created 20 threads (I saw it with jconsole), and returned to the REPL within 1 second. So far so good. However:
(map deref (map #(future (Thread/sleep 1000) %) (range 200)))
this took about 6 times longer to execute. I also noticed a strange pattern in thread creation: new threads were only created in chunks of 32.
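The chunk size of 32 is easy to observe directly at the REPL: mapping over a chunked seq such as range realizes a whole chunk even when only one element is requested. A minimal sketch:

```clojure
;; Demonstrates chunked realization: asking for just the first
;; element of a lazy map over a chunked seq (range) realizes a
;; full chunk of 32 elements.
(def realized (atom 0))

(first (map (fn [x] (swap! realized inc) x) (range 100)))

(println @realized) ;; prints 32 — one full chunk was realized
```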
Well, now it seems obvious to me, but at the time I didn't realize that I had stumbled upon Clojure's chunked lazy sequence evaluation. The correct code is:
(map deref (doall (map #(future (Thread/sleep 1000) %) (range 200))))
Without the doall, only one chunk of 32 futures at a time is realized and submitted to the cached thread-pool executor that sits behind the "future-call" core function, so each chunk's one-second sleep runs to completion before the next chunk is even created.
In order to avoid this kind of error in the future, I created a simple helper:
(defn future-map [f coll] (doall (map #(future-call (fn [] (f %))) coll)))
to be used as:
(future-map do-something asequence)
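For instance, with a hypothetical do-something standing in for real blocking IO (the helper is repeated here so the sketch is self-contained):

```clojure
;; Fixed helper from above, repeated for a self-contained sketch.
(defn future-map [f coll]
  (doall (map #(future-call (fn [] (f %))) coll)))

;; do-something is a stand-in for real blocking IO.
(defn do-something [x]
  (Thread/sleep 100)   ; simulate ~100 ms of blocking work
  (* x x))

;; All 50 futures are submitted up front, so the whole batch
;; finishes in roughly one task's time rather than 50 of them.
(def results (mapv deref (future-map do-something (range 50))))
```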
This may sound like 'pmap', but AFAIK 'pmap' was intended as a performance enhancement for CPU-bound work: it tries to keep the number of in-flight computations close to the number of available CPUs, which is a different requirement from maximising concurrent blocking IO.
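The contrast can be sketched as follows. Note that pmap's docstring says it is "only useful for computationally intensive functions", since it keeps roughly (+ 2 available-processors) computations ahead of the consumer; the eager-futures helper (repeated here) instead starts every task at once, which is what you want when tasks mostly block on IO:

```clojure
;; pmap: bounded parallelism, tuned for CPU-bound functions.
(def cpu-results (doall (pmap (fn [x] (Thread/sleep 10) x) (range 20))))

;; Eager futures: one thread per element, tuned for blocking IO.
(defn future-map [f coll]
  (doall (map #(future-call (fn [] (f %))) coll)))

(def io-results (mapv deref (future-map (fn [x] (Thread/sleep 10) x) (range 20))))
```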