Tuesday, January 26, 2010

Clojure lazy sequence

I want to share my experience with the chunked lazy sequences of Clojure 1.1 and threads.

I wrote some code that used 'futures' to execute parallel IO, potentially in a large number of background threads (about 200).

I noticed that my code wasn't behaving as I expected, and I wondered whether Clojure executes futures on an unbounded thread pool or on one with a fixed maximum size. So I tried it out:

(map deref (map #(future-call (fn [] (Thread/sleep 1000) %)) (range 20))) 

This actually created 20 threads (I saw them in jconsole) and returned to the REPL within 1 second. So far so good. However:

(map deref (map #(future (Thread/sleep 1000) %) (range 200))) 

This took about 6 times longer to execute. I also noticed strange behaviour in thread creation: new threads were only created in ... chunks of 32.

Well, now it seems obvious to me, but I didn't realize that I had stumbled upon Clojure's new chunked lazy evaluation feature. The correct code is:

(map deref (doall (map #(future (Thread/sleep 1000) %) (range 200))))

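Without the doall, only the first 32 futures are created and submitted to the cached thread pool executor that sits behind the "future-call" core function. The outer map then has to deref that whole chunk, waiting about a second, before the next chunk of 32 futures is even created, so the 200 futures run in successive batches of 32 instead of all at once.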
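
The chunking itself can be observed without any threads. The following is only a minimal sketch: the `realized` counter and the `s` sequence are names made up for this illustration, and it assumes that `range` returns a chunked sequence, as it does since Clojure 1.1.

```clojure
;; Hypothetical counter that records how many elements map has processed.
(def realized (atom 0))

;; A lazy map over a chunked source; nothing is computed yet.
(def s (map (fn [x] (swap! realized inc) x) (range 200)))

;; Asking for a single element forces the whole first chunk.
(first s)

;; The counter now reflects the chunk size, not the one element requested.
(println "elements realized:" @realized)
```

Asking for one element is enough to run the mapped function over a whole chunk, which is why the futures above were created 32 at a time.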

To avoid this kind of error in the future, I created a simple helper:

(defn future-map [f coll]
  ;; doall forces the whole lazy seq, so every future is submitted right away
  (doall (map #(future-call (fn [] (f %))) coll)))

to be used as:

 (future-map do-something asequence)
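
As a rough usage sketch, here is the helper applied to the 200-element case from above. The `slow-io` function is a hypothetical stand-in for a real blocking IO call, and the timing note in the comment is an expectation based on the behaviour described in this post, not a measured result.

```clojure
;; future-map as defined above, repeated so the snippet runs on its own.
(defn future-map [f coll]
  (doall (map #(future-call (fn [] (f %))) coll)))

;; Hypothetical stand-in for a blocking IO call.
(defn slow-io [x]
  (Thread/sleep 1000)
  (* x 2))

;; All 200 futures are submitted before the first deref blocks, so the
;; whole thing should take roughly one second rather than several.
(def results
  (time (doall (map deref (future-map slow-io (range 200))))))
```

The outer `doall` is only there so that `time` measures the derefs as well; the inner `doall` inside `future-map` is what makes all the futures start together.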

This sounds like 'pmap' (parallel map), but AFAIK 'pmap' is intended as a performance enhancement: it tries to use a reasonable number of threads in order to exploit the available CPUs, which is a different kind of requirement from what I needed.

Friday, January 8, 2010

clarsec

I attempted to port the Haskell monadic parsing library Parsec to Clojure:

http://github.com/mmikulicic/clarsec


For now it's very basic, but I already use it for work.