Groovy (New feature): Closures can now memo(r)ize their results

Groovy recently added a new feature to its closures – memoization – a simple yet very useful feature that can drastically improve the performance when applicable.

It was a shame that I didn’t know the meaning of this term that was coined way back in 1968 by Donald Mitchie (ref: http://en.wikipedia.org/wiki/Memoization) [But now that I have memoized it, next time I am asked what it is, I will reply faster! :-)]

So, all this new feature means is that a groovy closure can now remember the result of its invocation for a specific set of inputs – by internally caching the result. So, the first invocation for a given set of inputs does the actual work, then the result gets cached behind the scenes and any further invocations with the same inputs simply return the cached result, making the invocation as fast as it can be. Of-course, this feature should be used only when closure processing is such that a set of inputs will ALWAYS result in the same output – it MUST be side-effect-free.

Here is an example to illustrate the use. Let’s first see it without memoization:

cl = {a, b ->
    sleep(3000) // simulate some time consuming processing
    a + b
}

def callClosure(a, b) {
    def start = System.currentTimeMillis()

    cl(a, b)

    println "Inputs(a = $a, b = $b) - took ${System.currentTimeMillis() - start} msecs."
}

callClosure(1, 2)
callClosure(1, 2)
callClosure(2, 3)
callClosure(2, 3)
callClosure(3, 4)
callClosure(3, 4)

Here is the output showing that everytime the time cost of closure invocation in incurred:

Inputs(a = 1, b = 2) – took 3041 msecs.
Inputs(a = 1, b = 2) – took 3000 msecs.
Inputs(a = 2, b = 3) – took 3000 msecs.
Inputs(a = 2, b = 3) – took 3000 msecs.
Inputs(a = 3, b = 4) – took 3000 msecs.
Inputs(a = 3, b = 4) – took 3000 msecs.

Here is the memoized version:

cl = {a, b ->
    sleep(3000) // simulate some time consuming processing
    a + b
}
mem = cl.memoize()

def callClosure(a, b) {
    def start = System.currentTimeMillis()

    mem(a, b)

    println "Inputs(a = $a, b = $b) - took ${System.currentTimeMillis() - start} msecs."
}

callClosure(1, 2)
callClosure(1, 2)
callClosure(2, 3)
callClosure(2, 3)
callClosure(3, 4)
callClosure(3, 4)

callClosure(1, 2)
callClosure(2, 3)
callClosure(3, 4)

Here is the output showing how only the first invocation incurs the processing cost and after that the calls with the same inputs are almost free:

Inputs(a = 1, b = 2) – took 3054 msecs.
Inputs(a = 1, b = 2) – took 0 msecs.
Inputs(a = 2, b = 3) – took 3001 msecs.
Inputs(a = 2, b = 3) – took 0 msecs.
Inputs(a = 3, b = 4) – took 3000 msecs.
Inputs(a = 3, b = 4) – took 0 msecs.
Inputs(a = 1, b = 2) – took 0 msecs.
Inputs(a = 2, b = 3) – took 0 msecs.
Inputs(a = 3, b = 4) – took 0 msecs.

The trade-off involved is that you pay for time saving in terms of extra memory consumed by the internally cached inputs/outputs mappings. If you want to have more control over how many results you want to cache, other variants to memoize() are – memoizeAtMost(n), memoizeAtLeast(n), memoizeBetween(m, n)

To quickly see how memoizeAtMost(n) works, in the previous example, let’s replace mem = cl.memoize() by mem = cl.memoizeAtMost(2) – and the output changes as below (now only 2 results are cached, with the LRU ones evicted):

Inputs(a = 1, b = 2) – took 3051 msecs.
Inputs(a = 1, b = 2) – took 0 msecs. /* 1st result – cached! */
Inputs(a = 2, b = 3) – took 3000 msecs.
Inputs(a = 2, b = 3) – took 0 msecs. /* 2nd result – cached !! */
Inputs(a = 3, b = 4) – took 3001 msecs.
/* 3rd result – cached – but now (a = 1, b = 2) result evicted – on LRU basis */
Inputs(a = 3, b = 4) – took 0 msecs.
/* (a=1, b=2) got evicted, processing happens again and result cached, now evicting (a=2, b=3) result */
Inputs(a = 1, b = 2) – took 3000 msecs.
/* (a=2, b=3) got evicted, processing happens again and result cached, now evicting (a=3, b=4) result */
Inputs(a = 2, b = 3) – took 3000 msecs.
/* (a=3, b=4) got evicted, processing happens again */
Inputs(a = 3, b = 4) – took 3000 msecs.

Good feature, isn’t it?

Note: The new feature is only available from groovy version 1.8-beta-3 onwards.