bits and pieces

October 18, 2010

Groovy (New feature): Closures can now memo(r)ize their results

Filed under: Uncategorized — Tags: — roshandawrani @ 12:38 pm

Groovy recently added a new feature to its closures – memoization – a simple yet very useful feature that can drastically improve the performance when applicable.

It was a shame that I didn’t know the meaning of this term that was coined way back in 1968 by Donald Mitchie (ref: http://en.wikipedia.org/wiki/Memoization) [But now that I have memoized it, next time I am asked what it is, I will reply faster! :-)]

So, all this new feature means is that a groovy closure can now remember the result of its invocation for a specific set of inputs – by internally caching the result. So, the first invocation for a given set of inputs does the actual work, then the result gets cached behind the scenes and any further invocations with the same inputs simply return the cached result, making the invocation as fast as it can be. Of-course, this feature should be used only when closure processing is such that a set of inputs will ALWAYS result in the same output – it MUST be side-effect-free.

Here is an example to illustrate the use. Let’s first see it without memoization:

cl = {a, b ->
    sleep(3000) // simulate some time consuming processing
    a + b
}

def callClosure(a, b) {
    def start = System.currentTimeMillis()

    cl(a, b)

    println "Inputs(a = $a, b = $b) - took ${System.currentTimeMillis() - start} msecs."
}

callClosure(1, 2)
callClosure(1, 2)
callClosure(2, 3)
callClosure(2, 3)
callClosure(3, 4)
callClosure(3, 4)

Here is the output showing that everytime the time cost of closure invocation in incurred:

Inputs(a = 1, b = 2) – took 3041 msecs.
Inputs(a = 1, b = 2) – took 3000 msecs.
Inputs(a = 2, b = 3) – took 3000 msecs.
Inputs(a = 2, b = 3) – took 3000 msecs.
Inputs(a = 3, b = 4) – took 3000 msecs.
Inputs(a = 3, b = 4) – took 3000 msecs.

Here is the memoized version:

cl = {a, b ->
    sleep(3000) // simulate some time consuming processing
    a + b
}
mem = cl.memoize()

def callClosure(a, b) {
    def start = System.currentTimeMillis()

    mem(a, b)

    println "Inputs(a = $a, b = $b) - took ${System.currentTimeMillis() - start} msecs."
}

callClosure(1, 2)
callClosure(1, 2)
callClosure(2, 3)
callClosure(2, 3)
callClosure(3, 4)
callClosure(3, 4)

callClosure(1, 2)
callClosure(2, 3)
callClosure(3, 4)

Here is the output showing how only the first invocation incurs the processing cost and after that the calls with the same inputs are almost free:

Inputs(a = 1, b = 2) – took 3054 msecs.
Inputs(a = 1, b = 2) – took 0 msecs.
Inputs(a = 2, b = 3) – took 3001 msecs.
Inputs(a = 2, b = 3) – took 0 msecs.
Inputs(a = 3, b = 4) – took 3000 msecs.
Inputs(a = 3, b = 4) – took 0 msecs.
Inputs(a = 1, b = 2) – took 0 msecs.
Inputs(a = 2, b = 3) – took 0 msecs.
Inputs(a = 3, b = 4) – took 0 msecs.

The trade-off involved is that you pay for time saving in terms of extra memory consumed by the internally cached inputs/outputs mappings. If you want to have more control over how many results you want to cache, other variants to memoize() are – memoizeAtMost(n), memoizeAtLeast(n), memoizeBetween(m, n)

To quickly see how memoizeAtMost(n) works, in the previous example, let’s replace mem = cl.memoize() by mem = cl.memoizeAtMost(2) – and the output changes as below (now only 2 results are cached, with the LRU ones evicted):

Inputs(a = 1, b = 2) – took 3051 msecs.
Inputs(a = 1, b = 2) – took 0 msecs. /* 1st result – cached! */
Inputs(a = 2, b = 3) – took 3000 msecs.
Inputs(a = 2, b = 3) – took 0 msecs. /* 2nd result – cached !! */
Inputs(a = 3, b = 4) – took 3001 msecs.
/* 3rd result – cached – but now (a = 1, b = 2) result evicted – on LRU basis */
Inputs(a = 3, b = 4) – took 0 msecs.
/* (a=1, b=2) got evicted, processing happens again and result cached, now evicting (a=2, b=3) result */
Inputs(a = 1, b = 2) – took 3000 msecs.
/* (a=2, b=3) got evicted, processing happens again and result cached, now evicting (a=3, b=4) result */
Inputs(a = 2, b = 3) – took 3000 msecs.
/* (a=3, b=4) got evicted, processing happens again */
Inputs(a = 3, b = 4) – took 3000 msecs.

Good feature, isn’t it?

Note: The new feature is only available from groovy version 1.8-beta-3 onwards.

About these ads

11 Comments »

  1. […] This post was mentioned on Twitter by Guillaume Laforge, marek krajewski, nobeans, トドル, Roshan and others. Roshan said: Groovy (New feature): Closures can now memo(r)ize their results – http://goo.gl/vnUz […]

    Pingback by Tweets that mention Groovy (New feature): Closures can now memo(r)ize their results « Bits that interest me -- Topsy.com — October 18, 2010 @ 1:24 pm

  2. Good feature! Is there any place where Groovy 1.8 API is published? Or it’s available only in trunk? I wonder how a cache key is created from closure arguments ..

    Comment by Evgeny Goldin — October 18, 2010 @ 2:03 pm

  3. excellent feature!

    Comment by steven — October 19, 2010 @ 10:11 am

  4. Looks like it behaves a bit odd for recursive call, or I am missing something..try this:

    def startTime = System.currentTimeMillis()

    def fact = {it -> sleep(100); it ? it * call(it – 1g) : 1g}
    fact = fact.memoize()
    println fact(70)
    println fact(71)

    println ((System.currentTimeMillis() – startTime) / 1000.0)

    Or it’s posted here: http://groovyconsole.appspot.com/script/439003

    Why doesn’t it memoize subsequent calls in this case? Because of call() in closure body isn’t routed via memoize() decorator?

    Comment by Mikhail Antonov — March 14, 2011 @ 12:34 am

    • I think the correct way of using memoize with a recursive call is not to go through its internal “call” function, but as below (http://groovyconsole.appspot.com/script/442001):
      ——————————
      ​def startTime = System.currentTimeMillis()
      def mem
      def fact = {it -> sleep(100); it ? it * mem(it – 1g) : 1g}
      mem = fact.memoize()
      println mem(60)
      println ((System.currentTimeMillis() – startTime) / 1000.0)

      startTime = System.currentTimeMillis()
      println mem(61)
      println ((System.currentTimeMillis() – startTime) / 1000.0)
      ——————————

      Now I get output like following:
      ——————————
      8320987112741390144276341183223364380754172606361245952449277696409600000000000000
      6.175 (time taken for fact(60))
      507580213877224798800856812176625227226004528988036003099405939480985600000000000000
      0.203 (time taken for fact(61) due to memoize savings)
      ——————————

      Comment by roshandawrani — March 14, 2011 @ 10:31 am

      • It’s strange that trying to call Closure.memoize() like mem(somestring) results in an exception for me, while using the call() method works (groovy 1.8.0 stable):
        Caught: groovy.lang.MissingMethodException: No signature of method: org.codehaus.groovy.runtime.memoize.Memoize$MemoizeFunction.doCall() is applicable for argument types: (java.lang.String) values: [ghh]
        Possible solutions: call(), call([Ljava.lang.Object;), call(java.lang.Object), call([Ljava.lang.Object;), equals(java.lang.Object), isCase(java.lang.Object)
        at Tghh.main(Tghh.groovy:89)

        Comment by avalon — June 20, 2011 @ 9:54 am

      • Please bring it up on Groovy User mailing list with the exact code you tried.

        Comment by roshandawrani — June 20, 2011 @ 11:29 am


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Silver is the New Black Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: