My blog has been moved to ariya.ofilabs.com.

Wednesday, February 14, 2007

strawberry couverture

That's a fantastic dessert. Inspiration is from Google, I mean of course literally, the first page of google.com today :-)

(In case you miss it, check this out)

Monday, February 12, 2007

Modulus with Mersenne prime

Consider the following simple operation, where k is integer and p is prime:

  int i = k % p;

Typically this will be assembled to (sorry, x86 only):

  mov  eax, k
  cdq
  idiv p

where the result is available in register EDX.

Such IDIV instruction has a latency of more than 40 cycles on Intel Pentium or AMD64 processor family. Hence, for optimization purposes, it is best to avoid integer division.

The above division/modulus operation can be avoided if the prime number p is chosen to be the Mersenne primes only, i.e there is a positive integer s such as p = 2s-1. In 32-bit range, there are Mersenne primes: 3, 7, 31, 127, 8191, 131071, 524287, and 2147483647.

The modulus operator with Mersenne prime can be simplified as:

  int i = (k & p) + (k >> s);
  return (i >= p) ? i - p : i;

Update: This works only for k up to (1 << (2 * s)) - 1, so careful with small Mersenne primes. Otherwise you need to call the function recursively.

One possible assembler implementation is as follows.

  ; assume edx = k, ebx = p, ecx = s
  ; result is in eax
  mov     eax, edx
  sar     edx, cl        ; k >> s
  and     eax, ebx       ; k & p
  add     eax, edx       ; eax is i = (k & p) + (k >> s)
  mov     edx, eax       ; edx is also i
  sub     edx, ebx       ; i - p
  cmp     eax, ebx       ; only if (i >= p)
  cmovge  eax, edx       ; then eax is (i-p)

Note that with the help of CMOVGE (6 cycles latency on Pentium 4), there is no need for real branching (which is expensive). Although the code is longer compared to the IDIV version, it executes much faster. Still faster if the range k is limited so that only a decrement operator is needed. Even faster of course if p is constant.

Last time I used this is for hash table (micro)optimization, because the number of stored items is known and I can live with a table whose size is a Mersenne prime. That's indeed a very special case only.

BTW this is off-topic, but thanks for those who mailed/texted me after this post. Nothing happened to me, I'm just fine. Those lines are from Keane's latest single (guess which one?), picked for no particular reason other that it's a good ballad.

Tuesday, February 06, 2007

nightmarish

I wake up, it's a bad dream
No one on my side
I was fighting
But I just feel too tired
To be fighting
Guess I'm not the fighting kind

Saturday, February 03, 2007

Musselicious

In order to keep polluting the Planets (the aggregators, not the heavenly bodies), here is yet another pasta dish: Spaghetti with Mussels. The tomate sauce, unfortunately not really visible in the picture, goes along very well with the mussels, especially - if you are not in hurry - when cooked slowly. 1 kg of mussels should be enough for 2 portions, unless you successfully make it extremely delicious :-)

spaghetti with mussels