My blog has been moved to ariya.ofilabs.com.

## Monday, February 12, 2007

### Modulus with Mersenne prime

Consider the following simple operation, where k is integer and p is prime:

```  int i = k % p;
```

Typically this will be assembled to (sorry, x86 only):

```  mov  eax, k
cdq
idiv p
```

where the result is available in register EDX.

Such IDIV instruction has a latency of more than 40 cycles on Intel Pentium or AMD64 processor family. Hence, for optimization purposes, it is best to avoid integer division.

The above division/modulus operation can be avoided if the prime number p is chosen to be the Mersenne primes only, i.e there is a positive integer s such as p = 2s-1. In 32-bit range, there are Mersenne primes: 3, 7, 31, 127, 8191, 131071, 524287, and 2147483647.

The modulus operator with Mersenne prime can be simplified as:

```  int i = (k & p) + (k >> s);
return (i >= p) ? i - p : i;
```

Update: This works only for k up to (1 << (2 * s)) - 1, so careful with small Mersenne primes. Otherwise you need to call the function recursively.

One possible assembler implementation is as follows.

```  ; assume edx = k, ebx = p, ecx = s
; result is in eax
mov     eax, edx
sar     edx, cl        ; k >> s
and     eax, ebx       ; k & p
add     eax, edx       ; eax is i = (k & p) + (k >> s)
mov     edx, eax       ; edx is also i
sub     edx, ebx       ; i - p
cmp     eax, ebx       ; only if (i >= p)
cmovge  eax, edx       ; then eax is (i-p)
```

Note that with the help of CMOVGE (6 cycles latency on Pentium 4), there is no need for real branching (which is expensive). Although the code is longer compared to the IDIV version, it executes much faster. Still faster if the range k is limited so that only a decrement operator is needed. Even faster of course if p is constant.

Last time I used this is for hash table (micro)optimization, because the number of stored items is known and I can live with a table whose size is a Mersenne prime. That's indeed a very special case only.

BTW this is off-topic, but thanks for those who mailed/texted me after this post. Nothing happened to me, I'm just fine. Those lines are from Keane's latest single (guess which one?), picked for no particular reason other that it's a good ballad.

Anonymous said...

Pretty neat.

Anonymous said...

Don't constain your hash-tables just accept floating point conversions:

float d = k / p;
return (k - ((int)d*p));

about 6 assember instructions with 8 clocks latency.

Anonymous said...

if p is constant, let the compiler optimise the division/modulus for you. Including the case where p is a Mersenne prime.

Anonymous said...

I dont think the trick works.
example if I want k%7 and k=500