
Friday, November 09, 2007

Random number 1..5 to 1..7

The task is to solve this puzzle:

Given a function which produces a random day Kliwon, Legi, Paing, Pon, Wage, write a function which produces a random day Sunday to Saturday

OK, I modified the question a bit. Read about Anno Javanico if the names of the day sound strange to you. Originally, it says:

Given a function which produces a random integer in the range of 1 to 5, write a function which produces a random integer in the range of 1 to 7

This is well known as one of the so-called Microsoft/Google interview questions. There are a million ways to solve it. Here is my take. Bear in mind that my math skill is mediocre and I never studied computer science, so don't be surprised if the next few paragraphs look bogus and stupid.

Spoiler warning: If you have a passion for puzzles, go away and solve it, then come back later. Don't let this blog entry spoil the fun for you.

First, let's assume the random number 1..5 is generated by the following (the randomness, or lack thereof, of stdlib's rand() doesn't play any significant role here):

#include <stdlib.h>
int rand5()
{
 return 1 + rand() % 5;
}

Now we must write rand7() which gives an integer between 1 and 7 and which is allowed to call only the above rand5().

The trivial solution, which you should have in mind within a fraction of a second, is:

int rand7()
{
  return rand5();
}

because nowhere does it say that either the original or the new function must give random numbers following a specific distribution, e.g. a uniform distribution. Of course, whether this counts as a real answer depends on how you look at it.

The next logical step is to assume that the return value of rand7() must have a uniform distribution. The probability of getting any particular number in the range 1..7 is therefore 1/7, approximately 0.143.
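A quick way to check any candidate against that 1/7 target is simply to count outcomes. Here is a minimal sketch of such a check; the names tally7 and rand7_trivial are made up for this illustration, and the quality of the result is of course limited by rand() itself.

```c
#include <assert.h>
#include <stdlib.h>

/* rand5() as defined earlier in this post. */
int rand5(void)
{
    return 1 + rand() % 5;
}

/* Hypothetical helper: call a candidate generator n times and count
 * how often each value 1..7 comes up. counts[0] is unused, so that
 * counts[v] is the tally for value v. */
void tally7(int (*gen)(void), long n, long counts[8])
{
    for (int v = 0; v < 8; ++v)
        counts[v] = 0;
    for (long i = 0; i < n; ++i) {
        int v = gen();
        assert(v >= 1 && v <= 7); /* a candidate must stay in range */
        counts[v]++;
    }
}

/* The trivial candidate, used here only as a test subject. */
int rand7_trivial(void)
{
    return rand5();
}
```

For a uniform rand7(), each counts[v] should come out close to n/7. The trivial version fails this immediately, since counts[6] and counts[7] stay at zero.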

So what comes to mind is to reduce the 1/5 probability from rand5() to 1/35, then increase it again to 5/35, which equals 1/7. The former can be done by calling rand5() several times and treating the first result as 1..5, the second as 6..10, and so on until 31..35. The latter is easier: it's just a modulus operator. The code for this idea (which is shorter than the explanation above):

int rand7()
{
 static int c = 0;
 int x = rand5() + c;
 c = (c + 5) % 35;

 return (x % 7)+1;
}

(I know static variables can be evil, but that's another chapter...)

The disadvantage is obvious: the result is not really random. For example, the first call will never yield 1 or 7. Another variant makes c itself a bit random, though that now requires two calls to rand5():

int rand7()
{
 static int c = 0;
 int x = rand5() + c;
 c = (c + 5*rand5()) % 35;

 return (x % 7)+1;
}

Another nice solution uses rejection sampling, the same idea used in the polar variant of the famous Box-Muller transform. Here we expand the range 1..5 to 1..25 and reject anything larger than 7. The code is:

int rand7()
{
 int x = 8;
 while( x > 7)
  x = rand5() + 5*rand5() - 5;
 return x;
}
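Why is rand5() + 5*rand5() - 5 uniform over 1..25? Because each of the 25 possible pairs of results maps to a different value. The following sketch checks this by brute force; enumerate_pairs and pairs_are_uniform are illustrative names, not part of the puzzle.

```c
#include <assert.h>

/* Enumerate every (a, b) pair that two rand5() calls can return and
 * count how often a + 5*b - 5 lands on each value.
 * hits needs room for 26 entries; hits[0] stays unused. */
void enumerate_pairs(int hits[26])
{
    for (int v = 0; v < 26; ++v)
        hits[v] = 0;
    for (int a = 1; a <= 5; ++a) {
        for (int b = 1; b <= 5; ++b) {
            int x = a + 5 * b - 5;
            assert(x >= 1 && x <= 25);
            hits[x]++;
        }
    }
}

/* Returns 1 if each of 1..25 is produced by exactly one pair. */
int pairs_are_uniform(void)
{
    int hits[26];
    enumerate_pairs(hits);
    for (int v = 1; v <= 25; ++v)
        if (hits[v] != 1)
            return 0;
    return 1;
}
```

Since only 7 of the 25 equally likely values are accepted, the loop runs 25/7 times on average, i.e. roughly 3.6 iterations with two rand5() calls each.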

Pity that we throw away 8..25. It can be improved by reducing the rejection range to 22..25, IOW we would take anything in the range 1..21:

int rand7()
{
 int x = 22;
 while( x > 21)
  x = rand5() + 5*rand5() - 5;
 int r = 1 + (x % 7);
 return r;
}

(or, further, by going to 1..125 and rejecting 120..125).
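The 1..125 idea could look like the sketch below. The names combine3 and rand7_125 are invented here, and the way the three results are combined is just one possible choice.

```c
#include <stdlib.h>

/* rand5() as defined earlier in this post. */
int rand5(void)
{
    return 1 + rand() % 5;
}

/* Map three rand5() results (each 1..5) to a single value 1..125,
 * treating them as digits of a base-5 number. */
int combine3(int a, int b, int c)
{
    return a + 5 * (b - 1) + 25 * (c - 1);
}

int rand7_125(void)
{
    int x;
    do {
        int a = rand5();
        int b = rand5();
        int c = rand5();
        x = combine3(a, b, c);
    } while (x > 119); /* 119 = 17 * 7, so only 120..125 is rejected */
    return 1 + (x % 7);
}
```

Note the trade-off: the rejection rate drops from 4/25 to 6/125, but every attempt now costs three rand5() calls instead of two. On average that is 3 * 125/119 (about 3.15) calls versus 2 * 25/21 (about 2.38) for the 1..21 version, so fewer rejections does not mean fewer calls.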

The modulus by 7 can be "optimized" by hand, because 7 is a Mersenne prime. For the details, see what I wrote before on modulus with a Mersenne prime. This looks useless and even obfuscates the code, but it's harmless and you can tease your interviewer :-)

int rand7()
{
 int x = 25;
 while( x > 21)
  x = rand5() + 5*rand5() - 5;

 int r = (x >> 3) + (x & 7);
 r = (r >= 7) ? r - 6 : r + 1;
 return r;
}
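If you don't trust the shift-and-mask version, it is easy to compare it against the plain modulus for every value that survives the rejection step. A small sketch (the function names are made up for this check):

```c
/* The shift-and-mask step from the code above, for x in 1..21. */
int mod7_plus1_bits(int x)
{
    /* 8 mod 7 == 1, so (x >> 3) + (x & 7) has the same remainder as x. */
    int r = (x >> 3) + (x & 7);
    /* For x in 1..21, r is at most 8, so one correction is enough. */
    return (r >= 7) ? r - 6 : r + 1;
}

/* Returns 1 if the bit trick agrees with 1 + (x % 7) for x = 1..21. */
int bit_trick_matches(void)
{
    for (int x = 1; x <= 21; ++x)
        if (mod7_plus1_bits(x) != 1 + (x % 7))
            return 0;
    return 1;
}
```

The single correction step only works because x never exceeds 21; for a larger range the folding would have to be repeated.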

Care to share your solutions?

47 comments:

Michael said...

I think one of the solutions you mentioned is similar but just to be explicit:

int rand7()
{
int i, sum;

for(i = 0, sum = 0; i < 5; ++i)
sum += rand5();

/* sum is in [5, 25] now, but that is not
* a uniform distribution, so subtract 2
* to put it in [3, 23] so that division by
* 3 gives us a nice even [1, 7].
*/

return (sum - 2) / 3;
}

Anonymous said...

If we want a uniform solution with constant time evaluation, we could try something like:

int rand7()
{
int array[35]={/*magic numbers*/};
int i,sum;
for(i=7,sum=-7;i>0;--i) sum+=rand5();
return array[sum];
}

orzel said...

int rand7()
{
return 1+rand5();
}

is enough, no? Nobody said it has to be uniform.

Priyadi said...

the first that come to my mind:

int rand7() {
return 1+((rand5()+rand5()+rand5()+rand5()+rand5()+rand5()+rand5()-7)%7);
}

Anonymous said...

vii would say:

int rand7(){
return rand5() + (rand5()%3);
}

Alexei Sergeev said...

In my opinion only

int rand7(){
return rand5() + (rand5()%3);
}

is the right way. Others wrote crap.

Henrik said...

The simplest solution that produces numbers in the range 1..7 is probably:

int rand7()
{
return (rand5() + rand5()) % 7;
}

Not uniform distribution though.

Henrik said...

Sorry, that should have been return 1+...

chani said...

a lot of the comments are making a common probability mistake (assuming they think their results have a uniform distribution). when you roll dice several times and add the result, you do *not* get a uniform distribution.

for example, with rand5()+rand5() we have 1 way of getting 2 (1+1), 2 ways of getting 3 (1+2, 2+1), 3 ways of getting 4 (1+3, 2+2, 3+1), 4 ways of getting 5 (1+4, 2+3, 3+2, 4+1), 5 ways of getting 6 (1+5, 2+4, 3+3, 4+2, 5+1), 4 ways of getting 7 (2+5, 3+4, 4+3, 5+2) and so on. you end up with a 1/5 chance of getting a 6, and only a 1/25 chance of getting a 2.

isma said...

hey, here's maybe a bizarre way, but it gives a uniform distribution in 1..7

the idea is to use the rand5() function 3 times to obtain each of the bits that forms numbers from 1 to 7 (in ugly pseudo code):

case rand5()
< 3 : bit(i) = 1
> 3 : bit(i) = 0
= 3 : repeat

then we form the answer using the bits:
ans = bit(2)*4 + bit(1)*2 + bit(0)

of course if the result is 0 we repeat the process until it isn't.


In C it would become:

int ans = 0, r;
while (ans == 0) {
  for (int i=0; i<3; i++) {
    while ((r = rand5()) == 3) {} /* 3 means: draw again */
    ans += (r < 3) << i; /* 2^i as a shift; in C, ^ is XOR */
  }
}
return ans;

BTW, what's up with blogger? It seems impossible to post my comment using konqueror

Marco said...

I suppose rand5() produces a uniform distribution random number.

1) (rand5()-1)*5 + (rand5()-1)
produce a uniform distribution random number from 0 to 24
if I need a uniform distribution number from 0 to 20
so I exec 1) until i get a number between 0 and 20

after that i return the result %7

this should be

int rand7(){
int t;
do{
t=(rand5()-1)*5 + (rand5()-1);
}while(t<21);
return (t%7)+1;

}

what do you think? is this distribution uniform?

Christophe said...

none of these solutions give a uniform distribution ... however this one does :

static int x = rand5();

int rand7() {
x += rand5()-1;
x += rand5()-1;
x %= 7;
return x+1;
}

Test program :
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int rand5() {
return 1 + rand() % 5;
}

int x;

int rand7() {
x = (x + rand5() + rand5()) % 7;
return x+1;
}

int main(int argc, char ** argv) {
long i;

// init random
srand(time(NULL));

x = rand5();

int c[7];
for (i = 0; i < 7; i++) c[i] = 0;

for (i = 0; i < 1000000; i++) {
c[rand7()-1]++;
}

float sum = 0;
for (i = 0; i < 7; i++) {
printf("%% of %d : %f\n", i+1, c[i]/10000.0);
sum += c[i]/10000.0;
}
printf("total : %f %%\n\n", sum);
}

Christophe said...

@marco

Why not test it ? Here's the results for your routine :

oelewapperke@Wapper:~/tmp$ ./count.o
% of 1 : 24.999400
% of 2 : 25.080200
% of 3 : 25.016400
% of 4 : 24.904000
% of 5 : 0.000000
% of 6 : 0.000000
% of 7 : 0.000000
total : 100.000000 %

So no, it's not uniform. It always returns values between 1 and 4.

Anonymous said...

I'm not a programmer; the solution that first came to my mind is rand7 = (rand5-1)*6/4+1, assuming that rand5 follows the uniform distribution and the variables have infinite precision.

Marco said...

I just saw an error in my code
while(t<21);
it should be
while(t>20)

I could not test it; I'm working.

to anonymous: I don't think rand5 has infinite precision... it can return only the numbers 1, 2, 3, 4, 5

Anonymous said...

Wouldn't the simplest solution just be to ignore the input, and just spit out a random integer 1..7? Doesn't that meet the requirements?

Anonymous said...

This should have even distribution:

int rand_bit() {
int r;
r = rand5();
if(r != 3)
return r % 2;
else
return rand_bit();
}

int rand7() {
int r;
r = rand_bit() * 4 + rand_bit() * 2 + rand_bit();
if(r != 0)
return r;
else
return rand7();
}

Simon said...

Christophe: That doesn't seem to be uniform distribution to me either.

Here are the combinations:
0: 0+0
1: 0+1, 1+0
2: 0+2, 2+0, 1+1
3: 0+3, 3+0, 1+2, 2+1
4: 0+4, 4+0, 1+3, 3+1, 2+2
5: 1+4, 4+1, 2+3, 3+2
6: 2+4, 4+2, 3+3
7: 3+4, 4+3
8: 4+4

So if you start off with this non-uniform distribution, all your calculations later still produce non-uniform probability.

Here's my idea to tackle this problem:

int table[5][5] = {
{1, 1, 1, 2, 2},
{2, 3, 3, 3, 4},
{4, 4, 5, 5, 5},
{6, 6, 6, 7, 7},
{7, 0, 0, 0, 0}
};

Use rand5()-1 twice to produce the indices i and j into this table. If the result is 0, repeat until a non-zero result is returned.

The table can also be placed with the above numbers in completely random order as well, although in theory that doesn't make it more random.

technic said...

Here's my take:
int r,x;
do{
do{
r = rand5();
if (r > 2 && r <= 4) x = rand5() + 5;
else if (r <= 2) x = rand5();
}while (r > 4);
}while (x > 7);
return x;
Result after 350000 tries is:
1 -> 50170
2 -> 50174
3 -> 50201
4 -> 49921
5 -> 50178
6 -> 49621
7 -> 49735

technic said...

Here's another take from me, in matlab though. Read the explanation below the codes. The idea is to generate floating point rand() generator from 0..1, precision is not infinite though.
function r = rand7()
temp = 0;
n = 1000;
sig = sqrt(2);
mu = 3;
for i = 1:n
temp = temp + rand5;
end
zn = (temp - n * mu)/(sig * sqrt(n));
x=0.5*erfc(-zn/sqrt(2));
r = floor(mod(x*49,7))+1;

By virtue of the central limit theorem, we know that the sum of n random variables (same distribution, mu, and sigma) approaches a normal distribution if n is large enough. So here I put 1000 for it (I know it sh*cks). Then we have to convert it to a standard normal distribution (mu = 0, sigma = 1). Next, we have to convert it to a uniform distribution by virtue of normdist(z), cheating here by using the erfc() function from matlab. By rights, we now have an "almost continuous, yet discrete (depending on n)" uniform R.V. from 0 to 1. The rest is trivial. Result after 100000 trials:
1-> 15524
2-> 11955
3-> 15374
4-> 14206
5-> 15342
6-> 12003
7-> 15596
I know they're not uniform, I guess something's wrong with the CPU(hehe, 90 % problem is in the code probably)

Nazca said...

Okie... not actually tested this, but I think it works.

int rand7()
{
int c = 15;
while (c > 13)
c = (((rand5()-1) << 2) + (rand5()-1));
return c%7+1;
}

technic said...

More explanation on my answers:
http://technical-journal.blogspot.com/2007/11/random-number-challenge.html

Derrick J. Wippler said...

Simple solutions FTW

int rand7() {
return 7 / rand5();
}

You will only get 1 2 3 or 7 using this solution, but as stated earlier uniformity wasn't requested.

More uniform would be

int rand7() {
return (( rand5() + rand5() ) % 7) + 1;
}

Andrey said...

x = rand5();
y = round(x * 7 / 5);

as we know from math. the distribution of function of random value is equal to function of distribution of random value.

cinetrends said...

0111 is binary equivalent of 7.

so we need to get random numbers for the three bits to get numbers between 0 and 7
e.g. 000, 001, 010, etc until 111

so use rand(5) to determine if u want to set a bit or not...
if number generated is less than 3, for e.g., u can set the bit. else not set the bit...

thus u can generate random number between 0 and 7 by setting each of the three bits independently.
think about it! d'u really need the rand(5) method for this, or is it just to distract from this solution! :-)
-shiv

keyvez said...

I was trying to solve this problem by essentially doing

(rand5()+rand5()+rand5()) % 7

Here's the full explanation on my method (with python code):

http://www.keyvez.net/2008/05/simulate-7-die-with-5-die-one.html

Ariya Hidayat said...

@keyvez: you have a serious bug in your test program, as (rand5()+rand5()+rand5()) % 7 theoretically will not give you a uniform distribution.

keyvez said...

I am also suspicious of my method, could you please elaborate or point out a hole. I've put details at :

http://www.keyvez.net/2008/05/simulate-7-die-with-5-die-one.html

Simon said...

@keyvez: You didn't take into account all 125 (5*5*5) permutations. Here's the distribution of your approach:
1: 18 times
2: 19 times
3: 19 times
4: 19 times
5: 18 times
6: 16 times
7: 16 times

keyvez said...

I didn't consider permutations because they are combinations of the three outcomes. Combinations with repetition to be more precise.

Because I am adding the value of the three outcomes in each experiment, so 1+2+3 is the same as 3+2+1

I have detailed the full explanation at http://www.keyvez.net

rtybase said...

What about:

array = new array[7];

Step1: for (i=0..array.size-1) array[i] = rand5();

Step2: find max = max{array[i]};

Step3: build array1 = {array[i] | array[i] = max}, so "array1" is a subset of "array"

Step4: IF array1.size == 1 THEN return its element ELSE array = array1 and repeat Step 1
...
Repeat until exactly 1 element is in "array1".

Distribution seems to be quite uniform.

Anonymous said...

The simplest and more effective way to generate random number with required distribution is tossing coin experiment. As you all know, in probability theory tossing coin experiment is used to generate independent events. The following algo perfectly maintains the probability distribution. rand5() is only distraction to the programmer, so to generate one need ability to toss coin, i.e. any random function. There is no limit range one can generate, it will be effective if the random number generate is between 0-2**N.

int val = 0, ran, j, shi = 0;
for(j=0;j<3;j++)
{
ran = rand5();
shi = (ran>2)?1:0;
shi = shi<< j;
val = val | shi;
}

mighty-whity said...

Hmm, many answers are not quite right.

If you solve system:
1*x = 1+y,
5*x = 7 + y.

x=1.5, y=0.5

So, our function is

int random7()
{
return(1.5*random5() - 0.5);
}

George said...

Hi there :)

I don't think a perfect solution is possible without throwing out some answers. Thus in terms of code an IF clause of some sort is going to be required.

One neat and almost perfect solution I've come up with in addition to what others have listed is as follows:

1. Run the rand5 function six times.
2. Assuming you get six random digits 0-4 from step one, you will now have a completely random six-digit number in base-5.
3. A six-digit number in base-5 has a maximum value of 15,624. That number is divisible by 7, so you can simply divide it into even shelves and have a perfectly random result.

* The trouble with this method is, of course, that 000000 = 0 is a valid number, so there are in fact 15,625 numbers, not 15,624. Actually pretty obvious considering only a power of 5 would make sense. We can simply replace some of your above code using an IF statement and discarding 000000 ... that would mean that we'd get a valid answer 15,624 / 15,625 times on the first try.

Anonymous said...

I think one of your solutions has a problem.

int rand7()
{
static int c = 0;
int x = rand5() + c;
c = (c + 5*rand5()) % 35;

return (x % 7)+1;
}

This solution can produce a uniform distribution. However, x has memory because of the static variable c, which means the previous result affects the next one. If we know the current return value, we can guess which value is more likely than the others to come next.

(And.. sorry for my poor English...)

Ariya Hidayat said...

@Anonymous: The problem is not unknown, see why I wrote "static". Think of each step as a progressive improvement, following the train of thought, towards reaching the ultimate answer.

Mike Kobit said...

@Alexei Sergeev
That is wrong if you want uniform distribution.

Shriram said...

The first solution

rand7()
{
rand5()
}
will never give the values 6 and 7.
Even though the question didn't say it requires a uniform distribution,
when it expects a number in the range 1-7,
it requires the function to generate 6 and 7 at least once, or with 1/7 probability,
whereas this rand7() has zero probability of generating 6 and 7.

Ariya Hidayat said...

@Shriram: The first solution rand7() "has zero probability for generating numbers 6 and 7", which is perfectly fine if a uniform distribution is not mandated. Not uniform = no obligation to generate 6 or 7 or any other value (regardless of the range).

Anonymous said...

Ariya thanks for this blog entry.

One question, why do you choose to run rand twice? Can you not just run rand5 twice and throw out any invalid values?

For example, assuming rand5 return 0 to 4, then

int rand7()
{
int result = rand5() + rand5();
if (result <= 6)
return result + 1;
else
return rand7();
}

Wouldn't this return you back a uniformly distributed value between 1 and 7?

Ariya Hidayat said...

@Anonymous: Adding two random numbers does not produce uniform distribution, check other comments.

Chen said...

(rand5()+rand5()+rand5()+rand5()+rand5()+rand5()+rand5())%7
Which should be uniform

Ariya Hidayat said...

@Chen: no, it's not.